Does anyone have a Spark code style guide XML file?
Hello, I would appreciate it if you could share an XML file implementing the style rules at https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide. Thanks.
Are there any open source tools that implement draggable widgets and let an application run as a DAG?
Hello, I have been trying to find such tools, but without success. As the title says: are there any open source tools that implement draggable widgets and run an application as a DAG-like workflow? Thanks, Minglei.
Is there a MiniCluster-like test harness in Spark, as there is in Hadoop?
Hello, I want to find test utilities in Spark that provide the same functionality as Hadoop's MiniCluster test environment, but I cannot find them. Does anyone know about this?
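For reference, Spark's own test suites simulate a multi-worker cluster inside a single machine with the local-cluster master URL, which is probably the closest analogue to Hadoop's MiniCluster. A minimal sketch, assuming a built Spark distribution is available (local-cluster mode launches real worker JVMs, and the 2-worker/1-core/1024 MB figures are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB]
    val conf = new SparkConf()
      .setMaster("local-cluster[2,1,1024]")
      .setAppName("mini-cluster-test")
    val sc = new SparkContext(conf)
    try {
      // Run a small job across the simulated workers to verify the setup.
      val sum = sc.parallelize(1 to 100, 4).reduce(_ + _)
      assert(sum == 5050)
    } finally {
      sc.stop()
    }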
Fwd: Error:scalac: Error: assertion failed: List(object package$DebugNode, object package$DebugNode)
I'm sorry; the error does not occur when I build Spark. It happens when running the example LogisticRegressionWithElasticNetExample.scala. Thanks, Minglei.

From: zml张明磊 [mailto:mingleizh...@ctrip.com]
Sent: December 31, 2015, 15:01
To: user@spark.apache.org
Subject: Error:scalac: Error: assertion failed: List(object package$DebugNode, object package$DebugNode)

(Original message below.)
Error:scalac: Error: assertion failed: List(object package$DebugNode, object package$DebugNode)
Hello, recently I built Spark from apache/master and got the following error. Following this Stack Overflow post, http://stackoverflow.com/questions/24165184/scalac-assertion-failed-while-run-scalatest-in-idea, I could not find the Preferences > Scala setting it mentions in IntelliJ IDEA. SBT does not work for me at my company, so I use Maven instead. How can I fix or work around this? Lastly: happy new year to everyone.

    Error:scalac: Error: assertion failed: List(object package$DebugNode, object package$DebugNode)
    java.lang.AssertionError: assertion failed: List(object package$DebugNode, object package$DebugNode)
        at scala.reflect.internal.Symbols$Symbol.suchThat(Symbols.scala:1678)
        at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:2988)
        at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:2991)
        at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1371)
        at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:120)
        at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583)
        at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557)
        at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553)
        at scala.tools.nsc.Global$Run.compile(Global.scala:1662)
        at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)

Thanks, Minglei.
How can I get a column's data by column name and store it in an array or list?
Hi, I am new to Scala and Spark and am trying to find a DataFrame API to solve the problem in the title. However, I can only find DataFrame.col(colName: String): Column, which returns a Column object, not the column's contents. An API like Column.toArray would be enough for me, but DataFrame does not provide one. How can I achieve this? Thanks, Minglei.
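For reference, one common approach (a sketch, not the only API): select the single column and collect its rows to the driver, extracting the value from each Row. Note that collect() pulls all the data to the driver, so it only suits results that fit in memory; the getString getter below is an illustrative choice, to be swapped for the getter matching the column's actual type:

    import org.apache.spark.sql.DataFrame

    // Pull one column's values into a local Array on the driver.
    def columnToArray(df: DataFrame, colName: String): Array[String] =
      df.select(colName)
        .collect()           // Array[Row]; each Row has one field after the select
        .map(_.getString(0)) // extract that single field from each Row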
Running a Spark application encounters an error (Maven related)
Hi, I am trying to figure out how Maven works. When I add a dependency to my existing pom.xml and rebuild my Spark application project, I get BUILD SUCCESS in the console. However, when I run the Spark application, spark-shell is not happy and gives me the following message:

    Exception: java.lang.NoClassDefFoundError: com/github/stuxuhai/jpinyin/PinyinFormat
    Caused by: ClassNotFoundException: com.github.stuxuhai.jpinyin.PinyinFormat

I went to the directory .m2/repository/com/github/stuxuhai/jpinyin/1.1.1 and jpinyin-1.1.1.jar is there. What happened? Can anyone help me? Thanks, Minglei.
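For reference (an assumption about the cause, not a confirmed diagnosis): BUILD SUCCESS only means the jar was on Maven's compile classpath; at runtime, spark-submit does not read the pom.xml, so the dependency has to be shipped explicitly. One way is the --jars flag (the paths and class name below are illustrative):

    ./bin/spark-submit \
      --jars ~/.m2/repository/com/github/stuxuhai/jpinyin/1.1.1/jpinyin-1.1.1.jar \
      --class com.example.Main \
      target/app-1.0-SNAPSHOT.jar

Alternatively, the maven-shade-plugin can bundle the dependency into a single fat jar so nothing extra needs to be passed at submit time.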
UnsupportedOperationException Schema for type String => Int is not supported
Hi, Spark version: 1.4.1. Running the code below, I get the following error. How can I fix the code so it runs correctly? I don't know why the schema doesn't support this type. If I use callUDF instead of udf, everything works. Thanks, Minglei.

    val index: (String => (String => Int)) = (value: String) => {
      (a: String) => if (value.equals(a)) 1 else 0
    }
    val sqlfunc = udf(index)
    var temp = df
    val meetsConditionValue = List("fergubo01m", "wrighha01m", "woodji01m", "mcbridi01m", "cravebi01m")
    for (i <- 0 until j) {
      temp = temp.withColumn(columnName + "_" + meetsConditionValue(i), sqlfunc(col(columnName)))
    }

    Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type String => Int is not supported
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:152)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28)
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:63)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28)
        at org.apache.spark.sql.functions$.udf(functions.scala:1363)
        at com.asa.ml.toolimpl.DummyImpl.create_dummy(DummyImpl.scala:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.asa.ml.client.Client$.main(Client.scala:26)
        at com.asa.ml.client.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
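For reference, the likely cause (an inference from the snippet, not a confirmed diagnosis): index is curried, so udf(index) registers a UDF whose return type is the function String => Int, and Catalyst has no schema for function types. A minimal sketch of one fix, reusing the names above: make the UDF take both strings and pass the constant in with lit:

    import org.apache.spark.sql.functions.{udf, lit, col}

    // A two-argument UDF returning Int, a type Catalyst can map to a schema.
    val matchFlag = udf((value: String, a: String) => if (value == a) 1 else 0)

    var temp = df
    for (v <- meetsConditionValue) {
      // lit(v) wraps the constant so it can be passed as a Column argument.
      temp = temp.withColumn(columnName + "_" + v, matchFlag(lit(v), col(columnName)))
    }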
Are there solutions for transforming categorical variables into dummy variables in Scala or Spark?
Hi, I am new to Scala and Spark. Recently I needed to write a tool that transforms categorical variables into dummy/indicator variables. Are there any tools in Scala or Spark that support this transformation, like pandas.get_dummies in Python? Any examples or learning materials would help. Thanks, Minglei.
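For reference, Spark ML ships two transformers that together approximate pandas.get_dummies: StringIndexer maps each category string to a numeric index, and OneHotEncoder expands that index into a sparse 0/1 vector (one vector column rather than N separate columns). A minimal sketch, assuming a DataFrame df with a hypothetical string column "category":

    import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

    // Step 1: map category strings to numeric indices.
    val indexer = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
    val indexed = indexer.fit(df).transform(df)

    // Step 2: expand each index into a sparse 0/1 indicator vector.
    val encoder = new OneHotEncoder()
      .setInputCol("categoryIndex")
      .setOutputCol("categoryVec")
    val encoded = encoder.transform(indexed)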
YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Last night I ran the jar in pseudo-distributed mode without any WARN or ERROR. Today, however, I get the WARN below, which leads directly to the ERROR. My machine has 8 GB of memory, so I don't think the problem is what the WARN describes. What's wrong? The code hasn't changed, and neither has the environment, so this is strange. Can anybody help me? Thanks, Minglei.

Here is the submit script:

    ./bin/spark-submit --master local[*] --driver-memory 8g --executor-memory 8g --class com.ctrip.ml.client.Client /root/di-ml-tool/target/di-ml-tool-1.0-SNAPSHOT.jar

Error below:

    15/12/16 10:22:01 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    15/12/16 10:22:04 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 10.32.3.21:48311
    15/12/16 10:22:04 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 10.32.3.21:48311
    15/12/16 10:22:04 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@10.32.3.21:48311] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
    15/12/16 10:22:04 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
    Exception in thread "main"
    15/12/16 10:22:04 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
    Exception in thread "Yarn application state monitor" org.apache.spark.SparkException: Error asking standalone scheduler to shut down executors
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:261)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:266)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:158)
        at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1411)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1644)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:139)
    Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1325)
        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257)
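One detail worth checking (a guess, not a confirmed diagnosis): the script passes --master local[*], yet the log lines come from YarnScheduler and YarnClientSchedulerBackend, which suggests a yarn-client master is being picked up elsewhere, e.g. from spark-defaults.conf. In that case, asking for 8g of driver memory plus 8g of executor memory on an 8 GB machine would leave YARN unable to allocate containers, which matches the warning. A sketch of a submit command with memory that could fit (the values are illustrative):

    ./bin/spark-submit --master yarn-client --driver-memory 1g --executor-memory 2g --class com.ctrip.ml.client.Client /root/di-ml-tool/target/di-ml-tool-1.0-SNAPSHOT.jar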
RuntimeException: Failed to check null bit for primitive int type
Hi, my Spark version is spark-1.4.1-bin-hadoop2.6. When I submit a Spark job that reads data from a Hive table, I get the following error. Although it's just a WARN, it leads to job failure. The JIRA below may already have fixed this, so I am confused. https://issues.apache.org/jira/browse/SPARK-3004

    15/12/14 19:21:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 40.0 (TID 1255, minglei): java.lang.RuntimeException: Failed to check null bit for primitive int value.
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(rows.scala:82)
        at com.ctrip.ml.toolimpl.MetadataImpl$$anonfun$1.apply(MetadataImpl.scala:22)
        at com.ctrip.ml.toolimpl.MetadataImpl$$anonfun$1.apply(MetadataImpl.scala:22)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
        at org.spark-project.guava.collect.Ordering.leastOf(Ordering.java:658)
        at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
        at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$29.apply(RDD.scala:1338)
        at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$29.apply(RDD.scala:1335)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
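For reference (an inference from the stack trace, not a confirmed diagnosis): this error typically means the int column read from Hive contains NULLs, and GenericRow.getInt throws when the field is null, which matches the frame at MetadataImpl.scala:22. A minimal sketch of the usual guard, with the default value as an illustrative choice:

    import org.apache.spark.sql.Row

    // Guard against NULL fields before calling the primitive getter:
    // Row.getInt throws on nulls, but Row.isNullAt lets us check first.
    def safeInt(row: Row, i: Int, default: Int = 0): Int =
      if (row.isNullAt(i)) default else row.getInt(i)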