The error is indeed very strange; however, without code that reproduces it, we can't really provide much help beyond speculation.
One thing that stood out to me immediately is that you say you have an RDD of Any where every element should actually be a BigDecimal, so why not specify that type information? With Any, a whole class of errors that the typechecker could normally catch can slip through.
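For example, something along these lines (an untested sketch: I'm guessing at the name myRdd and at the concrete class, java.math.BigDecimal vs scala.math.BigDecimal, since I don't have your code):

import org.apache.spark.rdd.RDD

// Untested sketch: myRdd and java.math.BigDecimal are assumptions on my part.
// Pin the key type down once instead of carrying Any through the pipeline.
val typed: RDD[(java.math.BigDecimal, Double)] =
  myRdd.map { case (id, score) => (id.asInstanceOf[java.math.BigDecimal], score) }

// From here on the compiler can catch key-type mistakes that Any would hide.
typed.map(_._1).take(10)

Better still, keep the concrete type at the point where the (id, prediction) pair is first produced, so no cast is needed at all.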
On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho <[email protected]> wrote:
> Hi Ted, thanks for answering.
> The map is just that, whenever I try inside the map it throws this
> ClassNotFoundException, even if I do map(f => f) it throws the exception.
> What is bothering me is that when I do a take or a first it returns the
> result, which make me conclude that the previous code isn't wrong.
>
> Kind Regards,
> Dirceu
>
>
> 2016-03-17 12:50 GMT-03:00 Ted Yu <[email protected]>:
>>
>> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>
>> Do you mind showing more of your code involving the map() ?
>>
>> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho
>> <[email protected]> wrote:
>>>
>>> Hello,
>>> I found a strange behavior after executing a prediction with MLIB.
>>> My code return an RDD[(Any,Double)] where Any is the id of my dataset,
>>> which is BigDecimal, and Double is the prediction for that line.
>>> When I run
>>> myRdd.take(10) it returns ok
>>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>>> Array((1921821857196754403.000000000000000000,0.1690292052496703),
>>> (4545756323742222427.000000000000000000,0.16902820241892452),
>>> (989198096568001939.000000000000000000,0.16903432789699502),
>>> (14284129652106187990.000000000000000000,0.16903517653451386),
>>> (17980228074225252497.000000000000000000,0.16903151028332508),
>>> (3861345958263692781.000000000000000000,0.16903056986183976),
>>> (17558198701997383205.000000000000000000,0.1690295450319745),
>>> (10651576092054552310.000000000000000000,0.1690286445174418),
>>> (4534494349035056215.000000000000000000,0.16903303401862327),
>>> (5551671513234217935.000000000000000000,0.16902303368995966))
>>> But when I try to run some map on it:
>>> myRdd.map(_._1).take(10)
>>> It throws a ClassCastException:
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:278)
>>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Driver stacktrace:
>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
>>> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1302)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
>>> at org.apache.spark.rdd.RDD.take(RDD.scala:1276)
>>> at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1316)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1315)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
>>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
>>> at $iwC$$iwC$$iwC.<init>(<console>:47)
>>> at $iwC$$iwC.<init>(<console>:49)
>>> at $iwC.<init>(<console>:51)
>>> at <init>(<console>:53)
>>> at .<init>(<console>:57)
>>> at .<clinit>(<console>)
>>> at .<init>(<console>:7)
>>> at .<clinit>(<console>)
>>> at $print(<console>)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>> at org.apache.zeppelin.spark.IUberSparkInterpreter.interpretInput(IUberSparkInterpreter.java:865)
>>> at org.apache.zeppelin.spark.IUberSparkInterpreter.interpret(IUberSparkInterpreter.java:830)
>>> at org.apache.zeppelin.spark.IUberSparkInterpreter.interpret(IUberSparkInterpreter.java:818)
>>> at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:331)
>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
>>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:278)
>>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> ... 1 more
>>>
>>> I'm using spark 1.5.2 compiled with hadoop 2.6.
>>> Does anybody know how to avoid this exception?
>>>
>>> Kind Regards,
>>> Dirceu
>>
>>
>
