Re: ClassNotFoundException in RDD.map
Thanks Jakob, I've looked into the source code here and found that I was missing this property: spark.repl.class.uri. Setting it solved the problem.

Cheers

2016-03-17 18:14 GMT-03:00 Jakob Odersky:
> The error is very strange indeed; however, without code that reproduces
> it, we can't really provide much help beyond speculation.
>
> One thing that stood out to me immediately is that you say you have an
> RDD of Any where every Any should be a BigDecimal, so why not specify
> that type information? When using Any, a whole class of errors that the
> typechecker could normally catch can slip through.
>
> [...]
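For anyone embedding the Spark shell and hitting the same error, a minimal sketch of where this property fits, assuming a Spark 1.x-style setup (the names below are illustrative, not from this thread). Executors resolve shell-generated wrapper classes (the `$iwC$$iwC...$$anonfun$1` in the stack trace) by fetching them from a class server on the driver, whose address they read from `spark.repl.class.uri`:

```scala
import org.apache.spark.SparkConf

// Sketch only (Spark 1.x custom REPL embedding). If this property is
// missing, deserializing a REPL-defined closure on an executor fails
// with ClassNotFoundException, even though the same RDD works for
// actions that ship no new closure.
def replConf(classServerUri: String): SparkConf =
  new SparkConf()
    .setAppName("embedded-repl")
    // classServerUri is an assumed handle on the interpreter's class
    // server; adapt to however your REPL integration exposes it.
    .set("spark.repl.class.uri", classServerUri)
```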
Re: ClassNotFoundException in RDD.map
The error is very strange indeed; however, without code that reproduces it, we can't really provide much help beyond speculation.

One thing that stood out to me immediately is that you say you have an RDD of Any where every Any should be a BigDecimal, so why not specify that type information? When using Any, a whole class of errors that the typechecker could normally catch can slip through.

On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho wrote:
> Hi Ted, thanks for answering.
> The map is just that: whatever I try inside the map throws this
> ClassNotFoundException; even if I do map(f => f) it throws the exception.
> What is bothering me is that when I do a take or a first it returns the
> result, which makes me conclude that the previous code isn't wrong.
>
> [...]
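Jakob's suggestion could be sketched like this (hypothetical names; it assumes the ids really are all BigDecimal, as described in the thread):

```scala
import org.apache.spark.rdd.RDD

// Sketch: narrow RDD[(Any, Double)] to RDD[(BigDecimal, Double)] once,
// so later transformations are checked by the compiler instead of
// failing (or silently misbehaving) at runtime.
def typed(predictions: RDD[(Any, Double)]): RDD[(BigDecimal, Double)] =
  predictions.map {
    case (id: BigDecimal, score) => (id, score)
    case (other, _) =>
      sys.error(s"expected BigDecimal id, got ${other.getClass.getName}")
  }
```

Better still, carry the BigDecimal type through from the point where the id column is read, so the cast is never needed.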
Re: ClassNotFoundException in RDD.map
Hi Ted, thanks for answering.
The map is just that: whatever I try inside the map throws this ClassNotFoundException; even if I do map(f => f) it throws the exception. What is bothering me is that when I do a take or a first it returns the result, which makes me conclude that the previous code isn't wrong.

Kind Regards,
Dirceu

2016-03-17 12:50 GMT-03:00 Ted Yu:
> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>
> Do you mind showing more of your code involving the map()?
>
> [...]
Re: ClassNotFoundException in RDD.map
bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1

Do you mind showing more of your code involving the map()?

On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho <dirceu.semigh...@gmail.com> wrote:
> Hello,
> I found a strange behavior after executing a prediction with MLlib.
> My code returns an RDD[(Any,Double)] where Any is the id of my dataset,
> which is BigDecimal, and Double is the prediction for that line.
> When I run
> myRdd.take(10) it returns ok:
> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
> Array((1921821857196754403.00,0.1690292052496703),
> (454575632374427.00,0.16902820241892452),
> (989198096568001939.00,0.16903432789699502),
> (14284129652106187990.00,0.16903517653451386),
> (17980228074225252497.00,0.16903151028332508),
> (3861345958263692781.00,0.16903056986183976),
> (17558198701997383205.00,0.1690295450319745),
> (10651576092054552310.00,0.1690286445174418),
> (4534494349035056215.00,0.16903303401862327),
> (5551671513234217935.00,0.16902303368995966))
> But when I try to run some map on it:
> myRdd.map(_._1).take(10)
> It throws a ClassNotFoundException:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
> at org.apache.spark.sch
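One reading of why `myRdd.take(10)` succeeds while `myRdd.map(_._1).take(10)` fails (an interpretation of the stack trace, not something confirmed in the thread): `take` on the existing RDD ships no new user closure, whereas `map(_._1)` creates an anonymous function that the shell compiles into a `$iwC...$$anonfun$1` wrapper class, which the executor must load while deserializing the task. A standalone Scala sketch of that serialize/deserialize step, which is where the exception surfaces:

```scala
import java.io._

// Sketch of the failing step: Spark's JavaSerializer does essentially
// this round-trip for the map closure. Writing succeeds on the driver;
// readObject() throws ClassNotFoundException on the executor when its
// classloader cannot resolve the closure's class (here, the
// REPL-generated $iwC...$$anonfun$1 wrapper).
def roundTrip(value: Serializable): AnyRef = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(value)
  out.close()
  // On an executor, resolveClass() consults the task's classloader; a
  // class present only on the driver fails exactly here.
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject()
}
```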