[jira] [Commented] (SPARK-21928) ClassNotFoundException for custom Kryo registrator class during serde in netty threads
[ https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788104#comment-17788104 ] Shivam Sharma commented on SPARK-21928: --- I am getting this intermittent failure on Spark 2.4.3. Here is the full stack trace:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in stage 1.0 failed 4 times, most recent failure: Lost task 75.3 in stage 1.0 (TID 171, phx6-kwq.prod.xyz.internal, executor 71): java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1333)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:208)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:89)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
	at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:140)
	at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:324)
	at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:309)
	at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:218)
	at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:305)
	at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$3(TorrentBroadcast.scala:235)
	at scala.Option.getOrElse(Option.scala:138)
	at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$1(TorrentBroadcast.scala:211)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
	... 14 more
Caused by: java.lang.ClassNotFoundException: com.xyz.datashack.SparkKryoRegistrar
	at java.lang.ClassLoader.findClass(ClassLoader.java:530)
	at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
	at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:135)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at scala.collection.TraversableLike.map(TraversableLike.scala:237)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
	at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:135)
	... 22 more
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1876)
	at scala.collection.mutable.Resiza
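For context on what the failing call is trying to do: a registrator of this kind is normally wired up through SparkConf before the context is created. The sketch below is not code from the ticket; it only reuses the class name from the stack trace, and it assumes Spark and the registrator are on the classpath. Whether the registrator class actually reaches the classloader used by the serializing thread is exactly what this issue is about.
{code:java}
import org.apache.spark.SparkConf;

// Minimal configuration sketch (illustrative, not from the ticket).
public class KryoConfSketch {
    public static SparkConf kryoConf() {
        return new SparkConf()
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            // Class name taken from the stack trace above.
            .set("spark.kryo.registrator", "com.xyz.datashack.SparkKryoRegistrar");
    }
}
{code}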
[ https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663798#comment-16663798 ] Imran Rashid commented on SPARK-21928: -- [~soravgul...@gmail.com] this is believed to be fixed in 2.3.0. Can you share more details about what you see -- the full stack trace and what you were trying to do? It's possible there is another cause of a similar exception.
> ClassNotFoundException for custom Kryo registrator class during serde in netty threads
> --
>
> Key: SPARK-21928
> URL: https://issues.apache.org/jira/browse/SPARK-21928
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.1, 2.2.0
> Reporter: John Brock
> Assignee: Imran Rashid
> Priority: Major
> Fix For: 2.1.2, 2.2.1, 2.3.0
>
> From SPARK-13990 & SPARK-13926, Spark's SerializerManager has its own instance of a KryoSerializer which does not have the defaultClassLoader set on it. For normal task execution, that doesn't cause problems, because the serializer falls back to the current thread's task loader, which is set anyway.
>
> However, netty maintains its own thread pool, and those threads don't change their classloader to include the extra user jars needed for the custom Kryo registrator. That only matters when blocks are sent across the network, which forces serde in the netty thread. That won't happen often, because (a) Spark tries to execute tasks where the RDDs are already cached and (b) broadcast blocks generally don't require any serde in the netty threads (that occurs in the task thread that is reading the broadcast value). However, it can come up with remote cache reads, or if fetching a broadcast block forces another block to disk, which requires serialization.
>
> This doesn't affect the shuffle path, because the serde is never done in the threads created by netty.
>
> I think a fix for this should be fairly straightforward; we just need to set the classloader on that extra Kryo instance.
>
> (original problem description below)
>
> I unfortunately can't reliably reproduce this bug; it happens only occasionally, when training a logistic regression model with very large datasets. The training will often proceed through several {{treeAggregate}} calls without any problems, and then suddenly workers will start running into this {{java.lang.ClassNotFoundException}}.
>
> After doing some debugging, it seems that whenever this error happens, Spark is trying to use the {{sun.misc.Launcher$AppClassLoader}} {{ClassLoader}} instance instead of the usual {{org.apache.spark.util.MutableURLClassLoader}}. {{MutableURLClassLoader}} can see my custom Kryo registrator, but the {{AppClassLoader}} instance can't.
>
> When this error does pop up, it's usually accompanied by the task seeming to hang, and I need to kill Spark manually.
>
> I'm running a Spark application in cluster mode via spark-submit, and I have a custom Kryo registrator. The JAR is built with {{sbt assembly}}.
>
> Exception message:
> {noformat}
> 17/08/29 22:39:04 ERROR TransportRequestHandler: Error opening block StreamChunkId{streamId=542074019336, chunkIndex=0} for request from /10.0.29.65:34332
> org.apache.spark.SparkException: Failed to register classes with Kryo
> 	at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:139)
> 	at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:292)
> 	at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:277)
> 	at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:186)
> 	at org.apache.spark.serializer.SerializerManager.dataSerializeStream(SerializerManager.scala:169)
> 	at org.apache.spark.storage.BlockManager$$anonfun$dropFromMemory$3.apply(BlockManager.scala:1382)
> 	at org.apache.spark.storage.BlockManager$$anonfun$dropFromMemory$3.apply(BlockManager.scala:1377)
> 	at org.apache.spark.storage.DiskStore.put(DiskStore.scala:69)
> 	at org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1377)
> 	at org.apache.spark.storage.memory.MemoryStore.org$apache$spark$storage$memory$MemoryStore$$dropBlock$1(MemoryStore.scala:524)
> 	at org.apache.spark.storage.memory.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:545)
> 	at org.apache.spark.storage.memory.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:539)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 	at org.apache.spark.storage.memory.Mem
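The description above pins the bug on which ClassLoader ends up resolving the registrator class. To make that concrete, here is a small self-contained Java sketch -- not Spark code, and every name in it is invented for illustration. It compiles a stand-in "user" class into a directory off the application classpath (playing the role of the assembly jar), then shows that a plain Class.forName from a thread without the user classloader fails, while the same lookup through a loader that includes the jar succeeds, which is essentially what the fix does for the extra Kryo instance.

```java
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderDemo {

    // Compiles a stand-in "user" class into a temp directory that is NOT on
    // the application classpath, then tries to load it two ways.
    static String[] demo() throws Exception {
        Path dir = Files.createTempDirectory("user-jar");
        Path src = dir.resolve("MyKryoRegistrator.java");
        Files.write(src,
            "public class MyKryoRegistrator {}".getBytes(StandardCharsets.UTF_8));
        // Requires a JDK (not a bare JRE) so the system compiler is present.
        int rc = ToolProvider.getSystemJavaCompiler()
                             .run(null, null, null, src.toString());
        if (rc != 0) throw new IllegalStateException("compilation failed");

        // 1) What a netty thread effectively did before the fix: resolve the
        //    registrator with a loader that knows nothing about the user jar.
        String withoutUserJar;
        try {
            Class.forName("MyKryoRegistrator");
            withoutUserJar = "found";
        } catch (ClassNotFoundException e) {
            withoutUserJar = "ClassNotFoundException";
        }

        // 2) What the fix amounts to: resolve through a loader that does
        //    include the user jar (here, the temp directory).
        try (URLClassLoader userLoader =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> c = Class.forName("MyKryoRegistrator", true, userLoader);
            return new String[] { withoutUserJar, c.getName() };
        }
    }

    public static void main(String[] args) throws Exception {
        String[] r = demo();
        System.out.println("default loader: " + r[0]);
        System.out.println("user-jar loader loaded: " + r[1]);
    }
}
```

In Spark terms, MutableURLClassLoader plays the role of the second loader here, and the netty threads were stuck with the first.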
[ https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663313#comment-16663313 ] Sourav Gulati commented on SPARK-21928: --- I am using Spark 2.3.0 and I am still getting this exception. Is it not fixed in Spark 2.3.0?
[ https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175452#comment-16175452 ] John Brock commented on SPARK-21928: Excellent, thanks for looking into this.
[ https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175410#comment-16175410 ] Imran Rashid commented on SPARK-21928: -- thanks [~jbrock], that's great. I think this is fully explained now. I updated the title and description so folks know it is not related to ML; hope that is OK.