LuciferYang commented on pull request #34368: URL: https://github.com/apache/spark/pull/34368#issuecomment-950555453
> I wonder what happens if we don't clear this field in the closure in this case - seems kind of risky to do this. That said, who knows what behavior differences arise if we don't

@srowen I tried removing the following code and testing the `repl` module:

https://github.com/apache/spark/blob/adf9b64c0be8e6e5bc6042eaaecd53518fbc5e25/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L396-L398

Two tests failed due to `Caused by: java.io.NotSerializableException: NotSerializableClass`:

```
- SPARK-31399: should clone+clean line object w/ non-serializable state in ClosureCleaner *** FAILED ***
  isContain was false
  Interpreter output did not contain 'r: Array[scala.collection.immutable.IndexedSeq[String]] = Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))':

  scala> defined class NotSerializableClass

  scala> ns: NotSerializableClass = NotSerializableClass@c63e6a1
  topLevelValue: String = someValue
  closure: Int => scala.collection.immutable.IndexedSeq[String] = $Lambda$4794/0x0000000801fcedd0@7a715bb

  scala> org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:444)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:416)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2490)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
    at org.apache.spark.rdd.RDD.map(RDD.scala:413)
    ... 35 elided
  Caused by: java.io.NotSerializableException: NotSerializableClass
  Serialization stack:
    - object not serializable (class: NotSerializableClass, value: NotSerializableClass@c63e6a1)
    - field (class: $iw, name: ns, type: class NotSerializableClass)
    - object (class $iw, $iw@79ac2398)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class $iw, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic $anonfun$closure$1$adapted:(L$iw;Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, instantiatedMethodType=(Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)
    - object (class $Lambda$4794/0x0000000801fcedd0, $Lambda$4794/0x0000000801fcedd0@7a715bb)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:441)
    ... 43 more

  scala> _result_1635140723849: Int = 1

  scala> (SingletonRepl2Suite.scala:96)

- SPARK-31399: ClosureCleaner should discover indirectly nested closure in inner class *** FAILED ***
  isContain was false
  Interpreter output did not contain 'r: Array[scala.collection.immutable.IndexedSeq[String]] = Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))':

  scala> defined class NotSerializableClass

  scala> ns: NotSerializableClass = NotSerializableClass@3c10dced
  topLevelValue: String = someValue
  closure: Int => scala.collection.immutable.IndexedSeq[String] = $Lambda$4817/0x0000000801fe1330@1c0d770d

  scala> org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:444)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:416)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2490)
    at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
    at org.apache.spark.rdd.RDD.map(RDD.scala:413)
    ... 35 elided
  Caused by: java.io.NotSerializableException: NotSerializableClass
  Serialization stack:
    - object not serializable (class: NotSerializableClass, value: NotSerializableClass@3c10dced)
    - field (class: $iw, name: ns, type: class NotSerializableClass)
    - object (class $iw, $iw@5a9299a9)
    - element of array (index: 0)
    - array (class [Ljava.lang.Object;, size 1)
    - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
    - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class $iw, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic $anonfun$closure$1$adapted:(L$iw;Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, instantiatedMethodType=(Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, numCaptured=1])
    - writeReplace data (class: java.lang.invoke.SerializedLambda)
    - object (class $Lambda$4817/0x0000000801fe1330, $Lambda$4817/0x0000000801fe1330@1c0d770d)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:441)
    ... 43 more

  scala> _result_1635140724243: Int = 1

  scala> (SingletonRepl2Suite.scala:96)
```
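The field-nulling that the linked `ClosureCleaner.scala#L396-L398` performs can be illustrated outside Spark with plain JVM serialization. This is a minimal sketch, not Spark code: `Closure` and its `outer` field are hypothetical stand-ins for a serializable lambda that captures a reference to a non-serializable REPL line object (`$iw` holding `ns` in the traces above). Serialization fails while the reference is set and succeeds once it is cleared, which is exactly why removing the nulling code makes these two tests fail.

```java
import java.io.*;

public class NullingDemo {
    // Stand-in for the REPL's NotSerializableClass: no Serializable marker.
    static class NotSerializableClass {}

    // Stand-in for a closure object: Serializable itself, but it captures a
    // reference to a non-serializable enclosing object (like $iw.ns above).
    static class Closure implements Serializable {
        NotSerializableClass outer = new NotSerializableClass();
        int apply(int x) { return x + 1; }  // never reads `outer`
    }

    // Rough analogue of ClosureCleaner.ensureSerializable: try a round of
    // Java serialization and report whether it succeeds.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Closure c = new Closure();
        // The captured field drags NotSerializableClass into the graph.
        System.out.println(serializes(c));  // false

        // What the nulling at ClosureCleaner.scala#L396-L398 effectively does:
        // the closure body never uses `outer`, so clearing it is safe and
        // unblocks serialization.
        c.outer = null;
        System.out.println(serializes(c));  // true
    }
}
```

The point of the sketch is that the reference only needs to be *reachable*, not *used*, for Java serialization to fail, which is why the cleaner nulls fields the closure body never touches.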
