LuciferYang commented on pull request #34368:
URL: https://github.com/apache/spark/pull/34368#issuecomment-950555453


   > I wonder what happens if we don't clear this field in the closure in this case - seems kind of risky to do this. That said, who knows what behavior differences arise if we don't
   
   @srowen I tried removing the following code and re-running the `repl` module tests:
   
   
https://github.com/apache/spark/blob/adf9b64c0be8e6e5bc6042eaaecd53518fbc5e25/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L396-L398
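   
   For reference, the failing tests drive the REPL with a snippet along the lines of the sketch below. This is reconstructed from the interpreter output further down, so the names match the log but the exact test code in `SingletonRepl2Suite` may differ (`sc` is the REPL's `SparkContext`):
   
   ```scala
   // Reconstruction of the REPL input exercised by the SPARK-31399 tests.
   // `NotSerializableClass` deliberately does not extend Serializable.
   class NotSerializableClass(val o: Object)
   
   val ns = new NotSerializableClass(new Object)  // non-serializable state on the line object
   val topLevelValue = "someValue"                // the only state the closure actually reads
   
   // The closure references only topLevelValue, but the REPL wraps both vals in
   // the same $iw line object, so an uncleaned closure captures `ns` as well.
   val closure = (j: Int) => (1 to j).flatMap { x =>
     (1 to x).map { y => s"$y$topLevelValue" }
   }
   
   val r = sc.parallelize(0 to 2).map(closure).collect()
   // expected: Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))
   ```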
   
   With that code removed, two tests failed with `Caused by: java.io.NotSerializableException: NotSerializableClass`:
   ```
   - SPARK-31399: should clone+clean line object w/ non-serializable state in ClosureCleaner *** FAILED ***
     isContain was false Interpreter output did not contain 'r: Array[scala.collection.immutable.IndexedSeq[String]] = Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))':
     
     scala> defined class NotSerializableClass
     
     scala>      |      |      |      |      | ns: NotSerializableClass = NotSerializableClass@c63e6a1
     topLevelValue: String = someValue
     closure: Int => scala.collection.immutable.IndexedSeq[String] = $Lambda$4794/0x0000000801fcedd0@7a715bb
     
     scala> org.apache.spark.SparkException: Task not serializable
       at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:444)
       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:416)
       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
       at org.apache.spark.SparkContext.clean(SparkContext.scala:2490)
       at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
       at org.apache.spark.rdd.RDD.map(RDD.scala:413)
       ... 35 elided
     Caused by: java.io.NotSerializableException: NotSerializableClass
     Serialization stack:
        - object not serializable (class: NotSerializableClass, value: NotSerializableClass@c63e6a1)
        - field (class: $iw, name: ns, type: class NotSerializableClass)
        - object (class $iw, $iw@79ac2398)
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 1)
        - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
        - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class $iw, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic $anonfun$closure$1$adapted:(L$iw;Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, instantiatedMethodType=(Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, numCaptured=1])
        - writeReplace data (class: java.lang.invoke.SerializedLambda)
        - object (class $Lambda$4794/0x0000000801fcedd0, $Lambda$4794/0x0000000801fcedd0@7a715bb)
       at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
       at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
       at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
       at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:441)
       ... 43 more
     
     scala>      | _result_1635140723849: Int = 1
     
     scala> (SingletonRepl2Suite.scala:96)
   - SPARK-31399: ClosureCleaner should discover indirectly nested closure in inner class *** FAILED ***
     isContain was false Interpreter output did not contain 'r: Array[scala.collection.immutable.IndexedSeq[String]] = Array(Vector(), Vector(1someValue), Vector(1someValue, 1someValue, 2someValue))':
     
     scala> defined class NotSerializableClass
     
     scala>      |      |      |      |      |      |      | ns: NotSerializableClass = NotSerializableClass@3c10dced
     topLevelValue: String = someValue
     closure: Int => scala.collection.immutable.IndexedSeq[String] = $Lambda$4817/0x0000000801fe1330@1c0d770d
     
     scala> org.apache.spark.SparkException: Task not serializable
       at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:444)
       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:416)
       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:163)
       at org.apache.spark.SparkContext.clean(SparkContext.scala:2490)
       at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:414)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
       at org.apache.spark.rdd.RDD.map(RDD.scala:413)
       ... 35 elided
     Caused by: java.io.NotSerializableException: NotSerializableClass
     Serialization stack:
        - object not serializable (class: NotSerializableClass, value: NotSerializableClass@3c10dced)
        - field (class: $iw, name: ns, type: class NotSerializableClass)
        - object (class $iw, $iw@5a9299a9)
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 1)
        - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
        - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class $iw, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic $anonfun$closure$1$adapted:(L$iw;Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, instantiatedMethodType=(Ljava/lang/Object;)Lscala/collection/immutable/IndexedSeq;, numCaptured=1])
        - writeReplace data (class: java.lang.invoke.SerializedLambda)
        - object (class $Lambda$4817/0x0000000801fe1330, $Lambda$4817/0x0000000801fe1330@1c0d770d)
       at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
       at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
       at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
       at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:441)
       ... 43 more
     
     scala>      | _result_1635140724243: Int = 1
     
     scala> (SingletonRepl2Suite.scala:96)
   ```
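   
   For context, the removed lines are part of the clone+clean step that the first test name refers to: `ClosureCleaner` clones the REPL line object (`$iw`) and copies over only the fields the closure actually reads, so non-serializable state such as `ns` never reaches the serializer. A minimal sketch of that idea (a simplified illustration, not the actual `ClosureCleaner` code):
   
   ```scala
   import java.lang.reflect.Modifier
   
   // Simplified illustration of "clone + clean": build a field-by-field copy of
   // `outer` that keeps only the fields named in `accessedFields`, leaving the
   // rest null. Spark's real implementation is more involved; this only shows
   // the principle behind the lines removed above.
   def cloneAndClean(outer: AnyRef, accessedFields: Set[String]): AnyRef = {
     val clazz = outer.getClass
     // Instantiate without running a constructor, as Java serialization does.
     val ctor = sun.reflect.ReflectionFactory.getReflectionFactory
       .newConstructorForSerialization(clazz, classOf[Object].getDeclaredConstructor())
     val cloned = ctor.newInstance().asInstanceOf[AnyRef]
     for (f <- clazz.getDeclaredFields if !Modifier.isStatic(f.getModifiers)) {
       f.setAccessible(true)
       // Copy only what the closure uses; a field like `ns` stays null here
       // and is therefore never serialized.
       if (accessedFields.contains(f.getName)) f.set(cloned, f.get(outer))
     }
     cloned
   }
   ```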

