retronym edited a comment on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-625082537


   Brainstorming a little. An alternative design of `ClosureCleaner` would be 
to customise serialization of REPL line wrapper objects so they replaced 
non-serializable values with null or some other sentinel value. If the 
deserialized code attempted to use one of the absent values, a runtime error 
could be issued.
   
   Pros:
      - Simplicity of implementation (no static analysis needed)
      - Will handle all the corner cases
   
   Cons:
      - Error comes lazily during user code execution
   
   Okay, just wanted to put that out there. 
   
   In case you're not aware, there is a handy secret handshake in the REPL to 
show the wrappers. Add ` // show` at the end of the line for:
   
   ```
   $ scala-ref 2.12.x -Yrepl-class-based -Yrepl-use-magic-imports
   ```
   ```
   scala> class NotSerializableClass(val x: Int); val ns = new 
NotSerializableClass(42); val topLevelValue = "someValue"; val closure =(j: 
Int) => { (1 to j).flatMap { x =>  (1 to x).map { y => y + topLevelValue } } } 
// show
   sealed class $read extends _root_.java.io.Serializable {
     def <init>() = {
       super.<init>;
       ()
     };
     sealed class $iw extends _root_.java.io.Serializable {
       def <init>() = {
         super.<init>;
         ()
       };
       class NotSerializableClass extends scala.AnyRef {
         <paramaccessor> val x: Int = _;
         def <init>(x: Int) = {
           super.<init>;
           ()
         }
       };
       val ns = new NotSerializableClass.<init>(42);
       val topLevelValue = "someValue";
       val closure = ((j: Int) => 1.to(j).flatMap(((x) => 1.to(x).map(((y) => y 
+ topLevelValue)))))
     };
     val $iw = new $iw.<init>
   }
   object $read extends scala.AnyRef {
     def <init>() = {
       super.<init>;
       ()
     };
     val INSTANCE = new $read.<init>
   }
   defined class NotSerializableClass
   ns: NotSerializableClass = NotSerializableClass@476fde05
   topLevelValue: String = someValue
   closure: Int => scala.collection.immutable.IndexedSeq[String] = 
$Lambda$1151/1215025252@5111de7c
   
   scala> import java.io._; def serialize(obj: AnyRef): Array[Byte] = { val 
buffer = new ByteArrayOutputStream; val out = new ObjectOutputStream(buffer); 
out.writeObject(obj); buffer.toByteArray }
   import java.io._
   serialize: (obj: AnyRef)Array[Byte]
   
   scala> serialize(closure) // show
   sealed class $read extends _root_.java.io.Serializable {
     def <init>() = {
       super.<init>;
       ()
     };
     val $line3$read: $line3.$read.INSTANCE.type = $line3.$read.INSTANCE;
     import $line3$read.$iw.closure;
     import _root_.scala.tools.nsc.interpreter.$u007B$u007B;
     import java.io._;
     import _root_.scala.tools.nsc.interpreter.$u007B$u007B;
     val $line4$read: $line4.$read.INSTANCE.type = $line4.$read.INSTANCE;
     import $line4$read.$iw.serialize;
     sealed class $iw extends _root_.java.io.Serializable {
       def <init>() = {
         super.<init>;
         ()
       };
       val res0 = serialize(closure)
     };
     val $iw = new $iw.<init>
   }
   object $read extends scala.AnyRef {
     def <init>() = {
       super.<init>;
       ()
     };
     val INSTANCE = new $read.<init>
   }
   java.io.NotSerializableException: NotSerializableClass
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
     at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
   ```
   
   We already have Spark-contributed code that inserts the temporary vals like 
`val $line3$read` to ensure we serialize the full object graph.
   
   We also have a have an extra post-REPL compiler phases that does some a 
countervailing cleanup to remove these temp vals when the import that gave rise 
to them doesn't actually contribute terms to the code.
   
   I'm wondering if the compiler also ought to bend the rules a little bit and 
force capture-by-copy of `topLevelValue`. In general, we can't do that in 
because it would would change evaluation order.
   
   ```
   class C {
     val closure = () => topLevelValue // must capture `C`, not the `String` to 
avoid seeing the `null`.
     val topLevelValue = "" 
   }
   ```
   
   Capturing early would leak the uninitialized field, but the expression ought 
not be valid the REPL because it has a forward reference. The REPL doesn't emit 
the "invalid forward reference" error for `scala> () => foo; val foo = ""`, but 
we could make it do so.
   
   So with a small change to the compiler we could make the new test pass 
without changes to `ClosureCleaner`. Is this worth it? That would depend on 
seeing a broader range of test cases, and seeing if they would also be fixed 
with a change to capture.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to