retronym commented on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-625082537
Brainstorming a little. An alternative design of `ClosureCleaner` would be
to customise serialization of REPL line wrapper objects so they replaced
non-serializable values with null or some other sentinel value. If the
deserialized code attempted to use one of the absent values, a runtime error
could be issued.
Pros:
- Simplicity of implementation (no static analysis needed)
- Will handle all the corner cases
Cons:
- Error comes lazily during user code execution
Okay, just wanted to put that out there.
In case you're not aware, there is a handy secret handshake in the REPL to
show the wrappers. Add ` // show` at the end of the line for:
```
$ scala-ref 2.12.x -Yrepl-class-based -Yrepl-use-magic-imports
```
```
scala> class NotSerializableClass(val x: Int); val ns = new
NotSerializableClass(42); val topLevelValue = "someValue"; val closure =(j:
Int) => { (1 to j).flatMap { x => (1 to x).map { y => y + topLevelValue } } }
// show
sealed class $read extends _root_.java.io.Serializable {
def <init>() = {
super.<init>;
()
};
sealed class $iw extends _root_.java.io.Serializable {
def <init>() = {
super.<init>;
()
};
class NotSerializableClass extends scala.AnyRef {
<paramaccessor> val x: Int = _;
def <init>(x: Int) = {
super.<init>;
()
}
};
val ns = new NotSerializableClass.<init>(42);
val topLevelValue = "someValue";
val closure = ((j: Int) => 1.to(j).flatMap(((x) => 1.to(x).map(((y) => y
+ topLevelValue)))))
};
val $iw = new $iw.<init>
}
object $read extends scala.AnyRef {
def <init>() = {
super.<init>;
()
};
val INSTANCE = new $read.<init>
}
defined class NotSerializableClass
ns: NotSerializableClass = NotSerializableClass@476fde05
topLevelValue: String = someValue
closure: Int => scala.collection.immutable.IndexedSeq[String] =
$Lambda$1151/1215025252@5111de7c
scala> import java.io._; def serialize(obj: AnyRef): Array[Byte] = { val
buffer = new ByteArrayOutputStream; val out = new ObjectOutputStream(buffer);
out.writeObject(obj); buffer.toByteArray }
import java.io._
serialize: (obj: AnyRef)Array[Byte]
scala> serialize(closure) // show
sealed class $read extends _root_.java.io.Serializable {
def <init>() = {
super.<init>;
()
};
val $line3$read: $line3.$read.INSTANCE.type = $line3.$read.INSTANCE;
import $line3$read.$iw.closure;
import _root_.scala.tools.nsc.interpreter.$u007B$u007B;
import java.io._;
import _root_.scala.tools.nsc.interpreter.$u007B$u007B;
val $line4$read: $line4.$read.INSTANCE.type = $line4.$read.INSTANCE;
import $line4$read.$iw.serialize;
sealed class $iw extends _root_.java.io.Serializable {
def <init>() = {
super.<init>;
()
};
val res0 = serialize(closure)
};
val $iw = new $iw.<init>
}
object $read extends scala.AnyRef {
def <init>() = {
super.<init>;
()
};
val INSTANCE = new $read.<init>
}
java.io.NotSerializableException: NotSerializableClass
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
```
We already have Spark-contributed code that inserts the temporary vals like
`val $line3$read` to ensure we serialize the full object graph.
We also have a have an extra post-REPL compiler phases that does some a
countervailing cleanup to remove these temp vals when the import that gave rise
to them doesn't actually contribute terms to the code.
I'm wondering if the compiler also ought to bend the rules a little bit and
force capture-by-copy of `topLevelValue`. In general, we can't do that in
because it would would change evaluation order.
```
class C {
val closure = () => topLevelValue // must capture `C`, not the `String` to
avoid seeing the `null`.
val topLevelValue = ""
}
```
Capturing early would leak the uninitialized field, but the expression ought
not be valid the REPL because it has a forward reference. The REPL doesn't emit
the "invalid forward reference" error for `scala> () => foo; val foo = ""`, but
we could make it do so.
So with a small change to the compiler we could make the new test pass
without changes to `ClosureCleaner`. Is this worth it? That would depend on
seeing a broader range of test cases, and seeing if they would also be fixed
with a change to capture.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]