Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/6361#discussion_r31440654
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) ex
taskThread.interrupt()
}
}
+
+  /**
+   * Deserializes the task from the broadcast variable.
+   * If Kryo serialization is being used, a copy of the buffer is made because Kryo deserialization
+   * is not thread-safe w.r.t. the deserialization buffer (see SPARK-7708).
+   */
+  protected[this] def deserialize[T: ClassTag](taskBinary: Broadcast[Array[Byte]]): T = {
+    val ser = SparkEnv.get.closureSerializer.newInstance()
--- End diff --
I'm hesitant to introduce that sort of global state, so I'd prefer an
approach that keeps the pool in an `Executor` instance so that we know that it
gets cleaned up properly.
I think that we can create a pool of `SerializerInstance`s at the top of
`Executor`, pass the pool into `TaskRunner` via its constructor, then borrow
instances from the pool in `TaskRunner.run()`. Specifically, I'm suggesting
that we make a new class that acts as the pool and implements `borrow()` and
`release()` methods, since I think that passing the pool interface into
`TaskRunner` is cleaner / easier to reason about than passing `Executor` itself
down to the runner. This class will have to be thread-safe, and it should allocate new
serializer instances on demand when the pool is empty and an instance is requested (that
allocation should happen without holding a lock or other synchronization).
Maybe Guava has a good standard implementation of this.
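Something along these lines might work (a rough sketch only; the `SerializerInstancePool` name and constructor wiring are illustrative, not something already in the PR):

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import org.apache.spark.SparkEnv
import org.apache.spark.serializer.SerializerInstance

// Hypothetical pool owned by Executor and passed to each TaskRunner.
private[spark] class SerializerInstancePool(env: SparkEnv) {
  // Idle instances awaiting reuse. ConcurrentLinkedQueue is non-blocking,
  // so borrow()/release() are thread-safe without explicit locking.
  private val idle = new ConcurrentLinkedQueue[SerializerInstance]()

  /** Returns a pooled instance, allocating a new one if the pool is empty. */
  def borrow(): SerializerInstance = {
    val cached = idle.poll()
    if (cached != null) cached
    else env.closureSerializer.newInstance() // allocation happens outside any lock
  }

  /** Returns an instance to the pool so later tasks can reuse it. */
  def release(instance: SerializerInstance): Unit = {
    idle.offer(instance)
  }
}
```

`TaskRunner.run()` would then `borrow()` an instance before deserializing the task and `release()` it in a `finally` block, and since `Executor` owns the pool, its lifetime and cleanup follow the executor's.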