Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6361#discussion_r31440654
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: Int, var partitionId: Int) ex
           taskThread.interrupt()
         }
       }
    +
    +  /**
    +   * Deserializes the task from the broadcast variable.
    +   * If Kryo serialization is being used, a copy of the buffer is made because Kryo
    +   * deserialization is not thread-safe w.r.t. the deserialization buffer (see SPARK-7708).
    +   */
    +  protected[this] def deserialize[T: ClassTag](taskBinary: Broadcast[Array[Byte]]): T = {
    +    val ser = SparkEnv.get.closureSerializer.newInstance()
    --- End diff --
    
    I'm hesitant to introduce that sort of global state, so I'd prefer an 
approach that keeps the pool in an `Executor` instance so that we know that it 
gets cleaned up properly.
    
    I think that we can create a pool of `SerializerInstance`s at the top of 
`Executor`, pass the pool into `TaskRunner` via its constructor, then borrow 
instances from the pool in `TaskRunner.run()`.  Specifically, I'm suggesting 
that we create a new class that acts as the pool and implements `borrow()` and 
`release()` methods, since I think that passing the pool interface into 
`TaskRunner` is cleaner and easier to reason about than passing `Executor` itself 
down to the runner.  This class will have to be thread-safe.  It should 
allocate new serializer instances on demand when the pool is empty and an 
instance is requested, and this allocation should take place without holding a 
lock or other synchronization.
    
    Maybe Guava has a good standard implementation of this.
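    As a rough illustration of what I have in mind, a minimal lock-free pool could be built on `java.util.concurrent.ConcurrentLinkedQueue` (the class and method names below are hypothetical, not what would necessarily land in the PR):

    ```scala
    import java.util.concurrent.ConcurrentLinkedQueue

    // Hypothetical sketch of the pool described above. `create` allocates a
    // fresh instance (e.g. () => env.closureSerializer.newInstance()).
    class ObjectPool[T <: AnyRef](create: () => T) {
      // ConcurrentLinkedQueue is a non-blocking, thread-safe queue, so neither
      // borrow() nor release() ever holds a lock.
      private val instances = new ConcurrentLinkedQueue[T]()

      // Returns a cached instance, or allocates a new one when the pool is
      // empty; the allocation happens outside any synchronization.
      def borrow(): T = {
        val cached = instances.poll()
        if (cached != null) cached else create()
      }

      // Hands an instance back to the pool for reuse by other task runners.
      def release(instance: T): Unit = instances.offer(instance)
    }
    ```

    `TaskRunner.run()` would then wrap its deserialization work in a `borrow()`/`release()` pair, so each concurrently-running task gets its own serializer instance.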

