[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

CodingCat Mon, 24 Mar 2014 22:57:02 -0700

Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/214#discussion_r10918466
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl(
        */
       def resourceOffers(offers: Seq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
         SparkEnv.set(sc.env)
    +    // Make thread pool local for shutdown before the function returns
    +    // This is for driver can exit normally which not call sc.stop or sys.exit
    +    val serializeWorkerPool = new ThreadPoolExecutor(
    --- End diff --
    
    Hi, I didn't follow your earlier discussion in JIRA, but according to Kay,
    the ideal is to make task serialization asynchronous. Your current approach
    is still synchronous (L278 - L281).
    
    
    You can look at how TaskResultGetter works: the task runner finishes the
    task and sends a message to the driver's CoarseGrainedSchedulerBackend
    (CSB). CSB receives it and then calls taskScheduler -> taskManager ->
    dagScheduler (plain function calls, no messages) to mark the task as
    finished. The asynchrony comes from the message.
    
    I think @kayousterhout's idea is that your patch introduces new messages:
    when serialization finishes, it notifies the CSB to launch the task.
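
    To make the idea concrete, here is a minimal sketch of the message-driven
    shape, not Spark's actual code. Everything in it is an assumption for
    illustration: the SerializedTaskReady message, the mailbox queue standing
    in for the CSB actor, and the string bytes standing in for real task
    serialization. Only the JDK's java.util.concurrent classes are used.

    ```scala
    import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

    // Hypothetical message; not part of Spark's real protocol.
    final case class SerializedTaskReady(taskId: Long, bytes: Array[Byte])

    object AsyncSerializationSketch {
      // Returns the ids of the tasks that were "launched", in arrival order.
      def run(taskIds: Seq[Long]): Seq[Long] = {
        // Stand-in for the scheduler backend's mailbox (an actor in Spark).
        val mailbox = new LinkedBlockingQueue[SerializedTaskReady]()
        val pool = Executors.newFixedThreadPool(4)
        try {
          // Scheduler side: hand off serialization and return immediately.
          taskIds.foreach { id =>
            pool.execute(new Runnable {
              def run(): Unit = {
                // Placeholder for real task serialization.
                val bytes = s"task-$id".getBytes("UTF-8")
                // Completion is announced by a message, not a return value.
                mailbox.put(SerializedTaskReady(id, bytes))
              }
            })
          }
          // Backend side: launch each task when its message arrives.
          taskIds.map { _ =>
            val msg = mailbox.poll(5, TimeUnit.SECONDS)
            require(msg != null, "timed out waiting for a serialized task")
            msg.taskId
          }
        } finally {
          pool.shutdown()
        }
      }

      def main(args: Array[String]): Unit = {
        // Arrival order is not deterministic, so sort before printing.
        println(run(Seq(1L, 2L, 3L)).sorted.mkString(", "))
      }
    }
    ```

    The point of the sketch is that the submitting code never waits on the
    pool; only the receiver of the message acts on the serialized bytes.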



