Github user squito commented on the issue:

    https://github.com/apache/spark/pull/15505
  
    This might be a really dumb question, but please humor me -- can you 
explain to me the threads involved, both before and after this change?  I must 
be missing something, because it seems to me it will be the same thread both 
before and after this change.
    
    `taskScheduler.resourceOffers()` is only called by 
`CoarseGrainedSchedulerBackend`, inside its event loop.  `taskScheduler` has a 
handful of other threads that interact with it, but nothing that will trigger 
resource offers.  Eg. some methods end up calling `backend.reviveOffers()`, but 
that pushes an event into the `CoarseGrainedSchedulerBackend`'s event queue 
anyway.
    
    OTOH, though this PR doesn't change the thread that serialization is 
happening, after this change serialization happens *without a lock* on the 
`taskScheduler`.  This could be important because when a task finishes, we need 
to both (1) update the taskset's internal state & process the result and (2) 
schedule new tasks on the freed resources.  Those two can happen in different 
threads, but both involve locks on `taskScheduler`; by moving serialization so 
it no longer requires a lock, this reduces contention on that lock and improves 
performance.
    
    Is that accurate, or am I completely missing something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to