GitHub user javadba commented on the pull request:

    https://github.com/apache/spark/pull/1542#issuecomment-50170143
  
    Hi, thanks for taking the time to review this PR again.  A couple of things 
I should perhaps have pointed out: 
    
    a. The strategy here is not to use the container collection "fetching" 
itself as the monitor to enforce proper concurrency. Instead, the monitor is 
the contained object, the shuffleId.
    b. To ensure that changes to "fetching" itself do not cause deadlocks or 
race conditions, "fetching" was changed to a ConcurrentSkipList. 
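    To illustrate points (a) and (b), here is a minimal sketch of the 
pattern being described (class and method names are hypothetical, not Spark's 
actual API, and I am assuming a ConcurrentSkipListSet as the concrete 
"fetching" collection): the monitor is a canonical per-shuffleId object 
obtained via String.intern(), while the collection itself is concurrent and 
needs no external lock for membership updates.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch; names are illustrative, not from the PR itself.
public class ShuffleFetchCoordinator {
    // (b) "fetching" is a concurrent collection, so adds/removes are
    // thread-safe without holding any external monitor
    private final ConcurrentSkipListSet<Integer> fetching =
        new ConcurrentSkipListSet<>();

    public boolean tryStartFetch(int shuffleId) {
        // (a) synchronize on a canonical per-shuffleId object, not on the
        // "fetching" collection; String.intern() returns the same instance
        // for equal contents, so all threads handling this shuffleId share
        // one monitor
        final String monitor = ("shuffle-" + shuffleId).intern();
        synchronized (monitor) {
            // only the first caller per shuffleId gets true
            return fetching.add(shuffleId);
        }
    }

    public void finishFetch(int shuffleId) {
        final String monitor = ("shuffle-" + shuffleId).intern();
        synchronized (monitor) {
            fetching.remove(shuffleId);
        }
    }
}
```

    The design point is that two threads fetching different shuffleIds take 
different monitors and proceed in parallel, while two threads fetching the 
same shuffleId serialize on the single interned monitor.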
    
    >> In addition, using the monitor of an object from outside to synchronize 
makes it hard to analyze the code, because the monitor of this object may be 
used (now or in the future) in other places, and the developer may not notice 
that somebody has already used it here.
    
    This has been addressed in my latest fix: there is no longer any place 
where the interned monitor is used outside the synchronized block.
    
    >> The read/write operations on "fetching" should use the same monitor 
according to the Java memory model. Here, for different shuffleIds, you use 
different monitors. That's wrong.
    
    This is no longer true after the latest fix, for the reasons described in 
points (a) and (b) above.
    
    Thanks again for your help in reviewing this proposed improvement.

