[GitHub] spark pull request: SPARK-2634: Change MapOutputTrackerWorker.mapS...

mridulm Wed, 23 Jul 2014 03:14:57 -0700

Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1541#issuecomment-49855865
  
    Instead of a ConcurrentHashMap, we should actually move this to a disk-backed 
Map. Cleaning up this data structure is painful, and it can become 
extremely large, particularly for iterative algorithms.
    Fortunately, in most cases we only need the last few entries, so the LRU 
scheme that most disk-backed maps provide works beautifully.
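
    To illustrate the LRU idea only, here is a minimal in-memory sketch. It is 
not the mapdb API and not the code from this pull request: the class name 
LruStatusCache and the maxEntries parameter are hypothetical, and a real 
disk-backed map would spill evicted entries to disk rather than drop them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache for illustration; not Spark's or mapdb's API. */
class LruStatusCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruStatusCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed entry, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the
        // least recently used entry once the cap is exceeded.
        return size() > maxEntries;
    }
}
```

    With a cap of 2, putting keys 1 and 2, reading key 1, and then putting 
key 3 evicts key 2, because key 1 was touched more recently. Note that 
LinkedHashMap is not thread-safe (in access order even get() reorders 
entries), so a shared instance would need something like 
Collections.synchronizedMap around it.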
    
    We have been using mapdb for this in MapOutputTrackerWorker, and it has 
worked beautifully.
    @rxin might be particularly interested, since he is looking into reducing 
the memory footprint of Spark.
    CC @mateiz - this is what I had mentioned earlier.




