Github user Earne commented on the pull request:

    https://github.com/apache/spark/pull/12162#issuecomment-206238646
  
    @rxin The use case that motivated this is described below.
    
    - Java objects can consume 2-5x more space than the “raw” data inside their fields.
    
    - I ran the graphx.LiveJournalPageRank example on an 8-node cluster (1 node acting as the Master; each node configured with 45GB of memory for Spark, running in legacy memory management mode). The dataset (about 30GB) was generated by HiBench. Over 5 iterations, each iteration took longer than the one before.
    
    - From the log file, I found that memory for cached RDDs is insufficient, so many partitions with a high recomputation cost are dropped. Recomputing these partitions accounts for much of the extra time.
    
    - FIFO can be implemented by initializing [entries](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L90) with `LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, false)`. Even FIFO can perform much better than LRU here.
    
    - Storage levels such as MEMORY_AND_DISK may partially solve the problem, but the improvement is limited.
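    The FIFO-versus-LRU point in the list above comes down to the third constructor argument of `java.util.LinkedHashMap`, `accessOrder`. With `false` the map keeps insertion order (FIFO). With `true` each `get` moves the entry to the tail (LRU). The minimal standalone sketch below assumes plain `String` keys stand in for Spark's `BlockId`/`MemoryEntry`, and the class and method names are hypothetical. It only shows which block sits at the head of the iteration order. It does not reproduce MemoryStore's actual eviction loop.

```java
import java.util.LinkedHashMap;

// Hypothetical demo: String keys stand in for Spark's BlockId; not MemoryStore code.
public class EvictionOrderDemo {
    // Returns the key an eviction loop would reach first: the head of the iteration order.
    public static String firstVictim(boolean accessOrder) {
        // Same constructor shape as the suggestion: (initialCapacity, loadFactor, accessOrder).
        LinkedHashMap<String, Integer> entries = new LinkedHashMap<>(32, 0.75f, accessOrder);
        entries.put("rdd_0_0", 1);
        entries.put("rdd_0_1", 2);
        entries.put("rdd_0_2", 3);
        // Read the oldest block again; only with accessOrder=true does this reorder the map.
        entries.get("rdd_0_0");
        return entries.keySet().iterator().next();
    }

    public static void main(String[] args) {
        System.out.println("FIFO (accessOrder=false) evicts: " + firstVictim(false));
        System.out.println("LRU  (accessOrder=true)  evicts: " + firstVictim(true));
    }
}
```

    Under FIFO the re-read block `rdd_0_0` is still the first candidate. Under LRU the read protects it, so `rdd_0_1` becomes the first candidate instead.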
    
    An eviction strategy that takes the computing cost into consideration may work well (even in unified memory mode or with the MEMORY_AND_DISK level). Some cost-aware replacement policies already exist in K-V stores, such as GD-Wheel (EuroSys’15).
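    GD-Wheel is based on the GreedyDual algorithm. The minimal sketch below shows plain GreedyDual-Size instead, using a priority queue in place of GD-Wheel's hierarchical cost wheels, so it conveys the idea and not GD-Wheel's data structure. It assumes each cached block has a known size in bytes and an estimated recomputation cost. `GreedyDualCache`, its methods, and the cost units are all hypothetical, and none of this is a Spark API. Each block gets a priority H = L + cost / size, and the lowest H is evicted. L is then raised to the victim's H, so blocks that are never touched again gradually age out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical GreedyDual-Size cache: a sketch of cost-aware eviction, not Spark's MemoryStore.
public class GreedyDualCache {
    private static final class Entry {
        final String id;
        final double cost;   // assumed: estimated cost to recompute the block (e.g. milliseconds)
        final long size;     // block size in bytes
        double priority;     // GreedyDual "H" value; the lowest H is evicted first

        Entry(String id, double cost, long size) {
            this.id = id;
            this.cost = cost;
            this.size = size;
        }
    }

    private final long capacity;
    private long used = 0;
    // GreedyDual's "L": raised to the H of each victim so long-idle entries age out.
    private double inflation = 0.0;
    private final Map<String, Entry> entries = new HashMap<>();
    private final PriorityQueue<Entry> queue =
        new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.priority));

    public GreedyDualCache(long capacity) {
        this.capacity = capacity;
    }

    // Re-inserts the entry with H = L + cost / size; PriorityQueue has no decrease-key.
    private void refresh(Entry e) {
        queue.remove(e);
        e.priority = inflation + e.cost / e.size;
        queue.add(e);
    }

    // Records a cache hit; returns false if the block is not cached.
    public boolean get(String id) {
        Entry e = entries.get(id);
        if (e == null) return false;
        refresh(e);
        return true;
    }

    // Caches a block, evicting the lowest-H entries until it fits; returns the evicted ids.
    public List<String> put(String id, double cost, long size) {
        if (size <= 0 || size > capacity) {
            throw new IllegalArgumentException("block size must be in (0, capacity]");
        }
        Entry old = entries.remove(id);
        if (old != null) {
            queue.remove(old);
            used -= old.size;
        }
        List<String> evicted = new ArrayList<>();
        while (used + size > capacity) {
            Entry victim = queue.poll();
            inflation = victim.priority;
            entries.remove(victim.id);
            used -= victim.size;
            evicted.add(victim.id);
        }
        Entry e = new Entry(id, cost, size);
        entries.put(id, e);
        used += size;
        refresh(e);
        return evicted;
    }

    public boolean contains(String id) {
        return entries.containsKey(id);
    }
}
```

    Example: with capacity 100, put `cheap` (cost 1, size 50) and `expensive` (cost 100, size 50), then read `cheap`. Putting a third 50-byte block now evicts `cheap`, even though LRU would have evicted `expensive` because it was used less recently.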
    
    This PR can be split into the following sub-tasks:
    - [ ] Refactor to support more than one policy (LRU, FIFO, LFU).
    
    - [ ] Add a policy that takes the computing cost into consideration.
    
    - [ ] Take serialization and deserialization cost into consideration.
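    For the first sub-task, one possible shape is a small policy interface that the store notifies on insert, access, and removal, and then asks for the next victim. The minimal sketch below assumes this design. `EvictionPolicy`, `FifoPolicy`, `LruPolicy`, `LfuPolicy` and `PolicyDemo` are hypothetical names, not existing Spark classes, and block ids are plain `String`s.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical pluggable eviction-policy interface; names are illustrative, not Spark APIs.
interface EvictionPolicy {
    void onInsert(String blockId);
    void onAccess(String blockId);
    void onRemove(String blockId);
    String nextVictim(); // null if nothing is tracked
}

// FIFO: order is fixed at insertion; accesses are ignored.
class FifoPolicy implements EvictionPolicy {
    protected final LinkedHashSet<String> order = new LinkedHashSet<>();
    public void onInsert(String id) { order.add(id); }
    public void onAccess(String id) { }
    public void onRemove(String id) { order.remove(id); }
    public String nextVictim() { return order.isEmpty() ? null : order.iterator().next(); }
}

// LRU: an access moves the block to the most-recently-used end.
class LruPolicy extends FifoPolicy {
    @Override
    public void onAccess(String id) {
        if (order.remove(id)) order.add(id);
    }
}

// LFU: evict the block with the fewest accesses; ties go to the earliest inserted.
class LfuPolicy implements EvictionPolicy {
    private final LinkedHashMap<String, Integer> counts = new LinkedHashMap<>();
    public void onInsert(String id) { counts.put(id, 0); }
    public void onAccess(String id) { counts.computeIfPresent(id, (k, c) -> c + 1); }
    public void onRemove(String id) { counts.remove(id); }
    public String nextVictim() {
        String victim = null;
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < fewest) {
                fewest = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }
}

public class PolicyDemo {
    // Inserts a, b, c, replays the accesses b, b, a, a, c, and reports the next victim.
    public static String victimFor(EvictionPolicy policy) {
        for (String id : new String[] {"a", "b", "c"}) policy.onInsert(id);
        for (String id : new String[] {"b", "b", "a", "a", "c"}) policy.onAccess(id);
        return policy.nextVictim();
    }

    public static void main(String[] args) {
        System.out.println("FIFO: " + victimFor(new FifoPolicy()));
        System.out.println("LRU:  " + victimFor(new LruPolicy()));
        System.out.println("LFU:  " + victimFor(new LfuPolicy()));
    }
}
```

    On the same access trace the three policies pick different victims: FIFO picks `a` (oldest insert), LRU picks `b` (least recently read), and LFU picks `c` (fewest reads). A cost-aware policy would also need `onInsert` to receive each block's size and estimated recomputation cost.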

