[ 
https://issues.apache.org/jira/browse/SPARK-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004538#comment-14004538
 ] 

Wenchen Fan commented on SPARK-1888:
------------------------------------

I introduced a new member on `Entry` (`var dropping: Boolean`) and use it as a 
flag. When `ensureFreeSpace` selects blocks to drop, it skips blocks that are 
already marked as dropping. If `ensureFreeSpace` successfully selects some 
to-be-dropped blocks, it only marks their entries as dropping and returns them 
to the caller, which does the actual dropping. If the caller hits an exception 
while dropping, it resets the dropping flag on those to-be-dropped blocks. All 
reads and writes of the dropping flag are synchronized on `entries`, so a 
change to the flag is visible to any thread that subsequently takes that lock.
Can one of the admins verify my diff? [~tdas] [~rxin]
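
To make the scheme above concrete, here is a minimal sketch of the idea. It is 
not the actual diff: `SketchStore`, `put`, `dropAll`, `writeToDisk`, 
`isDropping` and `contains` are hypothetical names, and the real memory store 
tracks more state than this. Only `Entry`, `dropping`, `entries` and 
`ensureFreeSpace` come from the description above.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a memory store entry.
final class Entry(val size: Long) {
  var dropping: Boolean = false // only read or written while holding the `entries` lock
}

// Hypothetical store used only to illustrate the locking scheme.
class SketchStore(maxMemory: Long) {
  private val entries = new mutable.LinkedHashMap[String, Entry]()
  private var currentMemory = 0L

  def put(id: String, size: Long): Unit = entries.synchronized {
    entries(id) = new Entry(size)
    currentMemory += size
  }

  // Select victims under the lock, mark them as dropping, and hand them back.
  // Returns None (and marks nothing) if not enough space can be freed.
  def ensureFreeSpace(space: Long): Option[Seq[String]] = entries.synchronized {
    val selected = mutable.ArrayBuffer[String]()
    var freeable = maxMemory - currentMemory
    val it = entries.iterator
    while (freeable < space && it.hasNext) {
      val (id, entry) = it.next()
      if (!entry.dropping) { // skip blocks another thread is already dropping
        selected += id
        freeable += entry.size
      }
    }
    if (freeable >= space) {
      selected.foreach(id => entries(id).dropping = true)
      Some(selected.toList)
    } else {
      None
    }
  }

  // The caller does the slow IO without holding the lock.
  // If dropping fails, the flags are reset so the blocks can be selected again.
  def dropAll(ids: Seq[String])(writeToDisk: String => Unit): Unit = {
    try {
      ids.foreach { id =>
        writeToDisk(id) // slow IO, no lock held here
        entries.synchronized {
          entries.remove(id).foreach(e => currentMemory -= e.size)
        }
      }
    } catch {
      case e: Exception =>
        entries.synchronized {
          ids.foreach(id => entries.get(id).foreach(_.dropping = false))
        }
        throw e
    }
  }

  def isDropping(id: String): Boolean =
    entries.synchronized { entries.get(id).exists(_.dropping) }

  def contains(id: String): Boolean =
    entries.synchronized { entries.contains(id) }
}
```

In this sketch the lock is held only while selecting blocks and while updating 
bookkeeping, so several threads can write different blocks to disk at the same 
time.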

> enhance MEMORY_AND_DISK mode by dropping blocks in parallel
> -----------------------------------------------------------
>
>                 Key: SPARK-1888
>                 URL: https://issues.apache.org/jira/browse/SPARK-1888
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Wenchen Fan
>
> Sometimes MEMORY_AND_DISK mode is slower than DISK_ONLY mode because of the 
> lock on IO operations (dropping blocks in the memory store). As the TODO says, 
> the solution is to synchronize only the selection of to-be-dropped blocks and 
> do the dropping in parallel. I have a quick fix in my PR: 
> https://github.com/apache/spark/pull/791#issuecomment-43567924
> It's currently fragile, but I'm working on making it more robust.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
