GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/10705#issuecomment-186411074
@zsxwing, one problem is avoiding double-release of locks. Consider
the case where you read a block from the block manager, do a bunch of pipelined
filters and maps on it, and then do a `take` or `limit`: the `take` or `limit`
operator doesn't know the ids of the blocks being read and thus
doesn't know which blocks to register with the task completion callback. Even
if we did know which blocks to register, we'd then have to guard against
double-releasing a block that `CompletionIterator` had already freed (which
happens when the `take()` ends up consuming the entire iterator).
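To make the hazard concrete, here's a minimal sketch (`BlockLock` is a hypothetical stand-in, not a Spark class, and the `CompletionIterator` here is a simplified analogue of Spark's):

```scala
// Hypothetical stand-in for a per-block read lock; release() flags a
// second call instead of failing silently.
class BlockLock(val blockId: String) {
  private var released = false
  def release(): Unit = {
    if (released) println(s"BUG: double release of lock on $blockId")
    else released = true
  }
}

// Simplified analogue of Spark's CompletionIterator: fires a callback
// once the wrapped iterator is exhausted.
class CompletionIterator[A](sub: Iterator[A], completion: () => Unit)
    extends Iterator[A] {
  private var completed = false
  def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) { completed = true; completion() }
    more
  }
  def next(): A = sub.next()
}

object DoubleReleaseDemo extends App {
  val lock = new BlockLock("rdd_0_0")
  val iter = new CompletionIterator[Int](Iterator(1, 2, 3), () => lock.release())

  // A take(3) pulls exactly the three available elements...
  val taken = (1 to 3).map(_ => iter.next())
  // ...and the next hasNext probe fires the completion callback,
  // which releases the lock:
  iter.hasNext
  // A task-completion callback that also released this block would
  // now be a double release:
  lock.release() // prints the BUG warning
}
```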
The more principled approach to handling `take()` and `limit()` is to use
an internal iterator interface that supports an explicit `close()` operation
to signal that the rest of the iterator will not be consumed.
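Something like the following sketch (the trait and class names here are illustrative, not a committed API): `close()` is idempotent, so the "fully consumed" path and the "closed early by `take()`/`limit()`" path can both invoke it without double-releasing the lock.

```scala
// Iterator that can be told the remaining elements won't be consumed.
trait ClosableIterator[A] extends Iterator[A] {
  /** Release any underlying resources; safe to call more than once. */
  def close(): Unit
}

// Wraps a block-read iterator and releases the block lock exactly once,
// whether the iterator is drained or closed early.
class BlockReadIterator[A](sub: Iterator[A], releaseLock: () => Unit)
    extends ClosableIterator[A] {
  private var closed = false

  override def hasNext: Boolean = {
    val more = !closed && sub.hasNext
    if (!more) close()  // exhausted (or already closed): release once
    more
  }

  override def next(): A = {
    if (closed) throw new NoSuchElementException("iterator closed")
    sub.next()
  }

  override def close(): Unit = {
    if (!closed) {  // idempotent: the double-release guard lives here
      closed = true
      releaseLock()
    }
  }
}
```

A `take(n)` built against this interface would pull its n elements and then call `close()`; a task-completion callback could still call `close()` on the same iterator as a safety net, and the guard makes the second call a no-op.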