[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...

JoshRosen Tue, 02 Feb 2016 14:26:00 -0800

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/10705#issuecomment-178861986
  
    After some investigation, it looks like we can't enable the pin leak 
detection in tests right now:
    
    - The SQL `limit` operator doesn't have any way to release resources of its 
inputs, so any query which uses a limit will report a pin leak.
    - There are several places in GraphX which use single-element RDD iterators 
in order to pass big objects around. These consumers of iterators don't really 
obey the API contracts that `CompletionIterator` requires, so it's kind of 
tricky to automatically free these pins at any time before the end of the task. 
In the case of GraphX, I think that we actually want to retain the pin for this 
long. Adding _explicit_ machinery to release the pin at the end of the 
task is going to be tricky here, since by the time control reaches the 
`RDD.compute()` code we don't know the IDs of the blocks to unpin.
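
    To illustrate the `limit` case, here is a minimal sketch of why a consumer that stops early never triggers a completion callback. `CompletionIter`, `PinCounter`, and `PinLeakDemo` are hypothetical stand-ins written for this example, not Spark's actual classes; the only assumption carried over is that the callback runs when `hasNext` first returns false.

```scala
// Toy stand-in for a completion-style iterator (hypothetical, not Spark's class):
// runs `onComplete` once, the first time hasNext returns false.
final class CompletionIter[A](underlying: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  private var completed = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !completed) {
      completed = true
      onComplete()
    }
    more
  }

  override def next(): A = underlying.next()
}

// Hypothetical pin counter for a single block.
final class PinCounter {
  private var count = 0
  def pin(): Unit = count += 1
  def unpin(): Unit = count -= 1
  def outstanding: Int = count
}

object PinLeakDemo {
  // Pins the block, then hands out an iterator that unpins on exhaustion.
  def pinnedIterator(pins: PinCounter): Iterator[Int] = {
    pins.pin()
    new CompletionIter(Iterator(1, 2, 3, 4, 5), () => pins.unpin())
  }

  // Consumes the whole iterator, so the callback fires and the pin is released.
  def drainAll(): Int = {
    val pins = new PinCounter
    pinnedIterator(pins).foreach(_ => ())
    pins.outstanding
  }

  // limit-style consumer: stops after n elements and never observes
  // hasNext == false, so the pin is still held when it returns.
  def takeOnly(n: Int): Int = {
    val pins = new PinCounter
    val it = pinnedIterator(pins)
    var taken = 0
    while (taken < n && it.hasNext) {
      it.next()
      taken += 1
    }
    pins.outstanding
  }
}
```

    In this toy model `drainAll()` leaves no outstanding pins, while `takeOnly(2)` leaves one, which is the kind of leak the detection would flag.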
    
    Given both of these cases, I think it's fine to just unpin everything at 
the very end of the task, since this is effectively what we've been doing 
already. Once other internal refactorings take place, we'll be able to take 
advantage of more granular unpinning starting in SQL.
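
    As a rough sketch of the "unpin everything at the very end of the task" fallback, assuming a per-task registry of pins: `TaskPins`, `TaskEndUnpin.runTask`, and the block ID strings below are hypothetical names for illustration and do not reflect the actual code in this pull request.

```scala
import scala.collection.mutable

// Hypothetical per-task pin registry; names are illustrative only.
final class TaskPins {
  private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Record one more pin on the given block.
  def pin(blockId: String): Unit = counts(blockId) += 1

  // Total number of pins currently held by this task.
  def outstanding: Int = counts.values.sum

  // Drop every pin held by the task and report how many were released.
  def releaseAll(): Int = {
    val released = outstanding
    counts.clear()
    released
  }
}

object TaskEndUnpin {
  // Runs the task body and releases all remaining pins afterwards,
  // whether the body finishes normally or throws.
  def runTask[A](pins: TaskPins)(body: TaskPins => A): A =
    try body(pins)
    finally {
      pins.releaseAll()
      ()
    }
}
```

    The body never needs to know which block IDs it pinned; the registry releases whatever is left, which is why this coarse approach sidesteps the `RDD.compute()` problem described above.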



