GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/10705#issuecomment-178861986
After some investigation, it looks like we can't enable the pin leak
detection in tests right now:
- The SQL `limit` operator doesn't have any way to release resources of its
inputs, so any query which uses a limit will report a pin leak.
- There are several places in GraphX which use single-element RDD iterators
in order to pass big objects around. These consumers of iterators don't really
obey the API contracts that `CompletionIterator` requires, so it's kind of
tricky to automatically free these pins at any time before the end of the task.
In the case of GraphX, I think that we actually want to retain the pin for this
long. Adding _explicit_ machinery to release the pin at the end of the task is
going to be tricky here, since by the time control reaches the
`RDD.compute()` code we don't know the IDs of the blocks to unpin.
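
To illustrate why both cases defeat completion-based release, here is a minimal sketch. `CompletionIteratorSketch`, `PinLeakDemo`, and `releasedAfter` are hypothetical names invented for this example; the class is a simplified stand-in, not Spark's actual `CompletionIterator`. It assumes the release callback fires only when `hasNext` first returns false.

```scala
// Hypothetical, simplified stand-in for a completion-style iterator;
// this is NOT Spark's real CompletionIterator.
final class CompletionIteratorSketch[A](sub: Iterator[A], onComplete: () => Unit)
    extends Iterator[A] {
  private var completed = false

  override def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) {
      completed = true
      onComplete() // runs only once the underlying iterator reports exhaustion
    }
    more
  }

  override def next(): A = sub.next()
}

object PinLeakDemo {
  // Returns true if the release callback ran for the given consumption pattern.
  def releasedAfter(elems: Iterator[Int])(consume: Iterator[Int] => Unit): Boolean = {
    var released = false
    val it = new CompletionIteratorSketch(elems, () => released = true)
    consume(it)
    released
  }

  // A full scan ends with hasNext returning false, so the callback runs.
  val fullScan: Boolean =
    releasedAfter(Iterator(1, 2, 3))(it => it.foreach(_ => ()))

  // A limit-style consumer stops early and never observes exhaustion.
  val limited: Boolean =
    releasedAfter(Iterator(1, 2, 3))(it => it.take(2).foreach(_ => ()))

  // A single-element consumer that calls next() once and never asks hasNext again.
  val singleNext: Boolean =
    releasedAfter(Iterator(42))(it => it.next())
}
```

In this sketch only the full scan triggers the release; the limit-style and single-`next()` consumers leave the resource held until something else frees it.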
Given both of these cases, I think it's fine to just unpin everything at
the very end of the task, since this is effectively what we've been doing
already. Once other internal refactorings take place, we'll be able to take
advantage of more granular unpinning starting in SQL.
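
A rough sketch of the "unpin everything at the very end of the task" approach, under stated assumptions: `PinRegistrySketch`, `unpinAllForTask`, and `runTask` are hypothetical names for illustration and do not correspond to the actual code in this pull request. The idea is that pins are tracked per task, so the cleanup step does not need to know block IDs in advance.

```scala
import scala.collection.mutable

// Hypothetical pin bookkeeping keyed by task attempt id (illustrative only).
final class PinRegistrySketch {
  private val pinsByTask = mutable.Map.empty[Long, mutable.Buffer[String]]

  def pin(taskId: Long, blockId: String): Unit = synchronized {
    pinsByTask.getOrElseUpdate(taskId, mutable.Buffer.empty[String]) += blockId
    ()
  }

  // Drops every pin still held by the task and returns the released block ids.
  def unpinAllForTask(taskId: Long): Seq[String] = synchronized {
    pinsByTask.remove(taskId).map(_.toList).getOrElse(Nil)
  }

  def pinCount(taskId: Long): Int = synchronized {
    pinsByTask.get(taskId).map(_.size).getOrElse(0)
  }
}

object TaskEndUnpinDemo {
  // Runs the task body and releases all of the task's pins afterwards,
  // whether the body returns normally or throws.
  def runTask[T](registry: PinRegistrySketch, taskId: Long)(body: => T): T =
    try body
    finally registry.unpinAllForTask(taskId)
}
```

For example, a body that pins two blocks and returns without releasing them would still leave `pinCount(taskId)` at zero once `runTask` returns.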