GitHub user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5608#issuecomment-94922337
  
    In the sampling case, it considers the size of each element directly, so I 
think it would re-count shared data structures. That would explain the 
behavior here: Spark thinks memory is really full, so it spills, but the spilled 
data is tiny because there wasn't that much data and memory wasn't nearly as 
full as it looked. So would it make sense to have both of these paths use the 
`enqueue` method? (I don't entirely know how that works.)
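    The re-counting effect described above can be sketched as follows. This is a minimal, self-contained illustration, not Spark's actual estimator: `sizeOf` is a hypothetical stand-in that counts reachable objects instead of bytes, and `SharedSizeDemo` is an invented name. The point is only that sizing each element independently counts a shared structure once per element, while a single identity-deduplicated walk counts it once overall:

    ```java
    import java.util.*;

    public class SharedSizeDemo {
        // Hypothetical per-element "size": number of reachable objects,
        // standing in for the bytes a SizeEstimator-style graph walk reports.
        static int sizeOf(Object root, Set<Object> seen) {
            if (root == null || !seen.add(root)) return 0; // identity-based dedup
            int size = 1;
            if (root instanceof List<?> list)
                for (Object child : list) size += sizeOf(child, seen);
            return size;
        }

        public static void main(String[] args) {
            // One shared structure, referenced by every element.
            List<Integer> shared = new ArrayList<>(List.of(1, 2, 3));
            List<List<Integer>> elements = new ArrayList<>();
            for (int i = 0; i < 1000; i++) elements.add(shared);

            // Naive path: size each element independently with a fresh seen-set,
            // so `shared` is re-counted 1000 times.
            int naive = 0;
            for (Object e : elements)
                naive += sizeOf(e, Collections.newSetFromMap(new IdentityHashMap<>()));

            // Deduplicated path: one identity set across the whole collection,
            // so `shared` is counted once.
            int deduped = sizeOf(elements, Collections.newSetFromMap(new IdentityHashMap<>()));

            System.out.println(naive);   // 4000: (1 list + 3 boxed ints) x 1000
            System.out.println(deduped); // 5: outer list + shared list + 3 boxed ints
        }
    }
    ```

    Under these assumptions the naive estimate is 800x the deduplicated one, which would match the symptom in the comment: memory looks nearly full, a spill is triggered, and the spill file turns out to be tiny.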

