Github user kevincox commented on the pull request:

    https://github.com/apache/spark/pull/6714#issuecomment-110364692
  
    Also, why keep the batch size once you know you are going to spill to disk?
All that does is force you to draw from the iterator in batches. Once you know 
how big your chunk size should be, you can set `batch` to `len(current_chunk)` 
(possibly `*0.8` or something) and draw the whole chunk in a single call, since 
you already know you will be spilling to disk.
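
A rough sketch of what I mean, not the actual code in this PR. The names `chunked` and `should_spill` are hypothetical; `should_spill` stands in for whatever memory check decides that the current chunk has to go to disk. Small batches are only used until the first spill. After that, `batch` is resized to about 80% of the chunk that triggered the spill, so each later chunk is drawn with one `islice` call.

```python
from itertools import islice


def chunked(iterator, batch, should_spill):
    """Yield lists ("chunks") drawn from `iterator`.

    Hypothetical helper: `should_spill(chunk)` is an assumed callback that
    returns True once the chunk is large enough that it must be spilled.
    """
    iterator = iter(iterator)
    chunk = []
    spilling = False
    while True:
        drawn = list(islice(iterator, batch))
        chunk.extend(drawn)
        # A short draw means the iterator has nothing left.
        exhausted = len(drawn) < batch
        if chunk and (spilling or exhausted or should_spill(chunk)):
            if not spilling and not exhausted:
                # First spill: the chunk size is now known, so size every
                # later draw to fill a chunk in a single call.
                spilling = True
                batch = max(1, int(len(chunk) * 0.8))
            yield chunk
            chunk = []
        if exhausted:
            return


sizes = [len(c) for c in chunked(range(25), 2, lambda c: len(c) >= 10)]
print(sizes)
```

With a starting batch of 2 and a spill threshold of 10 items, the first chunk takes five small draws and every later chunk takes one.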


