holdenk created BEAM-3290:
-----------------------------

             Summary: Construct iterators directly if possible to allow 
spilling to disk
                 Key: BEAM-3290
                 URL: https://issues.apache.org/jira/browse/BEAM-3290
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: holdenk
            Assignee: Amit Sela


When you construct a collection first and convert it to an iterator you force 
Spark to evaluate the entire input partition before it can get the first 
element off the output. This breaks some of the spilling to disk Spark can do 
otherwise. Instead chain operations on Iterators.

This is only possible in the Java API for Spark 2 and above (and that's my 
fault from back in my work in the Spark project).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to