Andy Grove created ARROW-11012:
----------------------------------

             Summary: [Rust] [DataFusion] Make write_csv and write_parquet concurrent
                 Key: ARROW-11012
                 URL: https://issues.apache.org/jira/browse/ARROW-11012
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust - DataFusion
            Reporter: Andy Grove


ExecutionContext.write_csv and ExecutionContext.write_parquet currently iterate 
over the output partitions serially: each partition is executed and its results 
are written out before the next one starts. We should run these as tokio tasks 
so that the partitions can execute concurrently. This should, in theory, help 
with memory usage when the plan contains repartition operators.
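
To make the difference concrete, here is a rough sketch of serial versus concurrent per-partition writes. It is not DataFusion code: `Partition`, `write_partition`, `write_serial` and `write_concurrent` are hypothetical names, each partition is just a vector of pre-rendered CSV lines, and `std::thread` stands in for tokio tasks so that the example has no external dependencies.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for one output partition: rows already rendered as CSV lines.
type Partition = Vec<String>;

/// Write one partition to `<dir>/part-<index>.csv` and return the file path.
fn write_partition(dir: &Path, index: usize, rows: &[String]) -> io::Result<PathBuf> {
    let path = dir.join(format!("part-{}.csv", index));
    let mut file = File::create(&path)?;
    for row in rows {
        writeln!(file, "{}", row)?;
    }
    Ok(path)
}

/// Serial behaviour as described in this issue: one partition at a time.
fn write_serial(dir: &Path, partitions: &[Partition]) -> io::Result<Vec<PathBuf>> {
    partitions
        .iter()
        .enumerate()
        .map(|(i, rows)| write_partition(dir, i, rows))
        .collect()
}

/// Proposed behaviour: start one worker per partition, then wait for all of them.
/// Plain threads are used here as an assumption-free substitute for tokio tasks.
fn write_concurrent(dir: &Path, partitions: Vec<Partition>) -> io::Result<Vec<PathBuf>> {
    // Collect the handles first so every worker is started before any join.
    let handles: Vec<_> = partitions
        .into_iter()
        .enumerate()
        .map(|(i, rows)| {
            let dir = dir.to_path_buf();
            thread::spawn(move || write_partition(&dir, i, &rows))
        })
        .collect();

    // Join in partition order; the first I/O error, if any, is returned.
    handles
        .into_iter()
        .map(|h| h.join().expect("writer thread panicked"))
        .collect()
}
```

Both functions produce the same files; only the scheduling differs. In the real change the workers would be tokio tasks driving each partition's stream rather than OS threads.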

We may also want to add a configuration option for choosing between serial and 
parallel writes.
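
One possible shape for such an option is sketched below. `WriteMode` and `write_partitions` are hypothetical names and do not exist in DataFusion; the per-partition work is passed in as a closure, and threads again stand in for tokio tasks.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical setting; not part of DataFusion's configuration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum WriteMode {
    Serial,
    Parallel,
}

/// Run `write` once per partition index according to `mode`.
/// Results are returned in partition order in both modes.
fn write_partitions<F>(
    mode: WriteMode,
    partition_count: usize,
    write: F,
) -> Vec<Result<(), String>>
where
    F: Fn(usize) -> Result<(), String> + Send + Sync + 'static,
{
    match mode {
        // One partition after another, on the calling thread.
        WriteMode::Serial => (0..partition_count).map(|i| write(i)).collect(),
        // One worker per partition; all are started before any is joined.
        WriteMode::Parallel => {
            let write = Arc::new(write);
            let handles: Vec<_> = (0..partition_count)
                .map(|i| {
                    let write = Arc::clone(&write);
                    thread::spawn(move || (*write)(i))
                })
                .collect();
            handles
                .into_iter()
                .map(|h| {
                    // A panicking worker is reported as an error for its partition.
                    h.join()
                        .unwrap_or_else(|_| Err("writer panicked".to_string()))
                })
                .collect()
        }
    }
}
```

Keeping the mode as an explicit enum rather than a boolean leaves room for further strategies later, for example a bounded number of concurrent writers.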



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
