Wes McKinney created ARROW-8599:
-----------------------------------

             Summary: [C++][Parquet] Optional parallel processing when writing Parquet files
                 Key: ARROW-8599
                 URL: https://issues.apache.org/jira/browse/ARROW-8599
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney
             Fix For: 1.0.0


If we permit encoded columns in row groups to be buffered in memory rather than 
immediately written out to the {{OutputStream}}, then we can use multiple 
threads for the encoding / compression of columns. Combined with a separate 
thread to take the encoded columns and write them out to disk, this should 
yield substantially improved file write times.
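
A minimal sketch of this idea, under stated assumptions: {{EncodedColumn}}, {{EncodeColumn}}, {{BufferSink}} and {{WriteRowGroup}} are hypothetical names used only for illustration and are not part of the Parquet C++ API. A toy run-length encoding stands in for real encoding / compression. Each column is encoded on its own task via {{std::async}}, the results are buffered in memory, and the calling thread plays the writer role by draining them in column order. A dedicated writer thread shared across row groups is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for an encoded and compressed column chunk.
using EncodedColumn = std::string;

// Hypothetical stand-in for the CPU-heavy encode/compress step.
// Toy run-length encoding: {1, 1, 2} becomes "1*2,2*1,".
EncodedColumn EncodeColumn(const std::vector<int32_t>& values) {
  EncodedColumn out;
  std::size_t i = 0;
  while (i < values.size()) {
    std::size_t run = 1;
    while (i + run < values.size() && values[i + run] == values[i]) ++run;
    out += std::to_string(values[i]) + "*" + std::to_string(run) + ",";
    i += run;
  }
  return out;
}

// Hypothetical in-memory sink standing in for an OutputStream.
struct BufferSink {
  std::string data;
  int columns_written = 0;
  void Write(const EncodedColumn& chunk) {
    data += chunk;
    ++columns_written;
  }
};

// Encode every column of one row group concurrently, then write the
// buffered results in column order so the file layout stays deterministic.
void WriteRowGroup(const std::vector<std::vector<int32_t>>& columns,
                   BufferSink* sink) {
  std::vector<std::future<EncodedColumn>> pending;
  pending.reserve(columns.size());
  for (const auto& column : columns) {
    // `column` outlives the task: every future is drained before returning.
    pending.push_back(std::async(std::launch::async,
                                 [&column] { return EncodeColumn(column); }));
  }
  assert(pending.size() == columns.size());
  // Writing column i overlaps with the encoding of the later columns.
  for (auto& encoded : pending) {
    sink->Write(encoded.get());
  }
}
```

In this sketch, peak memory is roughly the sum of the encoded sizes of all columns in the row group, which is the cost the option described below is meant to guard.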

This could be enabled through an option, since it would increase memory use when 
writing. The memory use can be somewhat constrained by limiting the size of row 
groups.
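
As a rough illustration of that trade-off, the sketch below shows what an opt-in setting and a row-group size bound might look like. Both {{ParallelWriteOptions}} and {{MaxRowsPerRowGroup}} are hypothetical, as is the 64 MiB default budget. The estimate assumes the buffered data is no larger than the unencoded row width, which encoding and compression would normally improve on.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical options; these names are not part of the Parquet C++ API.
struct ParallelWriteOptions {
  // Off by default, since buffering encoded columns costs memory.
  bool buffer_encoded_columns = false;
  // Assumed budget for buffered column data per row group, in bytes.
  int64_t max_buffered_bytes = 64 * 1024 * 1024;
};

// Largest row count per row group whose unencoded size fits the budget.
// Returns 0 for a non-positive row width, and at least 1 otherwise so that
// a single very wide row can still be written.
int64_t MaxRowsPerRowGroup(const ParallelWriteOptions& options,
                           int64_t bytes_per_row) {
  if (bytes_per_row <= 0) return 0;
  return std::max<int64_t>(1, options.max_buffered_bytes / bytes_per_row);
}
```

With the assumed default budget and rows of 64 bytes, this bound comes out to 1,048,576 rows per row group.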



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
