Wes McKinney created ARROW-8599:
-----------------------------------

             Summary: [C++][Parquet] Optional parallel processing when writing Parquet files
                 Key: ARROW-8599
                 URL: https://issues.apache.org/jira/browse/ARROW-8599
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney
             Fix For: 1.0.0

If we permit the encoded columns of a row group to be buffered in memory rather than written immediately to the {{OutputStream}}, then we can use multiple threads for encoding and compressing columns. Combined with a separate thread that takes the encoded columns and writes them out to disk, this should yield substantially improved file write times.

This could be enabled through an option, since it would increase memory use when writing. The memory use can be somewhat constrained by limiting the size of row groups.