> Why is DELTA_BINARY_PACKED not supported for writing Parquet, since it
> may be useful for some sorted int columns?
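To illustrate why sorted integer columns are a good fit, here is a simplified Python sketch of the core idea behind DELTA_BINARY_PACKED: keep the first value, take consecutive deltas, subtract the minimum delta so every stored value is non-negative, and bit-pack the results at the smallest width that fits. This is an illustration under simplifying assumptions, not the Parquet spec or Arrow's code: the real format also splits values into blocks and miniblocks (each miniblock with its own bit width) and writes its header fields as ULEB128/zigzag varints. The helper names `delta_encode_block` and `bit_pack_lsb` are hypothetical.

```python
def delta_encode_block(values):
    """Hypothetical, single-block sketch of DELTA_BINARY_PACKED-style encoding.

    Returns (first_value, min_delta, bit_width, adjusted_deltas).
    Real Parquet splits values into blocks/miniblocks; this sketch does not.
    """
    if not values:
        raise ValueError("need at least one value")
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return first, 0, 0, []
    min_delta = min(deltas)
    # Subtracting the minimum delta makes every stored value non-negative.
    adjusted = [d - min_delta for d in deltas]
    # Smallest bit width that can hold the largest adjusted delta.
    bit_width = max(adjusted).bit_length()
    return first, min_delta, bit_width, adjusted


def bit_pack_lsb(values, bit_width):
    """Pack non-negative ints into bytes, least-significant bit first."""
    out = bytearray()
    acc = 0    # pending bits not yet flushed to `out`
    nbits = 0  # number of valid bits in `acc`
    for v in values:
        acc |= v << nbits
        nbits += bit_width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits > 0:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)


if __name__ == "__main__":
    column = [1000, 1001, 1003, 1004, 1008]  # a small sorted int column
    first, min_delta, width, adjusted = delta_encode_block(column)
    packed = bit_pack_lsb(adjusted, width)
    print(first, min_delta, width, adjusted)  # 1000 1 2 [0, 1, 0, 3]
    print(len(packed), "byte(s) of packed deltas vs", 4 * len(column),
          "bytes as PLAIN int32")
```

For a sorted column the deltas stay small, so four deltas fit in 2 bits each (one byte) instead of 4 bytes per value. The comparison ignores the header (first value, min delta, bit widths), which a real encoder also has to store.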
No one has implemented it yet.

> Also, I was wondering if the original Parquet implementation (seems
> like parquet-mr) gives each data type a default encoding (or fallback
> encoding when dictionary is enabled) other than PLAIN? It seems like
> for Arrow, we have to manually set the best encoding for each column.
> Would a default non-plain encoding be beneficial?

I was under the impression that we use the same defaults as parquet-mr (i.e. try dictionary encoding first and fall back to PLAIN), but this might be incorrect. In general, the encodings introduced along with DataPage V2 still aren't fully implemented in the C++ library, so there are some interoperability risks; hence we are conservative about what we support.

> Another question is, as from the encoding section, reading is much
> more supported than writing Parquet, does it mean Arrow is more
> favored to be used as a read-only query engine rather than writing?
> e.g. it simply uses Parquet files sourced from other applications.

The goal is to complete the implementation, but so far no one has had the bandwidth/motivation to finish the write functionality. Contributions are welcome.

Thanks,
Micah

On Wed, Mar 23, 2022 at 5:40 AM Xinyu Zeng wrote:

> Why is DELTA_BINARY_PACKED not supported for writing Parquet, since it
> may be useful for some sorted int columns?
>
> Also, I was wondering if the original Parquet implementation (seems
> like parquet-mr) gives each data type a default encoding (or fallback
> encoding when dictionary is enabled) other than PLAIN? It seems like
> for Arrow, we have to manually set the best encoding for each column.
> Would a default non-plain encoding be beneficial?
>
> Another question is, as from the encoding section, reading is much
> more supported than writing Parquet, does it mean Arrow is more
> favored to be used as a read-only query engine rather than writing?
> e.g. it simply uses Parquet files sourced from other applications.
>
> Thanks
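For reference, the "try dictionary encoding first, then fall back to PLAIN" default mentioned in the reply can be sketched roughly as below. This is a simplified, hypothetical model: real writers such as parquet-mr and Arrow C++ compare the dictionary's encoded size against a configurable limit (a dictionary page-size limit) and check it as data is written, rather than counting distinct values one at a time. The function name and the `max_dict_entries` parameter are made up for illustration. In this sketch, values seen before the limit is hit stay dictionary-encoded and the rest of the column is written as PLAIN.

```python
def encode_with_dictionary_fallback(values, max_dict_entries):
    """Hypothetical sketch: dictionary-encode until the dictionary is 'full', then PLAIN.

    Returns a dict holding the dictionary, the indices for the dictionary-encoded
    prefix, and the remaining values that fall back to PLAIN encoding.
    """
    dictionary = {}  # value -> index; Python dicts keep insertion order
    indices = []
    for i, v in enumerate(values):
        if v not in dictionary:
            if len(dictionary) >= max_dict_entries:
                # Dictionary considered too large: write the rest as PLAIN.
                return {
                    "dictionary": list(dictionary),
                    "indices": indices,
                    "plain": list(values[i:]),
                }
            dictionary[v] = len(dictionary)
        indices.append(dictionary[v])
    return {"dictionary": list(dictionary), "indices": indices, "plain": []}


if __name__ == "__main__":
    # Low cardinality: everything stays dictionary-encoded.
    print(encode_with_dictionary_fallback(["a", "b", "a", "a", "b"], max_dict_entries=4))
    # High cardinality: falls back to PLAIN starting at "u3".
    print(encode_with_dictionary_fallback(["u1", "u2", "u3", "u4", "u5"], max_dict_entries=2))
```

The sketch only shows the shape of the decision; it does not model which non-PLAIN encoding a writer might choose instead, which is the open question in the thread.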
