Discrepancy in parquet format documentation

2024-01-06 Thread Kaili Zhang
Hi all, I found this page via Google when searching for a description of the parquet binary format: https://parquet.apache.org/docs/file-format/data-pages/. This page suggests that definition levels are written before repetition levels. However, after experimenting with parquet files generated

Re: Pitch for Pcodec Encoding in Parquet

2024-01-06 Thread Martin Loncaric
> > It would be very interesting to expand the comparison against BYTE_STREAM_SPLIT + compression. Antoine: I created one now, at the bottom of the post. In this case, BYTE_STREAM_SPLIT did worse. parquet-mr is currently a pure