Re: Any Parquet implementations might be impacted by PARQUET-2078

2021-08-31 Thread Chao Sun
Thanks Gabor. The Spark community is in the process of releasing Spark 3.2.0 with Parquet 1.12. Any idea when a new release will be available with the fix? We may need to hold off on the Spark release for that. Chao On Mon, Aug 30, 2021 at 6:31 AM Gabor Szadovszky wrote: > It turned out that

Re: Any Parquet implementations might be impacted by PARQUET-2078

2021-08-30 Thread Gabor Szadovszky
It turned out that ColumnMetaData.dictionary_page_offset is not impacted by this issue, so it is much easier to handle. It seems that 1.12.0 is the first parquet-mr release which writes ColumnChunk.file_offset and according to PARQUET-2078

Any Parquet implementations might be impacted by PARQUET-2078

2021-08-27 Thread Gabor Szadovszky
Hi everyone, It turned out that since parquet-mr 1.12.0, under certain conditions, we write wrong values into ColumnMetaData.dictionary_page_offset and ColumnChunk.file_offset