Re: Files with inconsistent num_rows and num_values?

2023-12-05 Thread Micah Kornfield
Thanks for checking.

On Tuesday, December 5, 2023, Gang Wu wrote:
> I scanned through the parquet-mr implementation. It provides a row-wise
> interface to write records in the ColumnWriter. This cannot reproduce
> the issue in this thread. I suspect some other implementations may have
> their

Re: Files with inconsistent num_rows and num_values?

2023-12-05 Thread Gang Wu
I scanned through the parquet-mr implementation. It provides a row-wise interface to write records in the ColumnWriter. This cannot reproduce the issue in this thread. I suspect some other implementations may have their own column-wise column writer implementations and only write pages to the

Re: Files with inconsistent num_rows and num_values?

2023-11-28 Thread Micah Kornfield
Hi Gang,

For writes I'm seeing "parquet-mr version 1.11.1" and "parquet-mr version 1.10.1". I need to look more into the page headers to check for consistency. At the column level, in some cases the number of values read by pyarrow is consistent with num_rows and in some cases it is consistent

Re: Files with inconsistent num_rows and num_values?

2023-11-28 Thread Gang Wu
Hi Micah,

Does the FileMetaData.version [1] provide any information about the writer? What about the num_values in each page header? Is the actual number of values consistent with num_values in the ColumnMetaData? [1]
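
For anyone who wants to run the footer-level check discussed in this thread on their own files, here is a minimal sketch using pyarrow. It is not taken from any message above. It assumes that for a flat (non-repeated) column, ColumnMetaData.num_values counts nulls and should therefore equal the row group's num_rows, while repeated columns may legitimately differ. The names ChunkInfo, find_mismatches and read_chunks are hypothetical helpers invented for this example. It reads only the file footer, so the num_values in individual page headers is not covered.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ChunkInfo(NamedTuple):
    """Footer metadata for one column chunk (hypothetical helper type)."""
    row_group: int
    column: str
    num_rows: int     # RowGroup.num_rows
    num_values: int   # ColumnMetaData.num_values (nulls included)
    repeated: bool    # True if the column's max repetition level is > 0


def find_mismatches(chunks: Iterable[ChunkInfo]) -> List[ChunkInfo]:
    """Return flat-column chunks whose num_values differs from num_rows.

    Repeated columns are skipped, since they can hold more (or fewer)
    values than rows without being inconsistent.
    """
    return [c for c in chunks if not c.repeated and c.num_values != c.num_rows]


def read_chunks(path: str) -> Tuple[str, List[ChunkInfo]]:
    """Collect created_by and per-chunk counts from a Parquet footer."""
    import pyarrow.parquet as pq  # imported lazily; only needed for real files

    md = pq.ParquetFile(path).metadata
    chunks = []
    for rg in range(md.num_row_groups):
        group = md.row_group(rg)
        for col in range(group.num_columns):
            chunk = group.column(col)
            repeated = md.schema.column(col).max_repetition_level > 0
            chunks.append(
                ChunkInfo(rg, chunk.path_in_schema, group.num_rows,
                          chunk.num_values, repeated)
            )
    return md.created_by, chunks
```

Usage would look like `created_by, chunks = read_chunks("example.parquet")` followed by `find_mismatches(chunks)`, where the file name is a placeholder. The created_by string is where writer versions such as "parquet-mr version 1.11.1" show up.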