Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19793 )
Change subject: IMPALA-6433: Add read support for PageHeaderV2
......................................................................

Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-column-chunk-reader.h
File be/src/exec/parquet/parquet-column-chunk-reader.h:

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-column-chunk-reader.h@123
PS11, Line 123: till
"Till" could be understood as "until but not including" but here we do read the page header. Maybe omitting "till" would be clearer?

http://gerrit.cloudera.org:8080/#/c/19793/8/be/src/exec/parquet/parquet-column-chunk-reader.cc
File be/src/exec/parquet/parquet-column-chunk-reader.cc:

http://gerrit.cloudera.org:8080/#/c/19793/8/be/src/exec/parquet/parquet-column-chunk-reader.cc@338
PS8, Line 338: // https://github.com/apache/parquet-format/blob/2a481fe1aad64ff770e21734533bb7ef5a057dac/src/main/thrift/parquet.thrift#L578
> This error message is a bit misleading, I would think we tried to read 'co
What do you think about this?

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-column-chunk-reader.cc
File be/src/exec/parquet/parquet-column-chunk-reader.cc:

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-column-chunk-reader.cc@259
PS11, Line 259: num_bytes
It would be easier to understand if we called it 'num_rep_level_bytes' and the variable on L271 'num_def_level_bytes'.

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-column-chunk-reader.cc@318
PS11, Line 318: write likes
Nit: writes like.

http://gerrit.cloudera.org:8080/#/c/19793/8/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/19793/8/be/src/exec/parquet/parquet-column-readers.cc@1145
PS8, Line 1145: Status BaseScalarColumnReader::ReadDataPage() {
> I thought about this too but wouldn't to it in this patch. The difference o
Ok.

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-level-decoder.h
File be/src/exec/parquet/parquet-level-decoder.h:

http://gerrit.cloudera.org:8080/#/c/19793/11/be/src/exec/parquet/parquet-level-decoder.h@57
PS11, Line 57: bytes
Nit: "number of bytes" would be clearer.

http://gerrit.cloudera.org:8080/#/c/19793/8/be/src/exec/parquet/parquet-level-decoder.h
File be/src/exec/parquet/parquet-level-decoder.h:

http://gerrit.cloudera.org:8080/#/c/19793/8/be/src/exec/parquet/parquet-level-decoder.h@56
PS8, Line 56:
> done - int32_t vs int is used a bit chaotically in Impala and I think that
Yeah, but let's do it the right way when we can.

http://gerrit.cloudera.org:8080/#/c/19793/11/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/19793/11/testdata/datasets/functional/functional_schema_template.sql@4099
PS11, Line 4099: alltypesagg_parquet_v2_uncompressed
Do we still need 'alltypesagg_parquet_v2' on L4074?

--
To view, visit http://gerrit.cloudera.org:8080/19793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I282962a6e4611e2b662c04a81592af83ecaf08ca
Gerrit-Change-Number: 19793
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Thu, 04 May 2023 09:51:49 +0000
Gerrit-HasComments: Yes
