Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11268 )
Change subject: IMPALA-6373: Allow primitive type widening on parquet tables ...................................................................... Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/11268/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11268/1//COMMIT_MSG@12 PS1, Line 12: tinyint I think that it would be useful to mention that tinyint/smallint/int are all encoded as int32 in Parquet. For example the encoded size could be mentioned in parentheses. http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h@260 PS1, Line 260: return ByteSize(*v); Decode() should return the number of bytes read from the original buffer, so in this case not the size of int64_t but the size of int32_t. I checked some places where it is called (dict-encoding.h, parquet-column-readers.cc) and it is used to advance a pointer to the original data. The comment in this file at line 186 also supports this. I guess that this didn't cause an issue because the test table has only one row. Note that two rows with distinct values are probably needed to cause an issue, because the default encoding is dictionary. http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h@271 PS1, Line 271: ByteSize Same as line 260. http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h@282 PS1, Line 282: return ByteSize(*v); Same as line 260. http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-plain-test.cc@113 PS1, Line 113: int decoded_size = ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>( This function should return the encoded size. http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-plain-test.cc@145 PS1, Line 145: int decoded_size = ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>( Same as line 113. http://gerrit.cloudera.org:8080/#/c/11268/1/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/11268/1/testdata/data/README@192 PS1, Line 192: single row Please add a second row with different values. -- To view, visit http://gerrit.cloudera.org:8080/11268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If93394b035c64cf6fc5f37b54d29c034cc1f86e4 Gerrit-Change-Number: 11268 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fredy Wijaya <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 21 Aug 2018 10:24:04 +0000 Gerrit-HasComments: Yes
