Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 )
Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner ...................................................................... Patch Set 5: (6 comments) Mostly just cleanup and simplification comments now. http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h@45 PS5, Line 45: INTERNAL_TYPE I should have mentioned this earlier, but I think we mostly use CamelCase for type names inside templates. We're not totally consistent throughout the codebase but that's the general pattern. http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@328 PS5, Line 328: inline int ParquetPlainEncoder::Encode( It looks like we could also deduplicate the Encode/Decode methods for FIXED_LEN_BYTE_ARRAY with the same technique. http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@350 PS5, Line 350: Decode<Decimal4Value,parquet::Type::FIXED_LEN_BYTE_ARRAY>(const uint8_t* buffer, nit: missing space http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@193 PS4, Line 193: izeof(D > for var len decimals, its encoded size is bytes required to store size + le Thanks for the explanation. http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@112 PS5, Line 112: // Decimal4Value: General test case, I find this a bit confusing still because the values being encoded aren't in the table. Could we take it even further and have something like: struct DecimalTestCase { int decimal_byte_size; int128_t decimal_val; int expected_size; } That way all of the inputs to each test would be in the same place. http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@200 PS5, Line 200: sizeof(int32_t) Is there any reason to keep the encoding overhead separate from the encoded size? I would have thought we would just check the total size. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 20 Oct 2017 00:36:39 +0000 Gerrit-HasComments: Yes