Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 )
Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner ...................................................................... Patch Set 7: (2 comments) I think we'll need to do some more work on testing for the int32/64 patches - we don't have pre-existing test files from parquet-mr. I think we'll have to generate some more test files with parquet-mr for the other cases, and we could consider turning that code into a data generator to generate more test files. From what I could tell Hive doesn't have a knob to generate some of these alternative output encodings. I feel ok with the coverage since we have end-to-end tests then more exhaustive unit tests for the various ways of encoding it. http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391 PS7, Line 391: fixed_len_size Looked again. The variable name (and recycling the argument storage) is confusing. Maybe 'encoded_byte_size'? http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int fixed_len_size, uint8_t* buffer){ I took another look at the standard and it says that the minimum number of bytes required to store the unscaled value should be used: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal I think this means that we should not be including any preceding "0" bytes. I.e. we should not have a fixed_len_size argument and instead determine the size based on the number of leading zero bytes in the value. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 01 Nov 2017 20:53:08 +0000 Gerrit-HasComments: Yes
