Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
......................................................................


Patch Set 7:

(2 comments)

I think we'll need to do some more work on testing for the int32/64 patches - 
we don't have pre-existing test files from parquet-mr. I think we'll have to 
generate some more test files with parquet-mr for the other cases, and we could 
consider turning that code into a data generator to generate more test files. 
From what I could tell Hive doesn't have a knob to generate some of these 
alternative output encodings.

 I feel ok with the coverage since we have end-to-end tests then more 
exhaustive unit tests for the various ways of encoding it.

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391
PS7, Line 391: fixed_len_size
Looked again. The variable name (and recycling the argument storage) is 
confusing. Maybe 'encoded_byte_size'?


http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33
PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int 
fixed_len_size, uint8_t* buffer){
I took another look at the standard and it says that the minimum number of 
bytes required to store the unscaled value should be used: 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

I think this means that we should not be including any preceding "0" bytes. 
I.e. we should not have a fixed_len_size argument and instead determine the 
size based on the number of leading zero bytes in the value.



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:53:08 +0000
Gerrit-HasComments: Yes

Reply via email to