[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Tim Armstrong (Code Review) Thu, 19 Oct 2017 17:37:51 -0700

Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )


Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
......................................................................


Patch Set 5:

(6 comments)

Mostly just cleanup and simplification comments now.

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h@45
PS5, Line 45: INTERNAL_TYPE
I should have mentioned this earlier, but I think we mostly use CamelCase for 
type names inside templates. We're not totally consistent throughout the 
codebase but that's the general pattern.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@328
PS5, Line 328: inline int ParquetPlainEncoder::Encode(
It looks like we could also deduplicate the Encode/Decode methods for 
FIXED_LEN_BYTE_ARRAY with the same technique.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@350
PS5, Line 350: Decode<Decimal4Value,parquet::Type::FIXED_LEN_BYTE_ARRAY>(const 
uint8_t* buffer,
nit: missing space


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@193
PS4, Line 193: izeof(D
> for var len decimals, its encoded size is bytes required to store size + le
Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@112
PS5, Line 112: //    Decimal4Value: General test case,
I find this a bit confusing still because the values being encoded aren't in 
the table. Could we take it even further and have something like:

  struct DecimalTestCase {
    int decimal_byte_size;
    int128_t decimal_val;
    int expected_size;
  }

That way all of the inputs to each test would be in the same place.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@200
PS5, Line 200: sizeof(int32_t)
Is there any reason to keep the encoding overhead separate from the encoded 
size? I would have thought we would just check the total size.



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 5
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 00:36:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Reply via email to