Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-column-readers.cc@1277
PS4, Line 1277:         switch (node.element->type) {
> This case is only going to get bigger with the follow on patches - it would
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-common.h@376
PS4, Line 376: inline int ParquetPlainEncoder::Decode<Decimal4Value, 
parquet::Type::BYTE_ARRAY>(
> I think you can avoid stamping this out if you leave it still parameterized
Unfortunately, the call to Decode is explicit in our code, eg Decode<T,S> 
therefore it will always look for a specialization of those template parameters.
Similarly, in the dummy program you wrote, if I explicitly call foo<int, 
double>(&a, &x), this will look for a specialization of the double templated  
method and return 0;

Since partial specialization is not possible, the only way to reduce redundant 
code will be to do what Lars suggested earlier, i.e., create a helper method 
that is only templatized on <typename T> and call that for each full 
specialization.
I will make changes accordingly in the next iteration of this patch.


http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h@237
PS2, Line 237:
> I think this particular call here will always return sizeof(int32_t) (line
should we do that everywhere else in this file wherever we specialize a 
method(both Decode and encode)?


http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h@338
PS2, Line 338:   return fixed_len_size;
> I'm not sure I'm following: it looks like the next three methods are exactl
Oh I see what you mean now. Done!


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@66
PS4, Line 66: bool IsSupportedPhysicalType(PrimitiveType impala_type,
> let's make it static - we don't want to export the symbol to be linked with
Done. enclosed in an anonymous namespace.


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@70
PS4, Line 70: (
> unnecessary parens
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@173
PS4, Line 173:     // TODO: Is this check too strict?
> I don't see why this shouldn't match the file metadata, this seems valid to
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@182
PS4, Line 182:  parquet::SchemaElement*
> Why not pass by const ref like above?
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@209
PS4, Line 209:         stringstream ss;
> Can you convert to using Substitute() while we're here? We've been very gra
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@176
PS4, Line 176:   TestType<Decimal4Value, 
parquet::Type::BYTE_ARRAY>(Decimal4Value(test_val),
> It would be nice to combine the FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY tests.
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@193
PS4, Line 193: int32_t
> Any reason to use sizeof(int..) instead of sizeof(Decimal*Value) like above
for var len decimals, its encoded size is bytes required to store size + least 
num of bytes required to store num.

in this case bytes required to store numeric_limits<int32_t>::max() is 
sizeof(int32_t) which is less than the sizeof(Decimal8Value) since 
Decimal8Value is stored using int64

I will add a comment at the top of these byte_array test to clarify what 
expected size is. would you recommend I add more comments before limit tests?


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@212
PS4, Line 212:   for (int i = 1; i <=16; ++i) {
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@104
PS4, Line 104: PHYSICAL_TYPE
> Isn't this PHYSICAL_TYPE the internal type, which is called LOGICAL_TYPE in
reverted to 'T' to minimize change from previous code


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@199
PS4, Line 199: PHYSICAL_TYPE
> It doesn't seem like the whole decoding class needs to be templated by this
Done


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@323
PS4, Line 323: DictDecoder<Decimal16Value, 
parquet::Type::BYTE_ARRAY>::GetNextValue(
> The logic here is also the same as above - we should find a way to avoid th
removed, as it is no longer needed since the template param is removed and 
moved to Reset() only


http://gerrit.cloudera.org:8080/#/c/7822/4/testdata/data/binary_decimal_dictionary.parquet
File testdata/data/binary_decimal_dictionary.parquet:

PS4:
> Can you update the README to describe how these files were generated.
this file was generated a long while back and I picked it up from an abandoned 
patch here https://gerrit.cloudera.org/#/c/5115/

Currently looking for a better way to generate this file using the java parquet 
client.



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 19 Oct 2017 20:34:56 +0000
Gerrit-HasComments: Yes

Reply via email to