Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 )
Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner ...................................................................... Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-column-readers.cc@1277 PS4, Line 1277: switch (node.element->type) { > This case is only going to get bigger with the follow on patches - it would Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-common.h@376 PS4, Line 376: inline int ParquetPlainEncoder::Decode<Decimal4Value, parquet::Type::BYTE_ARRAY>( > I think you can avoid stamping this out if you leave it still parameterized Unfortunately, the call to Decode is explicit in our code, eg Decode<T,S> therefore it will always look for a specialization of those template parameters. Similarly, in the dummy program you wrote, if I explicitly call foo<int, double>(&a, &x), this will look for a specialization of the double templated method and return 0; Since partial specialization is not possible, the only way to reduce redundant code will be to do what Lars suggested earlier, i.e., create a helper method that is only templatized on <typename T> and call that for each full specialization. I will make changes accordingly in the next iteration of this patch. http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h@237 PS2, Line 237: > I think this particular call here will always return sizeof(int32_t) (line should we do that everywhere else in this file wherever we specialize a method(both Decode and encode)? http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h@338 PS2, Line 338: return fixed_len_size; > I'm not sure I'm following: it looks like the next three methods are exactl Oh I see what you mean now. Done! http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@66 PS4, Line 66: bool IsSupportedPhysicalType(PrimitiveType impala_type, > let's make it static - we don't want to export the symbol to be linked with Done. enclosed in an anonymous namespace. http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@70 PS4, Line 70: ( > unnecessary parens Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@173 PS4, Line 173: // TODO: Is this check too strict? > I don't see why this shouldn't match the file metadata, this seems valid to Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@182 PS4, Line 182: parquet::SchemaElement* > Why not pass by const ref like above? Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@209 PS4, Line 209: stringstream ss; > Can you convert to using Substitute() while we're here? We've been very gra Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@176 PS4, Line 176: TestType<Decimal4Value, parquet::Type::BYTE_ARRAY>(Decimal4Value(test_val), > It would be nice to combine the FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY tests. Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@193 PS4, Line 193: int32_t > Any reason to use sizeof(int..) instead of sizeof(Decimal*Value) like above for var len decimals, its encoded size is bytes required to store size + least num of bytes required to store num. in this case bytes required to store numeric_limits<int32_t>::max() is sizeof(int32_t) which is less than the sizeof(Decimal8Value) since Decimal8Value is stored using int64 I will add a comment at the top of these byte_array test to clarify what expected size is. would you recommend I add more comments before limit tests? http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@212 PS4, Line 212: for (int i = 1; i <=16; ++i) { > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@104 PS4, Line 104: PHYSICAL_TYPE > Isn't this PHYSICAL_TYPE the internal type, which is called LOGICAL_TYPE in reverted to 'T' to minimize change from previous code http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@199 PS4, Line 199: PHYSICAL_TYPE > It doesn't seem like the whole decoding class needs to be templated by this Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@323 PS4, Line 323: DictDecoder<Decimal16Value, parquet::Type::BYTE_ARRAY>::GetNextValue( > The logic here is also the same as above - we should find a way to avoid th removed, as it is no longer needed since the template param is removed and moved to Reset() only http://gerrit.cloudera.org:8080/#/c/7822/4/testdata/data/binary_decimal_dictionary.parquet File testdata/data/binary_decimal_dictionary.parquet: PS4: > Can you update the README to describe how these files were generated. this file was generated a long while back and I picked it up from an abandoned patch here https://gerrit.cloudera.org/#/c/5115/ Currently looking for a better way to generate this file using the java parquet client. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 19 Oct 2017 20:34:56 +0000 Gerrit-HasComments: Yes
