Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8319 )
Change subject: IMPALA-4123: Columnar decoding in Parquet ...................................................................... Patch Set 16: (5 comments) Addressed comments and ran exhaustive tests overnight. http://gerrit.cloudera.org:8080/#/c/8319/16/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8319/16/be/src/util/dict-encoding.h@464 PS16, Line 464: uint8_t* curr_value = reinterpret_cast<uint8_t*>(first_value); > It could be useful to create a simple struct with a pointer + stride that h Sorry about that, it wasn't a deliberate oversight, just sloppiness on my part. I took the idea and used it in a bunch of places - internally in many functions and also at function boundaries where the callee advances the pointer (it simplified things). I didn't use it in the BitPacking interfaces, which are pretty low level and I don't think would benefit from it, and in other interfaces where the pointer wasn't advanced by the callee (since it didn't lead to simpler code). http://gerrit.cloudera.org:8080/#/c/8319/15/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8319/15/testdata/data/README@240 PS15, Line 240: t as > I have no problem with the code, but I am a bit surprised that Hive still t I don't think there's a limit in the Parquet spec - it stores the day as julian days. Afaik the limit just comes from Boost. http://gerrit.cloudera.org:8080/#/c/8319/15/testdata/data/README@244 PS15, Line 244: > nit: double "code" Done http://gerrit.cloudera.org:8080/#/c/8319/15/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py: http://gerrit.cloudera.org:8080/#/c/8319/15/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py@105 PS15, Line 105: # Test that timestamp validation also works as expected when converting timestamps. > This test seem to be an independent test that was appended here to avoid re Done. That will be nice if we can avoid using a custom cluster test. http://gerrit.cloudera.org:8080/#/c/8319/15/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/8319/15/tests/query_test/test_scanners.py@98 PS15, Line 98: new_vector = deepcopy(vector) : new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size') > Can you add a comment about the reason for doing this? Done. For whatever reason this test has the query options as top-level test dimensions rather than part of the exec_option dimension, so they need to be copied over to have any effect. -- To view, visit http://gerrit.cloudera.org:8080/8319 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef Gerrit-Change-Number: 8319 Gerrit-PatchSet: 16 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Fri, 09 Nov 2018 18:33:45 +0000 Gerrit-HasComments: Yes