Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8319 )

Change subject: IMPALA-4123: Columnar decoding in Parquet
......................................................................


Patch Set 16:

(5 comments)

Addressed comments and ran exhaustive tests overnight.

http://gerrit.cloudera.org:8080/#/c/8319/16/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8319/16/be/src/util/dict-encoding.h@464
PS16, Line 464: uint8_t* curr_value = reinterpret_cast<uint8_t*>(first_value);
> It could be useful to create a simple struct with a pointer + stride that h
Sorry about that, it wasn't a deliberate oversight, just sloppiness on my part. 
I took the idea and used it in a bunch of places - internally in many functions 
and also at function boundaries where the callee advances the pointer (it 
simplified things).

I didn't use it in the BitPacking interfaces, which are pretty low level and I 
don't think would benefit from it, and in other interfaces where the pointer 
wasn't advanced by the callee (since it didn't lead to simpler code).


http://gerrit.cloudera.org:8080/#/c/8319/15/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8319/15/testdata/data/README@240
PS15, Line 240: t as
> I have no problem with the code, but I am a bit surprised that Hive still t
I don't think there's a limit in the Parquet spec - it stores the day as julian 
days. Afaik the limit just comes from Boost.


http://gerrit.cloudera.org:8080/#/c/8319/15/testdata/data/README@244
PS15, Line 244:
> nit: double "code"
Done


http://gerrit.cloudera.org:8080/#/c/8319/15/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

http://gerrit.cloudera.org:8080/#/c/8319/15/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py@105
PS15, Line 105:     # Test that timestamp validation also works as expected 
when converting timestamps.
> This test seem to be an independent test that was appended here to avoid re
Done. That will be nice if we can avoid using a custom cluster test.


http://gerrit.cloudera.org:8080/#/c/8319/15/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8319/15/tests/query_test/test_scanners.py@98
PS15, Line 98:     new_vector = deepcopy(vector)
             :     new_vector.get_value('exec_option')['batch_size'] = 
vector.get_value('batch_size')
> Can you add a comment about the reason for doing this?
Done. For whatever reason this test has the query options as top-level test 
dimensions rather than part of the exec_option dimension, so they need to be 
copied over to have any effect.



--
To view, visit http://gerrit.cloudera.org:8080/8319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c03006981c46ef0dae30602f2b73c253d9b49ef
Gerrit-Change-Number: 8319
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Nov 2018 18:33:45 +0000
Gerrit-HasComments: Yes

Reply via email to