Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9403 )

Change subject: IMPALA-6324: Support reading RLE-encoded boolean values in 
Parquet scanner
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/exec/parquet-column-readers.cc@673
PS3, Line 673:     switch (page_encoding_) {
> nit: space before (
Done


http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/exec/parquet-column-readers.cc@1304
PS3, Line 1304: }
> Convert to using Substitute() while you're here - we generally prefer using
Done


http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/util/rle-encoding.h@138
PS3, Line 138:   /// Returns the number of consumed values or 0 if an error 
occurred.
> Can you add some tests to be/src/util/rle-test.cc for the new interface? Th
Done


http://gerrit.cloudera.org:8080/#/c/9403/3/be/src/util/rle-encoding.h@674
PS3, Line 674:        for (int i = 0; i < num_repeats_to_set; ++i) {
> Yes, please do add the benchmark. The old benchmark was comparing two old i
Done


http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/data/README@160
PS3, Line 160: Parquet v1 file with RLE encoded boolean column "b" and int 
column "i".
> How was it created? Parquet-mr?
Yes, it was created with parquet-mr (parquet-cli's convert-csv command). I had 
to modify parquet-mr to force RLE encoding for booleans, because by default it 
will use bit packed for Parquet v1. Parquet v2 uses RLE by default, but Impala 
does not read v2 data pages at the moment.


http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test:

http://gerrit.cloudera.org:8080/#/c/9403/3/testdata/workloads/functional-query/queries/QueryTest/parquet-rle-encoded-bool.test@4
PS3, Line 4: select count(*) from rle_encoded_bool where b;
> Thanks for the tip, I have created such a table, and something is actually
The new test will catch if rle encoded booleans are returned in the wrong order.


http://gerrit.cloudera.org:8080/#/c/9403/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/9403/3/tests/query_test/test_scanners.py@656
PS3, Line 656:     self.client.execute(("""CREATE TABLE {0}.rle_encoded_bool (b 
boolean, i int)
> Nit: trailing """ can go on same line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/9403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4644bf8cf5d2b7238b05076407fbf78ab5d2c14f
Gerrit-Change-Number: 9403
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Mar 2018 13:59:39 +0000
Gerrit-HasComments: Yes

Reply via email to