Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8775 )

Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 15:

(3 comments)

This update fixes the bug Tim spotted. I'm looking into how to test this in a 
follow-up change.

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
> Yeah. I wonder if it would work out simpler if we had:
Thanks for the suggestions, I went with that.


http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
> I think there's a bug here with nested collections in files with multiple r
Made a change along the lines you suggested below, which should address the 
missing reset calls (and simplifies the code).

I saw that the test data exercised multiple row-groups, but from your comment, 
yeah, it seems that for collections, all the test data that I'm aware of (tpch 
nested parquet and complextypestbl parquet) has a single row-group per file.


http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
> Nice catch! Looks like a bug to me too. The code seems to assume that all c
I was unclear on this one. That code is gone now, but if you think something 
still looks off there, please clarify. Previously, the code partitioned the 
columns into top-level scalar columns that could be dictionary-filtered and 
the rest, and the calls to init assumed that partition. The prior patch missed 
the init calls for collections when not filtering by dictionaries; the updated 
patch now inits these collection columns. Other than init, I didn't see any 
other places where we iterate over dict/non_dict and require that all readers 
are handled.
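
To make the init point concrete, here is a minimal, self-contained sketch. It 
is not the Impala code: Reader, Partition, and ResetAll are hypothetical names 
made up for illustration, and the real readers and their init/reset calls 
differ. It only shows the invariant discussed above: every reader, including 
readers nested under a collection, lands in exactly one of the two lists, and 
both lists are walked for each row group.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a column reader; not Impala's actual class.
struct Reader {
  std::string name;
  bool dict_filterable = false;                   // assumed eligibility flag
  std::vector<std::unique_ptr<Reader>> children;  // non-empty for collections
  int reset_count = 0;                            // counts per-row-group resets
  void Reset() { ++reset_count; }
};

// Walks the reader tree and places every reader, nested ones included,
// in exactly one of the two lists.
void Partition(Reader* reader, std::vector<Reader*>* dict,
               std::vector<Reader*>* non_dict) {
  assert(reader != nullptr);
  (reader->dict_filterable ? dict : non_dict)->push_back(reader);
  for (auto& child : reader->children) {
    Partition(child.get(), dict, non_dict);
  }
}

// Called once per row group: readers in both lists are reset, so skipping
// dictionary filtering never leaves a reader with stale state.
void ResetAll(const std::vector<Reader*>& dict,
              const std::vector<Reader*>& non_dict) {
  for (Reader* r : dict) r->Reset();
  for (Reader* r : non_dict) r->Reset();
}
```

A test file with a collection column and more than one row group would 
exercise this: the second call to ResetAll stands in for the second row 
group.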



--
To view, visit http://gerrit.cloudera.org:8080/8775

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Tue, 16 Jan 2018 02:46:51 +0000
Gerrit-HasComments: Yes
