Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )
Change subject: IMPALA-4993: extend dictionary filtering to collections
......................................................................


Patch Set 15:

(3 comments)

Update fixes the bug Tim spotted. Looking into how to test this for a follow-up change.

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
> Yeah. I wonder if it would work out simpler if we had:
Thanks for the suggestions, went with that.


http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
> I think there's a bug here with nested collections in files with multiple r
Made a change along the lines you suggested below, which should address the missing reset calls (and simplifies the code). I saw that the test data exercised multiple row-groups, but from your comment, yeah, it seems that for collections, all test data that I'm aware of (tpch nested parquet and complextypestbl parquet) has a single row-group per file.


http://gerrit.cloudera.org:8080/#/c/8775/15/be/src/exec/hdfs-parquet-scanner.cc@1645
PS15, Line 1645: Status HdfsParquetScanner::InitColumns(
> Nice catch! Looks like a bug to me too. The code seems to assume that all c
I was unclear on this one. Though that code is gone now, if you think that something still looks off there, please clarify. Before, the code partitioned the readers into top-level scalar columns that could be dictionary-filtered and the rest. Calls to init assumed such a partition. The prior patch missed the init calls for collection columns when not filtering by dictionaries. The updated patch now inits these collection columns. Other than init, I didn't see any other places where we go through dict/non_dict and require that all readers are dealt with.

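To make the invariant discussed in the last comment concrete, here is a minimal, self-contained sketch. It is not the code in the patch: `Reader`, `Partition`, `PartitionReaders`, `CountReaders` and `ResetForRowGroup` are hypothetical names made up for illustration, and placing collection readers in the non-dict group is an assumption of the sketch. The idea is that every reader, including nested children of collections, lands in exactly one of the two groups, so a per-row-group pass over both groups cannot skip a reader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Parquet column reader; NOT Impala's actual class.
struct Reader {
  bool is_collection = false;    // true for array/map readers
  bool dict_filterable = false;  // scalar reader eligible for dictionary filtering
  std::vector<Reader> children;  // non-empty only for collection readers
  int reset_count = 0;           // how many row groups this reader was reset for
};

// Two disjoint groups that together must cover every reader.
struct Partition {
  std::vector<Reader*> dict;      // scalar readers that can be dictionary-filtered
  std::vector<Reader*> non_dict;  // everything else (assumption: collections too)
};

// Recursively assigns each reader to exactly one group. Children of a
// collection are visited as well, so nested scalars are not missed.
void PartitionReaders(std::vector<Reader>& readers, Partition* out) {
  for (Reader& r : readers) {
    if (r.is_collection) {
      out->non_dict.push_back(&r);
      PartitionReaders(r.children, out);
    } else if (r.dict_filterable) {
      out->dict.push_back(&r);
    } else {
      out->non_dict.push_back(&r);
    }
  }
}

// Counts all readers, including nested ones, to check partition coverage.
std::size_t CountReaders(const std::vector<Reader>& readers) {
  std::size_t n = 0;
  for (const Reader& r : readers) n += 1 + CountReaders(r.children);
  return n;
}

// Per-row-group reset: walks both groups, so no reader keeps stale state
// when a file contains more than one row group.
void ResetForRowGroup(Partition& p) {
  for (Reader* r : p.dict) ++r->reset_count;
  for (Reader* r : p.non_dict) ++r->reset_count;
}
```

A test along these lines would partition once per file, assert that `dict.size() + non_dict.size()` equals `CountReaders(...)`, and then call `ResetForRowGroup` once per row group, which is the multi-row-group case the existing collection test data does not cover.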
-- 
To view, visit http://gerrit.cloudera.org:8080/8775
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd
Gerrit-Change-Number: 8775
Gerrit-PatchSet: 15
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Tue, 16 Jan 2018 02:46:51 +0000
Gerrit-HasComments: Yes
