Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21566 )
Change subject: IMPALA-13193: RuntimeFilter on parquet dictionary should evaluate NULL values
......................................................................


Patch Set 4:

(2 comments)

Thanks for fixing this bug! It looks like a really nasty correctness issue.

http://gerrit.cloudera.org:8080/#/c/21566/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21566/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@1927
PS3, Line 1927: if (!column_has_match) {
> Hi, Michael! Relying on statistics is unreliable. I think Zihao's suggest
If null_count is set, then we should assume that it is reliable; if it is not, that is a writer error. Impala also "believes" min/max stats when doing row group or page filtering.

I am OK with keeping it as it is; just please add a TODO noting that null_count could also be checked.


http://gerrit.cloudera.org:8080/#/c/21566/4/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21566/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@1933
PS4, Line 1933: && ExecNode::EvalConjuncts(dict_filter_conjunct_evals->data(),
              : dict_filter_conjunct_evals->size(), &row))) {
Is this part needed? For "normal" (non-runtime-filter) conjuncts, the planner doesn't allow dictionary filtering if NULL is accepted:
https://github.com/apache/impala/blob/00d0b0dda1e215d8e91ff52688fe6654bee52282/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1108
so the query below does not use dictionary filtering:

  select * from parq_tbl d where COALESCE(d.name, '') = ''

I am OK with keeping this condition for safety; just add a comment explaining this.
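The pruning rule discussed in the two comments can be sketched as follows. This is a minimal, self-contained C++ illustration, not the code in hdfs-parquet-scanner.cc: CanSkipRowGroup, NullableString, Predicate, NameEquals and CoalesceNameIsEmpty are hypothetical names, SQL NULL is modeled as std::nullopt, and null_count stands in for the Parquet column statistic (std::nullopt when the writer did not set it). Because a Parquet dictionary page stores only non-NULL values, a row group can be skipped only if no dictionary value passes and either null_count is known to be zero or the predicate also rejects NULL.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a nullable string column value: std::nullopt == SQL NULL.
using NullableString = std::optional<std::string>;

// Hypothetical predicate: returns true if a row with this value passes the
// conjunct / runtime filter. A predicate that evaluates to NULL counts as false.
using Predicate = std::function<bool(const NullableString&)>;

// Models "d.name = target": NULL = 'x' yields NULL, which does not pass.
Predicate NameEquals(const std::string& target) {
  return [target](const NullableString& v) { return v.has_value() && *v == target; };
}

// Models "COALESCE(d.name, '') = ''": this predicate ACCEPTS NULL.
Predicate CoalesceNameIsEmpty() {
  return [](const NullableString& v) { return v.value_or("") == ""; };
}

// Sketch: can the row group be skipped based on its dictionary?
//   dict_values: distinct non-NULL values from the dictionary page.
//   null_count:  column statistic; std::nullopt if the writer did not set it.
bool CanSkipRowGroup(const std::vector<std::string>& dict_values,
                     std::optional<int64_t> null_count,
                     const Predicate& pred) {
  // If any non-NULL value passes, some row may pass: the row group must be read.
  for (const std::string& v : dict_values) {
    if (pred(v)) return false;
  }
  // No non-NULL value passes. If null_count says there are no NULLs, skip.
  if (null_count.has_value() && *null_count == 0) return true;
  // NULLs may be present (unknown or non-zero count): skip only if NULL is rejected.
  return !pred(std::nullopt);
}
```

With a dictionary of {"a", "b"} and no null_count, NameEquals("x") allows the row group to be skipped, while CoalesceNameIsEmpty() does not, because a NULL name would pass COALESCE(d.name, '') = ''. The null_count branch corresponds to the TODO suggested in the first comment: when the statistic is set to zero, evaluating the predicate on NULL is unnecessary.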
--
To view, visit http://gerrit.cloudera.org:8080/21566

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f69405c0c08feb47141d080a828847e5094163f
Gerrit-Change-Number: 21566
Gerrit-PatchSet: 4
Gerrit-Owner: ttttttz <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Reviewer: ttttttz <[email protected]>
Gerrit-Comment-Date: Tue, 09 Jul 2024 09:58:41 +0000
Gerrit-HasComments: Yes
