Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21566 )

Change subject: IMPALA-13193: RuntimeFilter on parquet dictionary should 
evaluate NULL values
......................................................................


Patch Set 4:

(2 comments)

Thanks for fixing this bug! It looks like a really nasty correctness issue.

http://gerrit.cloudera.org:8080/#/c/21566/3/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21566/3/be/src/exec/parquet/hdfs-parquet-scanner.cc@1927
PS3, Line 1927:     if (!column_has_match) {
> Hi, Michael!  Relying on statistics is unreliable.  I think Zihao's suggest
If null_count is set, then we should assume that it is reliable - if not, then 
it is a writer error. Impala also "believes" min/max stats when doing row group 
or page filtering.


I am OK with keeping it as it is; just please add a TODO noting that null_count 
could also be checked.
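
A rough sketch of what such a null_count check could look like. This is not the 
code in the patch: ColumnChunkStats, MayContainNulls and CanSkipRowGroup are 
hypothetical names used only for illustration, and the optional field stands in 
for the optional null_count of the Parquet column statistics.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the column chunk statistics read from the Parquet
// footer. null_count is optional in the format, so it is modeled as optional.
struct ColumnChunkStats {
  std::optional<int64_t> null_count;
};

// Returns true if the column chunk may contain NULLs. A missing null_count
// has to be treated conservatively; a value that is set is trusted.
inline bool MayContainNulls(const ColumnChunkStats& stats) {
  if (!stats.null_count.has_value()) return true;
  return *stats.null_count > 0;
}

// Decides whether a row group can be skipped by dictionary filtering.
//   dict_has_match:      some dictionary entry passes the filters
//   null_passes_filters: a NULL value would pass the filters
inline bool CanSkipRowGroup(bool dict_has_match, bool null_passes_filters,
    const ColumnChunkStats& stats) {
  if (dict_has_match) return false;
  // No dictionary entry matches. If the stats say there are no NULLs, the
  // NULL evaluation result is irrelevant and the row group can be skipped.
  if (!MayContainNulls(stats)) return true;
  return !null_passes_filters;
}
```

With null_count == 0 the row group is skipped even if NULL would pass the 
filters; with a missing or positive null_count the behavior is the same as 
without the check.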


http://gerrit.cloudera.org:8080/#/c/21566/4/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21566/4/be/src/exec/parquet/hdfs-parquet-scanner.cc@1933
PS4, Line 1933:           && ExecNode::EvalConjuncts(dict_filter_conjunct_evals->data(),
              :               dict_filter_conjunct_evals->size(), &row))) {
Is this part needed? For "normal" (not runtime filter) conjuncts the planner 
doesn't allow dictionary filtering if NULL is accepted:
https://github.com/apache/impala/blob/00d0b0dda1e215d8e91ff52688fe6654bee52282/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1108

so the query below does not use dictionary filtering:
select * from parq_tbl d
  where COALESCE(d.name, '') = ''

I am OK with keeping this condition for safety; please just add a comment 
explaining this.
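
To illustrate why a conjunct that accepts NULL is unsafe for dictionary 
filtering, here is a small standalone sketch. It does not use Impala's classes: 
Slot, Predicate, AcceptsNull and DictionaryHasMatch are hypothetical helpers, 
and NULL is modeled with std::nullopt.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// std::nullopt models SQL NULL in this sketch.
using Slot = std::optional<std::string>;
using Predicate = std::function<bool(const Slot&)>;

// Models: COALESCE(name, '') = ''
inline bool CoalesceIsEmpty(const Slot& name) {
  return name.value_or("") == "";
}

// Models: name = 'a' (a comparison with NULL does not evaluate to true)
inline bool EqualsA(const Slot& name) {
  return name.has_value() && *name == "a";
}

// True if the predicate passes for a NULL slot. Such predicates cannot be
// decided from the dictionary alone, because NULLs are not dictionary entries.
inline bool AcceptsNull(const Predicate& pred) {
  return pred(std::nullopt);
}

// True if at least one dictionary entry passes the predicate.
inline bool DictionaryHasMatch(const std::vector<std::string>& dict,
    const Predicate& pred) {
  for (const std::string& value : dict) {
    if (pred(Slot(value))) return true;
  }
  return false;
}
```

For a dictionary holding only "x" and "y", DictionaryHasMatch is false for 
CoalesceIsEmpty, yet AcceptsNull is true, so a NULL row in the same row group 
would still have to be returned.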



--
To view, visit http://gerrit.cloudera.org:8080/21566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f69405c0c08feb47141d080a828847e5094163f
Gerrit-Change-Number: 21566
Gerrit-PatchSet: 4
Gerrit-Owner: ttttttz <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Reviewer: ttttttz <[email protected]>
Gerrit-Comment-Date: Tue, 09 Jul 2024 09:58:41 +0000
Gerrit-HasComments: Yes
