[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(1 comment)

Just a comment to Csaba's.

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604: buildStatsPredicate(analyzer, slotRef, binaryPred, binaryPred.getOp());
> Parquet has a somewhat hacky way of finding EQ predicates in the backend an
I wonder if the complexity can be removed later on (for Parquet). For ORC, I like the idea of directly utilizing the EQUALS form of predicate, which should translate to better performance in ORC.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 16 Sep 2021 19:26:34 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@604
PS2, Line 604: buildStatsPredicate(analyzer, slotRef, binaryPred, binaryPred.getOp());
Parquet has a somewhat hacky way of finding EQ predicates in the backend and using it in bloom filters:
https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1884

It would be great to use a common logic here - I prefer doing the logic in FE, but we did it in BE because we (Daniel Becker + me) were more familiar with BE.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Mon, 13 Sep 2021 10:00:48 +
Gerrit-HasComments: Yes
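
As an illustration of why EQ and IN-list predicates pair well with bloom filters, here is a minimal sketch. It is not Impala's or ORC's implementation: `TinyBloomFilter`, its two hash functions, `CanSkipForEquals` and `CanSkipForInList` are hypothetical names made up for this example. The idea it shows is only that a chunk of data can be skipped when the filter reports that none of the searched literals can be present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy bloom filter over int64 values (two hash functions).
class TinyBloomFilter {
 public:
  explicit TinyBloomFilter(size_t num_bits) : bits_(num_bits, false) {}

  void Insert(int64_t v) {
    bits_[Hash1(v) % bits_.size()] = true;
    bits_[Hash2(v) % bits_.size()] = true;
  }

  // May return true for a value that was never inserted (false positive),
  // but never returns false for a value that was inserted.
  bool MightContain(int64_t v) const {
    return bits_[Hash1(v) % bits_.size()] && bits_[Hash2(v) % bits_.size()];
  }

 private:
  static uint64_t Hash1(int64_t v) {
    return static_cast<uint64_t>(v) * 0x9E3779B97F4A7C15ULL;
  }
  static uint64_t Hash2(int64_t v) {
    uint64_t x = static_cast<uint64_t>(v) ^ 0xC2B2AE3D27D4EB4FULL;
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return x;
  }
  std::vector<bool> bits_;
};

// "col = literal": the chunk can be skipped if the literal is definitely absent.
bool CanSkipForEquals(const TinyBloomFilter& filter, int64_t literal) {
  return !filter.MightContain(literal);
}

// "col IN (l1, l2, ...)": the chunk can be skipped only if every literal is
// definitely absent.
bool CanSkipForInList(const TinyBloomFilter& filter,
                      const std::vector<int64_t>& literals) {
  for (int64_t literal : literals) {
    if (filter.MightContain(literal)) return false;
  }
  return true;
}
```

A range predicate such as `col < 5` cannot use the filter this way, which is one reason the thread singles out the EQUALS form.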
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), &predicate_type));
              : }
> Yes, here is the rule: https://github.com/apache/impala/blob/beb8019f5300bb
Okay. That fits my understanding of constant folding. Thanks for the URLs.

So if we have tested the presence of literals in buildOrcInListStatsPredicate(), can we assume these literals will be saved in the plan and available in BE to build the In-list predicates with (i.e., to remove line 1049)?

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 16:39:53 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), &predicate_type));
              : }
> Does the constant-folding happen in FE?
Yes, here is the rule:
https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java#L41

Here is the entry point for expr rewrite:
https://github.com/apache/impala/blob/beb8019f5300bb163424e7fdfec50b8e4b796e26/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java#L521

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 14:14:10 +
Gerrit-HasComments: Yes
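
For readers unfamiliar with constant folding, the following is a minimal sketch of the general technique referred to in the comment above. It is not a port of Impala's `FoldConstantsRule`: the `Expr` struct and the `Lit`, `Col`, `Add` and `Fold` helpers are hypothetical and model only integer addition. It shows why an IN-list item written as `2 + 3` is a non-literal expression until a folding pass has run.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy expression tree: a literal, a column reference, or a sum.
struct Expr {
  enum Kind { kLiteral, kColumn, kAdd };
  Kind kind;
  int64_t value;           // only meaningful for kLiteral
  std::vector<Expr> kids;  // exactly two children for kAdd
};

Expr Lit(int64_t v) {
  Expr e;
  e.kind = Expr::kLiteral;
  e.value = v;
  return e;
}

Expr Col() {
  Expr e;
  e.kind = Expr::kColumn;
  e.value = 0;
  return e;
}

Expr Add(Expr lhs, Expr rhs) {
  Expr e;
  e.kind = Expr::kAdd;
  e.value = 0;
  e.kids.push_back(std::move(lhs));
  e.kids.push_back(std::move(rhs));
  return e;
}

// Replaces every sum whose operands both fold to literals with one literal.
// Sums that involve a column reference are kept as sums.
Expr Fold(const Expr& e) {
  if (e.kind != Expr::kAdd) return e;
  Expr lhs = Fold(e.kids[0]);
  Expr rhs = Fold(e.kids[1]);
  if (lhs.kind == Expr::kLiteral && rhs.kind == Expr::kLiteral) {
    return Lit(lhs.value + rhs.value);
  }
  return Add(std::move(lhs), std::move(rhs));
}
```

In this model, `Add(Lit(2), Lit(3))` stays a `kAdd` node unless `Fold` is applied, which mirrors the case in the thread where expr rewrites are disabled.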
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), &predicate_type));
              : }
> This loop is for generating 'in_list', the vector. The check
Does the constant-folding happen in FE?

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ
> Sorry, I mean EQUALS predicate. We have a check at line 595.
I see. Yeah, push directly is nice.

Done.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 13:06:33 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), &predicate_type));
              : }
> Since we have checked in FE on literals already, looks this loop can be rem
This loop is for generating 'in_list', the vector. The check inside it is also needed since we could get non-literal expr if expr rewrites are disabled (thus constant-folding is disabled).

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ
> nit. you mean binary?
Sorry, I mean EQUALS predicate. We have a check at line 595.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 31 Aug 2021 09:15:00 +
Gerrit-HasComments: Yes
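
The two roles of the quoted loop (collecting the IN-list values and bailing out on a non-literal item) can be sketched in isolation as follows. This is a simplified stand-in, not the code from hdfs-orc-scanner.cc: `ExprNode` and `BuildInList` are hypothetical names, and values are plain `int64_t` rather than ORC literals. As in the patch, index 0 is assumed to be the column reference and the list items start at index 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one child of an IN predicate.
struct ExprNode {
  bool is_literal;  // true for a constant such as 5; false for e.g. 2 + 3
  int64_t value;    // only meaningful when is_literal is true
};

// Fills 'in_list' with the values of children[1..]. Returns false as soon as
// a non-literal item is found, meaning the predicate must not be pushed down.
bool BuildInList(const std::vector<ExprNode>& children,
                 std::vector<int64_t>* in_list) {
  in_list->clear();
  for (size_t i = 1; i < children.size(); ++i) {
    // Without constant folding an item may still be an unevaluated expression.
    if (!children[i].is_literal) return false;
    in_list->push_back(children[i].value);
  }
  return true;
}
```

Dropping the check would leave the loop in place, since it still has to build the vector; the check only decides whether push-down is attempted at all.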
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

(4 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17815/2/be/src/exec/hdfs-orc-scanner.cc@1046
PS2, Line 1046: for (int i = 1; i < eval->root().children().size(); ++i) {
              :   // ORC reader only supports pushing down predicates that constant parts are literal.
              :   // We could get non-literal expr if expr rewrites are disabled.
              :   if (!eval->root().GetChild(i)->IsLiteral()) return false;
              :   in_list.emplace_back(GetLiteralSearchArguments(
              :       eval, i, slot_desc->type(), &predicate_type));
              : }
Since we have checked in FE on literals already, looks this loop can be removed?

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@551
PS2, Line 551: 1
nit. May need to assure that child0 is a reference to a column.

http://gerrit.cloudera.org:8080/#/c/17815/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@603
PS2, Line 603: EQ
nit. you mean binary?

http://gerrit.cloudera.org:8080/#/c/17815/2/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
File testdata/workloads/functional-query/queries/QueryTest/orc-stats.test:

http://gerrit.cloudera.org:8080/#/c/17815/2/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test@364
PS2, Line 364: 0, 7299
In lists on other data types should be added.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Comment-Date: Mon, 30 Aug 2021 20:23:05 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Sun, 29 Aug 2021 19:38:48 +
Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7435/ DRY_RUN=true

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Sun, 29 Aug 2021 13:20:53 +
Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/9396/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Sun, 29 Aug 2021 12:54:08 +
Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17815 to look at the new patch set (#2).

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader, including EQUALS and IN-list predicates which can leverage the bloom filters in the ORC files.

TODO: Push down IS NULL predicate

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
4 files changed, 91 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/2

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/9395/ : Initial code review checks failed. See linked job for details on the failure.

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Sun, 29 Aug 2021 12:25:39 +
Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17815 )

Change subject: WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
..

WIP IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader, including EQUALS and IN-list predicates which can leverage the bloom filters in the ORC files.

TODO: Push down IS NULL predicate

Tests
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
4 files changed, 90 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/1

--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang