Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )
Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans ...................................................................... Patch Set 25: Code-Review+1 (8 comments) I ran over the code one more time before I go on vacation. I had some comments, but the code looks great overall. Feel free to carry my +1 once those comments are resolved. http://gerrit.cloudera.org:8080/#/c/17706/25//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17706/25//COMMIT_MSG@42 PS25, Line 42: only for sorted or partitioned : join columns. I think it would make sense to at least turn this on at the file/row group/page. Probably the overhead of building the filter is negligible in this case. http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.h File be/src/exec/nested-loop-join-builder.h: http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.h@131 PS25, Line 131: /// This is replaced at runtime with code generated by CodegenInsertRuntimeFilters(). Is this true? PartitionedHashJoinBuilder has some mechanism for codegen, e.g.: https://github.com/apache/impala/blob/5a9dcd108d8a1c6f3ea0062d8de750b6e41fb635/be/src/exec/partitioned-hash-join-builder.cc#L1406 But I don't see that for NestedLoopJoinBuilder. Probably we should remove this, or add TODO with a new Jira ticket? But since we insert only one value from the scalar subquery it's probably not a big deal. http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.h@149 PS25, Line 149: initFilterContexts nit: InitFilterContexts (Upper Camel Case) http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.cc File be/src/exec/nested-loop-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.cc@191 PS25, Line 191: PublishRuntimeFilters( Don't we want to publish the runtime filters for separate builds as well? http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter-ir.cc File be/src/util/min-max-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter-ir.cc@130 PS25, Line 130: CopyToBuffer(&min_buffer_, &min_, min_.len); Do we always need to invoke CopyToBuffer() for the min_ value? http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@410 PS25, Line 410: miax max() http://gerrit.cloudera.org:8080/#/c/17706/25/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17706/25/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@409 PS25, Line 409: < Could you please also add a test with '='? http://gerrit.cloudera.org:8080/#/c/17706/25/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@429 PS25, Line 429: # String data type is not supported as it is impossible to represent the min() and max() : # value for the data type. This seems to be supported now. -- To view, visit http://gerrit.cloudera.org:8080/17706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446 Gerrit-Change-Number: 17706 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 11 Aug 2021 15:36:48 +0000 Gerrit-HasComments: Yes
