Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/24001 )
Change subject: IMPALA-14737: Push down LIKE predicates to Iceberg
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/24001/6/fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java
File fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java:

http://gerrit.cloudera.org:8080/#/c/24001/6/fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java@169
PS6, Line 169:       // Check if this is a wildcard
:       if (c == '%' || c == '_') {
:         // Count preceding backslashes to see if it's escaped
:         int backslashCount = 0;
:         int j = i - 1;
:         while (j >= 0 && pattern.charAt(j) == '\\') {
:           backslashCount++;
:           j--;
:         }
:
:         // If odd number of backslashes, wildcard is escaped = literal content
:         if (backslashCount % 2 == 1) {
:           return true;
:         }
This could be moved to a helper method as it is also used by findFirstUnescapedWildcard().


http://gerrit.cloudera.org:8080/#/c/24001/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/24001/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@122
PS4, Line 122: select * from iceberg_partitioned where action like "d%" and event_time < "2022-01-01" and id < 10
> The expectation is correct that action LIKE 'd%d' shouldn't be pushed down
In the end, I think we want both:

* push down LIKE 'd%' to Iceberg
* keep LIKE 'd%d' in predicates

This way, Iceberg can prune partitions and files for us. Then the executors only need to evaluate 'd%d' on the surviving data files.
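
To illustrate both the helper extraction and the prefix split, here is a rough sketch under stated assumptions, not the patch's actual code. The class LikePrefixSketch and the methods isEscaped(), rawPrefix() and isExactPrefixMatch() are hypothetical names; only findFirstUnescapedWildcard() is a name taken from the patch, and its real signature may differ. Unescaping of the extracted prefix is deliberately left out.

```java
// Hypothetical sketch: none of this is taken from IcebergPredicateConverter.
final class LikePrefixSketch {

  // Shared helper: the character at index i is escaped when it is preceded
  // by an odd number of consecutive backslashes.
  static boolean isEscaped(String pattern, int i) {
    int backslashCount = 0;
    for (int j = i - 1; j >= 0 && pattern.charAt(j) == '\\'; j--) {
      backslashCount++;
    }
    return backslashCount % 2 == 1;
  }

  // Index of the first '%' or '_' that is not escaped, or -1 if there is none.
  static int findFirstUnescapedWildcard(String pattern) {
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if ((c == '%' || c == '_') && !isEscaped(pattern, i)) {
        return i;
      }
    }
    return -1;
  }

  // Text before the first unescaped wildcard, with escape sequences left
  // untouched. This is the part a prefix filter could be built from.
  static String rawPrefix(String pattern) {
    int w = findFirstUnescapedWildcard(pattern);
    return w < 0 ? pattern : pattern.substring(0, w);
  }

  // True only for patterns of the form "<prefix>%", where a prefix filter
  // is equivalent to the LIKE predicate. For anything else (for example
  // "d%d") the original predicate still has to be evaluated afterwards.
  static boolean isExactPrefixMatch(String pattern) {
    int w = findFirstUnescapedWildcard(pattern);
    return w >= 0 && w == pattern.length() - 1 && pattern.charAt(w) == '%';
  }
}
```

With this shape, rawPrefix("d%d") yields "d" for the pushed-down part, while isExactPrefixMatch("d%d") is false, which is the case where the original LIKE has to stay in the predicate list.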

But I'm fine with splitting this ticket into two commits:

1: handle the LIKE 'prefix%' / 'prefix_' cases (this is basically implemented by the current patch set)
2: handle the LIKE 'prefix%suffix' cases by pushing down LIKE 'prefix%' to Iceberg, while still evaluating LIKE 'prefix%suffix' on the surviving rows.

Step 2 needs modifications here: https://github.com/apache/impala/blob/0f53e31363dddad918c5f5cf103697b4624d9ede/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java#L1006-1018
The converter will need to signal that the original predicate was altered, probably by introducing a ConverterResult class.


http://gerrit.cloudera.org:8080/#/c/24001/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-like-pushdown.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-like-pushdown.test:

http://gerrit.cloudera.org:8080/#/c/24001/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-like-pushdown.test@330
PS6, Line 330: (should only match 'download' which has 'd' at both ends)
We might need a new table for this, because in iceberg_partitioned 'download' is the only value that starts with 'd'.


--
To view, visit http://gerrit.cloudera.org:8080/24001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I548834126540bcc8d22efc872c2571293b8b7ec4
Gerrit-Change-Number: 24001
Gerrit-PatchSet: 6
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 24 Feb 2026 11:05:55 +0000
Gerrit-HasComments: Yes
