Arnab Karmakar has posted comments on this change. ( http://gerrit.cloudera.org:8080/23566 )
Change subject: IMPALA-14065: Support WHERE clause in SHOW PARTITIONS statement
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/analysis/PartitionPredicateEvaluator.java
File fe/src/main/java/org/apache/impala/analysis/PartitionPredicateEvaluator.java:

http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/analysis/PartitionPredicateEvaluator.java@77
PS8, Line 77:
> This is a nice-to-have feature. If the original HdfsPartitionFilter doesn't
I believe you are saying that we don't need to add extra seeding here.
We CANNOT perfectly replicate SELECT's behaviour like that because:
1. SELECT uses a stateful pcg32 random number generator in the BE, and its state advances as it processes rows.
2. SHOW PARTITIONS can't maintain such state, because it must call the backend via JNI for each partition independently, and the generator state is lost between JNI calls.
3. I tried not using any seed, and per-partition evaluation is useless in that scenario, since every rand() evaluation gives the same result (0.47...). So a query like "SHOW PARTITIONS tbl WHERE rand() < 0.5" doesn't sample partitions randomly; it returns all the partitions, because rand() always evaluates to 0.47... (we lose the state between independent JNI calls).

Best compromise: use the query-level random seed + partition index to simulate sequence advancement.


http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@132
PS8, Line 132: isPartitionPrunedFilterConjunct
> For 1 and 2, HdfsPartitionPruner is already used in non-SELECT queries like
Sorry, ignore my third point earlier.
The reasons I'm against including evalAllFuncs are:
1. We shouldn't mess with HdfsPartitionPruner. With a new boolean evalAllFuncs, we would need to change every place where prunePartitions() is called.
2. The other non-SELECT queries you mentioned earlier don't support non-deterministic functions, so they are able to prune the partitions easily within the analysis phase.
3. Instead of making prunePartitions() more complicated, we should use it as is. If we are left with any conjuncts (the non-deterministic ones), we should handle them separately, as PartitionPredicateEvaluator does via JNI calls. This way we get a proper separation of concerns instead of one complicated all-in-one class.


-- 
To view, visit http://gerrit.cloudera.org:8080/23566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2e2a14aabcea3fb17083d4ad6f87b7861113f89e
Gerrit-Change-Number: 23566
Gerrit-PatchSet: 9
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Tue, 25 Nov 2025 05:06:06 +0000
Gerrit-HasComments: Yes
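
To illustrate the seeding compromise discussed for PartitionPredicateEvaluator.java line 77, here is a minimal Java sketch. It is not the code in the patch: the class and method names are hypothetical, and java.util.SplittableRandom is only a stand-in for the backend's pcg32 generator. Each call builds a fresh generator and discards it, which is roughly what an independent per-partition JNI call amounts to.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of per-partition seeding; not Impala's actual code. */
final class PartitionSeedSketch {

  /**
   * Simulates one independent evaluation of rand() for a partition.
   * The generator is created from (query seed + partition index), used once,
   * and thrown away, so no state survives between calls.
   * SplittableRandom is an assumption standing in for pcg32.
   */
  static double randForPartition(long querySeed, int partitionIndex) {
    return new SplittableRandom(querySeed + partitionIndex).nextDouble();
  }

  /**
   * The "no extra seeding" variant: every call starts from the same state,
   * so every partition observes the same value. The partition index is
   * deliberately ignored to show the problem.
   */
  static double randWithFixedSeed(int partitionIndex) {
    return new SplittableRandom(0L).nextDouble();
  }

  public static void main(String[] args) {
    long querySeed = 42L; // assumed to be chosen once per query
    for (int i = 0; i < 5; i++) {
      System.out.printf("partition %d: fixed=%.4f seeded=%.4f%n",
          i, randWithFixedSeed(i), randForPartition(querySeed, i));
    }
  }
}
```

With a fixed starting state, every partition sees the same value, so a predicate like rand() < 0.5 keeps either all partitions or none. Deriving the seed from the query-level seed plus the partition index gives each partition its own value while keeping the result reproducible for a given query seed.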

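
The separation of concerns argued for in the HdfsPartitionPruner.java comment can be sketched roughly as follows. This is a simplified Java illustration under stated assumptions, not Impala's real API: Conjunct, pruneDeterministic() and evaluateRemaining() are hypothetical stand-ins for the unchanged prunePartitions() step and the follow-up per-partition evaluation in PartitionPredicateEvaluator, and partitions are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical two-phase partition filtering sketch; not Impala's actual classes. */
final class TwoPhasePartitionFilterSketch {

  /** Stand-in for one conjunct of the WHERE clause. */
  static final class Conjunct {
    final boolean deterministic;
    final Predicate<String> test;

    Conjunct(boolean deterministic, Predicate<String> test) {
      this.deterministic = deterministic;
      this.test = test;
    }
  }

  /**
   * Phase 1 (stands in for the unchanged pruner): applies only deterministic
   * conjuncts and collects everything else into leftoverOut, untouched.
   */
  static List<String> pruneDeterministic(
      List<String> partitions, List<Conjunct> conjuncts, List<Conjunct> leftoverOut) {
    List<String> survivors = new ArrayList<>(partitions);
    for (Conjunct c : conjuncts) {
      if (c.deterministic) {
        survivors.removeIf(c.test.negate());
      } else {
        leftoverOut.add(c);
      }
    }
    return survivors;
  }

  /**
   * Phase 2 (stands in for the separate evaluator): evaluates each leftover
   * conjunct once per surviving partition, as a per-partition call would.
   */
  static List<String> evaluateRemaining(List<String> partitions, List<Conjunct> leftover) {
    List<String> result = new ArrayList<>();
    for (String p : partitions) {
      boolean keep = true;
      for (Conjunct c : leftover) {
        keep = keep && c.test.test(p);
      }
      if (keep) result.add(p);
    }
    return result;
  }
}
```

The point of the split is that the first phase needs no new flag and no changes at its existing call sites; only callers that can have leftover conjuncts invoke the second phase.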