Arnab Karmakar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23566 )

Change subject: IMPALA-14065: Support WHERE clause in SHOW PARTITIONS statement
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/analysis/PartitionPredicateEvaluator.java
File 
fe/src/main/java/org/apache/impala/analysis/PartitionPredicateEvaluator.java:

http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/analysis/PartitionPredicateEvaluator.java@77
PS8, Line 77:
> This is a nice-to-have feature. If the original HdfsPartitionFilter doesn't
I believe you are saying that we don't need to add extra seeding here. We CANNOT
perfectly replicate SELECT's behaviour because:
1. SELECT uses a stateful pcg32 random number generator in the BE, and its
state advances as rows are processed.

2. SHOW PARTITIONS can't maintain such state: it must call the backend via JNI
for each partition independently, and the generator state is lost between JNI
calls.

3. I tried not using any seed, and per-partition evaluation is useless in that
scenario, since every rand() evaluation gives the same result (0.47...). So a
query like "SHOW PARTITIONS tbl WHERE rand() < 0.5" doesn't sample partitions
randomly; it returns all the partitions, because rand() always evaluates to the
same 0.47... (we lose the state with independent JNI calls).

Best compromise: use the query-level random seed + partition index to simulate
sequence advancement.
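
For illustration, here is a minimal sketch of the "query-level seed + partition
index" idea. It is not the code in this patch: the class and method names are
hypothetical, and java.util.SplittableRandom stands in for the BE's pcg32
generator, so the values will not match what Impala's rand() produces. It only
shows that a generator rebuilt on every call still gives each partition a
different, repeatable value once the partition index is mixed into the seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

// Illustrative only: class and method names are hypothetical, not Impala code.
class PartitionSeedSketch {
  // Derive a per-partition seed from the query-level seed and the partition index.
  static long seedFor(long querySeed, int partitionIndex) {
    return querySeed + partitionIndex;
  }

  // One rand() draw for a partition. A fresh generator is built on every call,
  // mimicking independent JNI calls that keep no state between them.
  // SplittableRandom is an assumption standing in for pcg32.
  static double randFor(long querySeed, int partitionIndex) {
    return new SplittableRandom(seedFor(querySeed, partitionIndex)).nextDouble();
  }

  // Indexes of the partitions for which "rand() < threshold" holds.
  static List<Integer> sample(long querySeed, int numPartitions, double threshold) {
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      if (randFor(querySeed, i) < threshold) kept.add(i);
    }
    return kept;
  }

  public static void main(String[] args) {
    // Prints the sampled partition indexes for a made-up query seed.
    System.out.println(sample(42L, 10, 0.5));
  }
}
```

With the same query seed the sample is repeatable, while different partition
indexes see different values, which is the property the unseeded variant lacks.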


http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/23566/8/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@132
PS8, Line 132: isPartitionPrunedFilterConjunct
> For 1 and 2, HdfsPartitionPruner is already used in non-SELECT queries like
Sorry, ignore my third point earlier. The reasons I'm against including
evalAllFuncs are:

1. We shouldn't mess with HdfsPartitionPruner. With a new bool evalAllFuncs, we
would need to change every place where prunePartitions() is called.

2. The other non-SELECT queries you mentioned earlier don't support
non-deterministic functions, so they are able to prune the partitions easily
within the analysis phase.

3. Instead of making prunePartitions() complicated, we should use it as is. If
we are left with any conjuncts (the non-deterministic ones), we should handle
them separately, as done in PartitionPredicateEvaluator via JNI calls. This way
we will have a proper separation of concerns instead of a complicated
all-in-one class.
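
A rough sketch of the two-phase split described in point 3, under stated
assumptions: none of these names are Impala's API, partitions are plain strings,
and an ordinary predicate stands in for the JNI-evaluated conjunct. The pruning
step applies deterministic conjuncts only and hands back whatever it could not
handle; a separate step then evaluates those leftovers per partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: every name here is hypothetical, not Impala's API.
class TwoPhaseFilterSketch {
  // Stand-in for a WHERE-clause conjunct evaluated against a partition name.
  static final class Conjunct {
    final boolean deterministic;
    final Predicate<String> eval;

    Conjunct(boolean deterministic, Predicate<String> eval) {
      this.deterministic = deterministic;
      this.eval = eval;
    }
  }

  // Phase 1: stand-in for the unchanged pruner. Applies deterministic conjuncts
  // only and appends the ones it cannot handle to 'leftover'.
  static List<String> prune(List<String> partitions, List<Conjunct> conjuncts,
      List<Conjunct> leftover) {
    List<String> result = new ArrayList<>(partitions);
    for (Conjunct c : conjuncts) {
      if (c.deterministic) {
        result.removeIf(c.eval.negate());
      } else {
        leftover.add(c);
      }
    }
    return result;
  }

  // Phase 2: stand-in for the separate evaluator. Keeps a partition only if
  // every leftover conjunct accepts it.
  static List<String> evaluateRemaining(List<String> partitions,
      List<Conjunct> leftover) {
    List<String> result = new ArrayList<>();
    for (String p : partitions) {
      boolean keep = true;
      for (Conjunct c : leftover) keep = keep && c.eval.test(p);
      if (keep) result.add(p);
    }
    return result;
  }

  // Runs phase 2 only when phase 1 leaves conjuncts behind.
  static List<String> filter(List<String> partitions, List<Conjunct> conjuncts) {
    List<Conjunct> leftover = new ArrayList<>();
    List<String> pruned = prune(partitions, conjuncts, leftover);
    return leftover.isEmpty() ? pruned : evaluateRemaining(pruned, leftover);
  }

  public static void main(String[] args) {
    List<String> parts = Arrays.asList("year=2023", "year=2024", "year=2025");
    List<Conjunct> conjuncts = Arrays.asList(
        new Conjunct(true, p -> !p.equals("year=2023")),
        // Made-up stand-in for a conjunct the pruner does not evaluate.
        new Conjunct(false, p -> p.endsWith("5")));
    System.out.println(filter(parts, conjuncts));
  }
}
```

The pruning step keeps its existing signature and callers in this arrangement;
only the new statement pays for the second phase.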



--
To view, visit http://gerrit.cloudera.org:8080/23566
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2e2a14aabcea3fb17083d4ad6f87b7861113f89e
Gerrit-Change-Number: 23566
Gerrit-PatchSet: 9
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Tue, 25 Nov 2025 05:06:06 +0000
Gerrit-HasComments: Yes
