Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/22033 )

Change subject: IMPALA-13505: Fix NPE in Calcite Planner
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22033/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/22033/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4698
PS1, Line 4698:             return; // Note, since this statement is within a lambda function, this return
> I'd like to warn on these; I think they're issues in the Calcite parser we'
The Calcite planner produces this join node:
05:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
|  fk/pk conjuncts: assumed fk/pk
|  runtime filters: RF002[bloom] <- EXPR$0

Since EXPR$0 is not a named column, it has a null resolved path and can be skipped.

EXPR$0 refers to the list of zip codes used in the `WHERE SUBSTRING(ca_zip, 1, 5) IN` clause. Calcite produces this hash join node when the `IN` list grows beyond a certain size.
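For context on the `return` at Line 4698: inside a lambda passed to `forEach`, `return` ends only the current invocation, so it behaves like `continue` in a loop rather than exiting the enclosing method. The sketch below is hypothetical and is not the actual `Analyzer.java` code. `Slot` and `collectColumnPaths` are illustrative names, and a slot's resolved path is assumed to be a nullable list of path elements. It skips a slot such as EXPR$0 whose resolved path is null while still collecting the named columns.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipUnnamedSlots {
  // Hypothetical stand-in for a slot reference: a label plus a resolved path
  // that is null when the slot does not map to a named table column (e.g. EXPR$0).
  record Slot(String label, List<String> resolvedPath) {}

  // Collects the dotted path of every slot that resolves to a named column.
  static List<String> collectColumnPaths(List<Slot> slots) {
    List<String> columns = new ArrayList<>();
    slots.forEach(slot -> {
      if (slot.resolvedPath() == null) {
        // Inside a lambda, `return` only exits this invocation, so it acts
        // like `continue`: the unnamed slot is skipped and iteration proceeds.
        return;
      }
      columns.add(String.join(".", slot.resolvedPath()));
    });
    return columns;
  }

  public static void main(String[] args) {
    List<Slot> slots = List.of(
        new Slot("ca_zip", List.of("tpcds", "customer_address", "ca_zip")),
        new Slot("EXPR$0", null));
    System.out.println(collectColumnPaths(slots)); // prints [tpcds.customer_address.ca_zip]
  }
}
```

Without the null check, calling `String.join` on the null path would throw a `NullPointerException`, which is the kind of failure the skip avoids.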


http://gerrit.cloudera.org:8080/#/c/22033/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4705
PS1, Line 4705: .getCanonicalPath()
              :               .subList(0, 3)
> Is it a good place to address my previous comment?
I'd like to keep this patch focused on the Calcite planner NPE. I did incorporate this suggestion on another branch I have locally.


http://gerrit.cloudera.org:8080/#/c/22033/1/tests/custom_cluster/test_workload_mgmt_sql_details.py
File tests/custom_cluster/test_workload_mgmt_sql_details.py:

http://gerrit.cloudera.org:8080/#/c/22033/1/tests/custom_cluster/test_workload_mgmt_sql_details.py@461
PS1, Line 461:     res = client.execute("SELECT s_store_name, sum(ss_net_profit) FROM store_sales,"
> Can we refine this query at all and still hit the issue? Would help with de
I was able to eliminate most of the zip codes from the IN list, but that was the only reduction I could make and still hit the issue.


http://gerrit.cloudera.org:8080/#/c/22033/1/tests/custom_cluster/test_workload_mgmt_sql_details.py@521
PS1, Line 521:
> Should we verify that some columns were identified?
The Calcite planner does not set any of the workload-management-related fields on the TExecRequest object it returns to the backend, so all of these columns are blank in the sys.impala_query_log table.

Opened IMPALA-13519 to address this issue.



--
To view, visit http://gerrit.cloudera.org:8080/22033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0
Gerrit-Change-Number: 22033
Gerrit-PatchSet: 2
Gerrit-Owner: Jason Fehr
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jason Fehr
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Steve Carlin
Gerrit-Comment-Date: Wed, 06 Nov 2024 20:31:22 +0000
Gerrit-HasComments: Yes
