Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/22033 )
Change subject: IMPALA-13505: Fix NPE in Calcite Planner
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22033/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/22033/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4698
PS1, Line 4698: return; // Note, since this statement is within a lambda function, this return
> I'd like to warn on these; I think they're issues in the Calcite parser we'
The Calcite planner produces this join node:

05:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
|  fk/pk conjuncts: assumed fk/pk
|  runtime filters: RF002[bloom] <- EXPR$0

Since EXPR$0 is not a named column, it has a null resolved path and can be skipped. EXPR$0 refers to the list of zip codes used in the `WHERE SUBSTRING(ca_zip, 1, 5) IN` clause. Calcite produces this hash join node when the `IN` list grows beyond a certain size.


http://gerrit.cloudera.org:8080/#/c/22033/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4705
PS1, Line 4705: .getCanonicalPath() : .subList(0, 3)
> Is it a good place to address my previous comment?
I'd like to keep this patch focused on the Calcite planner NPE. I did incorporate this suggestion on another branch I have locally.


http://gerrit.cloudera.org:8080/#/c/22033/1/tests/custom_cluster/test_workload_mgmt_sql_details.py
File tests/custom_cluster/test_workload_mgmt_sql_details.py:

http://gerrit.cloudera.org:8080/#/c/22033/1/tests/custom_cluster/test_workload_mgmt_sql_details.py@461
PS1, Line 461: res = client.execute("SELECT s_store_name, sum(ss_net_profit) FROM store_sales,"
> Can we refine this query at all and still hit the issue?
> Would help with de
I was able to remove most of the zip codes from the IN list, but that was as far as I could reduce the query and still hit the issue.


http://gerrit.cloudera.org:8080/#/c/22033/1/tests/custom_cluster/test_workload_mgmt_sql_details.py@521
PS1, Line 521:
> Should we verify that some columns were identified?
The Calcite planner does not set any of the workload management related fields on the TExecRequest object it returns to the backend, so all of these columns are blank in the sys.impala_query_log table. I opened IMPALA-13519 to address this issue.



--
To view, visit http://gerrit.cloudera.org:8080/22033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0
Gerrit-Change-Number: 22033
Gerrit-PatchSet: 2
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Wed, 06 Nov 2024 20:31:22 +0000
Gerrit-HasComments: Yes
