[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26: Code-Review+2

Merging this.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 25 Apr 2024 20:08:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The planner is hooked in through an override of JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps, which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterates through the SqlNode from the previous step
and makes sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Changes the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan serves as the node that reads from an HDFS table, and
the LogicalProject only projects out the columns used in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut it is a no-op, but in
later versions it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implements the existing Impala steps that turn a
single-node plan into a distributed plan. It also creates the TExecRequest
object needed by the runtime server.
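
The stage order described above can be pictured with a small, self-contained sketch. It is not the code from this change: the class PlannerPipelineSketch, its Stage helper, and the artifact labels attached to each stage are assumptions made for illustration. Only the stage names and their order come from the commit message.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the stage order; not Impala's actual classes. */
final class PlannerPipelineSketch {

  /** A stage name from the commit message plus the artifact it hands on. */
  static final class Stage {
    final String name;
    final String output;

    Stage(String name, String output) {
      this.name = name;
      this.output = output;
    }
  }

  /** Stages in execution order; the output labels are assumptions. */
  static List<Stage> stages() {
    List<Stage> stages = new ArrayList<>();
    stages.add(new Stage("CalciteQueryParser", "SqlNode"));
    stages.add(new Stage("CalciteMetadataHandler", "SqlNode"));
    stages.add(new Stage("CalciteValidator", "SqlNode"));
    stages.add(new Stage("CalciteRelNodeConverter", "RelNode"));
    stages.add(new Stage("CalciteOptimizer", "RelNode"));
    stages.add(new Stage("CalcitePhysPlanCreator", "PlanNode"));
    stages.add(new Stage("ExecRequestCreator", "TExecRequest"));
    return stages;
  }

  /** Records, for each stage, what it receives and what it produces. */
  static List<String> trace() {
    List<String> lines = new ArrayList<>();
    String input = "query string";
    for (Stage stage : stages()) {
      lines.add(stage.name + ": " + input + " -> " + stage.output);
      input = stage.output;
    }
    return lines;
  }

  public static void main(String[] args) {
    trace().forEach(System.out::println);
  }
}
```

Running main prints one line per stage, which makes it easy to see where the SqlNode becomes a RelNode and where the RelNode becomes a PlanNode.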

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement reverts to the
current Impala planner.

In this iteration, any query besides the minimal ones listed above results
in a caught exception, after which the query is run through the current
Impala planner. The tests that do work can be found in calcite.test and are
run through the custom cluster test test_experimental_planner.py.
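
The fallback behaviour described in the two paragraphs above might look roughly like the following sketch. Everything here is hypothetical: the class PlannerFallbackSketch, the planWithFallback and isQuery helpers, and the prefix check for "select" are stand-ins for whatever checks CalciteJniFrontend actually performs.

```java
import java.util.Locale;
import java.util.function.Function;

/** Illustrative fallback wrapper; names and checks are assumptions. */
final class PlannerFallbackSketch {

  /** Crude stand-in for the "basic checks": only select statements qualify. */
  static boolean isQuery(String sql) {
    return sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
  }

  /**
   * Tries the experimental planner for queries and falls back to the
   * existing planner for non-queries or when the experimental path throws.
   */
  static <R> R planWithFallback(
      String sql, Function<String, R> experimental, Function<String, R> existing) {
    if (!isQuery(sql)) {
      return existing.apply(sql);
    }
    try {
      return experimental.apply(sql);
    } catch (RuntimeException e) {
      // Unsupported query shape: hand the statement to the existing planner.
      return existing.apply(sql);
    }
  }

  public static void main(String[] args) {
    Function<String, String> experimental = q -> {
      if (q.contains("where")) {
        throw new UnsupportedOperationException("filters not supported yet");
      }
      return "calcite";
    };
    Function<String, String> existing = q -> "original";
    System.out.println(planWithFallback("select c1 from tbl", experimental, existing));
    System.out.println(planWithFallback("select c1 from tbl where c1 > 0", experimental, existing));
  }
}
```

The point of the design is that an unsupported statement never surfaces as an error to the user; it simply takes the existing planning path.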

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT), similar to how Hive represents its
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.
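
As a rough illustration of the type handling described above, the sketch below maps a few Impala type names to SQL type names in the style Calcite uses. It is not the ImpalaTypeConverter from this change: the class TypeMappingSketch, the string-based signature, and the small set of covered types are assumptions, as is reading MAXINT as Java's Integer.MAX_VALUE.

```java
import java.util.Locale;

/** Illustrative type-name mapping; not the real ImpalaTypeConverter. */
final class TypeMappingSketch {

  /** Maps an Impala type name to a Calcite-style SQL type name. */
  static String toCalciteTypeName(String impalaType) {
    String normalized = impalaType.trim().toUpperCase(Locale.ROOT);
    switch (normalized) {
      case "STRING":
        // No STRING type on the Calcite side, so use the widest VARCHAR.
        return "VARCHAR(" + Integer.MAX_VALUE + ")";
      case "INT":
        return "INTEGER";
      case "BIGINT":
        return "BIGINT";
      case "ARRAY":
      case "MAP":
      case "STRUCT":
        // Complex types are out of scope for this iteration.
        throw new UnsupportedOperationException(
            "complex type not supported: " + normalized);
      default:
        throw new IllegalArgumentException(
            "type not covered by this sketch: " + normalized);
    }
  }

  public static void main(String[] args) {
    System.out.println(toCalciteTypeName("string"));
  }
}
```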

Authorization does not yet work with this commit. A Jira has been
filed (IMPALA-13011) to deal with this.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Reviewed-on: http://gerrit.cloudera.org:8080/21109
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-24 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26: Code-Review+1

I think this review is ready to go as an initial patch.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 24 Apr 2024 16:52:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 24 Apr 2024 05:55:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10577/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 24 Apr 2024 00:52:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10576/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 24 Apr 2024 00:13:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15999/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 23 Apr 2024 19:42:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#26).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/26/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/26/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS26, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 23 Apr 2024 19:19:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 26:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10576/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 26
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 23 Apr 2024 19:18:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-23 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/25/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/25/fe/src/main/java/org/apache/impala/service/Frontend.java@2144
PS25, Line 2144: addPlannerToProfile(PLANNER);
When I comment this out, the custom_cluster/test_query_log.py and 
custom_cluster/test_query_live.py tests pass. I'll dig a bit, but my guess is 
that the new line in the profile interacts with the query history table.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 25
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 23 Apr 2024 17:54:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 25: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10572/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 25
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 23 Apr 2024 04:20:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 25:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15989/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 25
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 22 Apr 2024 23:33:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 25:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10572/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 25
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 22 Apr 2024 23:14:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-22 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#25).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/25/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/25/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS25, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 25
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 22 Apr 2024 23:09:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 24: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10567/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 24
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sat, 20 Apr 2024 09:08:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10567/ 
DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 24
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sat, 20 Apr 2024 04:00:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 24:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15944/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 24
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 15:11:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: // TODO: IMPALA-13011: Awkward call for authorization here. Authorization
             : // will be done at validation time, but this is needed here for
> Can you mention in the commit message that authorization is missing at this
Done


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
> Yeah, it is perfectly fine to just add a class comment and mention that thi
Ok, added a class comment.


http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test@113
PS23, Line 113: xedzt
> hmm, why are these different than https://github.com/apache/impala/blob/541
Yeah, prolly best to take this out. The test in binary-type does a casting 
function which isn't supported in this commit (but coming soon).



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 14:48:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 24:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/24/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/24/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS24, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 24
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 14:48:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#24).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.
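
As a rough illustration of the driver idea described above (this is not the
actual CalciteJniFrontend code), the steps can be pictured as an ordered list
of stages, each consuming the previous stage's output. Everything in this
sketch is hypothetical: the Stage interface, the run method, and the string
placeholders standing in for SqlNode, RelNode, PlanNode, and TExecRequest
objects. Only the stage names come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class PlannerPipelineSketch {
  /** Hypothetical stage: a name plus a transformation of the previous output. */
  interface Stage {
    String name();
    String apply(String input);
  }

  static Stage stage(final String name, final UnaryOperator<String> fn) {
    return new Stage() {
      @Override public String name() { return name; }
      @Override public String apply(String input) { return fn.apply(input); }
    };
  }

  // Strings stand in for the real intermediate objects (assumption for illustration).
  static final List<Stage> STAGES = Arrays.asList(
      stage("CalciteQueryParser", s -> "SqlNode[" + s + "]"),
      stage("CalciteMetadataHandler", s -> s),   // would load metadata; tree unchanged
      stage("CalciteValidator", s -> "validated " + s),
      stage("CalciteRelNodeConverter", s -> "RelNode[" + s + "]"),
      stage("CalciteOptimizer", s -> s),         // a no-op in this first cut
      stage("CalcitePhysPlanCreator", s -> "PlanNode[" + s + "]"),
      stage("ExecRequestCreator", s -> "TExecRequest[" + s + "]"));

  /** Runs every stage in order and records the names of the stages visited. */
  static String run(String sql, List<String> trace) {
    String current = sql;
    for (Stage st : STAGES) {
      current = st.apply(current);
      trace.add(st.name());
    }
    return current;
  }

  public static void main(String[] args) {
    List<String> trace = new ArrayList<>();
    System.out.println(run("select c1 from tbl", trace));
    System.out.println(trace);
  }
}
```

Keeping the stages in a list makes the order explicit and lets a no-op stage
such as the optimizer be swapped for a real one later without touching the
driver loop.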

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement reverts
to the current Impala planner.

In this iteration, any query other than the minimal ones listed above
throws an exception, which is caught; the query is then run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.
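
The fallback behaviour can be pictured with a small sketch. This is not the
code from the patch: planExperimental, planOriginal, planWithFallback, and
UnsupportedQueryException are hypothetical names, and the "experimental"
planner here is a toy that accepts only the two query shapes listed above.

```java
import java.util.Locale;

public class PlannerFallbackSketch {
  /** Hypothetical exception signalling that the new planner cannot handle a query. */
  static class UnsupportedQueryException extends Exception {
    UnsupportedQueryException(String msg) { super(msg); }
  }

  /** Toy stand-in for the new planner: only "select <col> from <table>" is accepted. */
  static String planExperimental(String sql) throws UnsupportedQueryException {
    String s = sql.trim().toLowerCase(Locale.ROOT);
    if (!s.matches("select\\s+\\S+\\s+from\\s+\\S+")) {
      throw new UnsupportedQueryException("not supported yet: " + sql);
    }
    return "experimental planner: " + s;
  }

  /** Toy stand-in for the existing planner, which accepts everything. */
  static String planOriginal(String sql) {
    return "original planner: " + sql.trim();
  }

  /** Try the new planner first; on a caught exception, rerun with the existing one. */
  static String planWithFallback(String sql) {
    try {
      return planExperimental(sql);
    } catch (UnsupportedQueryException e) {
      return planOriginal(sql);
    }
  }

  public static void main(String[] args) {
    System.out.println(planWithFallback("select * from tbl"));
    System.out.println(planWithFallback("select c1 from tbl where c1 > 1"));
    System.out.println(planWithFallback("create table t (i int)"));
  }
}
```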

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT), similar to how Hive represents its
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.
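
A minimal sketch of the STRING mapping, assuming MAXINT means Java's
Integer.MAX_VALUE. TypeDesc and convert are hypothetical stand-ins, not the
ImpalaTypeConverter API, and the two non-string mappings are included only
to give the switch some shape.

```java
import java.util.Locale;

public class StringTypeMappingSketch {
  /** Hypothetical, simplified type description: a name plus an optional length. */
  static final class TypeDesc {
    final String name;
    final int length;  // a negative value means "no length attribute"

    TypeDesc(String name, int length) {
      this.name = name;
      this.length = length;
    }

    @Override
    public String toString() {
      return length < 0 ? name : name + "(" + length + ")";
    }
  }

  /** Maps a source type name to the target description (illustrative subset only). */
  static TypeDesc convert(String sourceTypeName) {
    switch (sourceTypeName.toUpperCase(Locale.ROOT)) {
      case "STRING":
        // No STRING type on the target side, so use the widest VARCHAR.
        return new TypeDesc("VARCHAR", Integer.MAX_VALUE);
      case "INT":
        return new TypeDesc("INTEGER", -1);
      case "BIGINT":
        return new TypeDesc("BIGINT", -1);
      default:
        throw new IllegalArgumentException("unmapped type: " + sourceTypeName);
    }
  }

  public static void main(String[] args) {
    System.out.println(convert("string"));
    System.out.println(convert("int"));
  }
}
```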

Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: // TODO: IMPALA-13011: Awkward call for authorization here. Authorization
             : // will be done at validation time, but this is needed here for
> Yeah, authorization will happen earlier.  It's not implemented yet.  This p
Can you mention in the commit message that authorization is missing at this 
point?


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
> Sigh, you caught me on something I haven't researched that much...
Yeah, it is perfectly fine to just add a class comment and mention that this 
may change in the future. It doesn't seem useful to put more effort into it 
while expressions/more complex queries are not supported. If there is some Hive 
code that acted as the inspiration, then a link to it would be nice.


http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/23/testdata/workloads/functional-query/queries/QueryTest/calcite.test@113
PS23, Line 113: xedzt
hmm, why are these different than 
https://github.com/apache/impala/blob/541fc5ee9ec2d804f2ba45feb2df5bb96a013f86/testdata/workloads/functional-query/queries/QueryTest/binary-type.test#L12
 ?
I quickly tested it and it doesn't seem to pass with this escaped string.
Note that I wouldn't mind using only the ASCII lines in the test - the goal is 
to test the planner, not the executor + client.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 18 Apr 2024 06:55:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 23:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15933/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 17 Apr 2024 19:58:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@99
PS21, Line 99:   CalciteMetadataHandler mdHandler =
> optional: other steps separate the constructor and doing the actual work -
This is a good idea and I started to implement it...and then figured out why I 
did it this way.

I wanted to keep all member variables final.  But there wasn't a good way to 
put any of the calls into a loadTables() call and still keep all member 
variables final.

If you think there's a better way to restructure this, let me know.
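
One possible shape for this, offered only as a sketch and not taken from the
patch: a static factory method does the loading work and hands the results to
a private constructor, so construction and work are separate steps while
every field stays final. MetadataHandlerSketch, load, and the toy table
discovery below are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class MetadataHandlerSketch {
  private final String query;
  private final List<String> tables;

  // The private constructor only assigns; all work happens in the factory.
  private MetadataHandlerSketch(String query, List<String> tables) {
    this.query = query;
    this.tables = tables;
  }

  /** Does the "loading" up front, then builds an immutable instance. */
  static MetadataHandlerSketch load(String query) {
    List<String> found = new ArrayList<>();
    // Toy discovery: treat the token following each "from" as a table name.
    String[] tokens = query.trim().split("\\s+");
    for (int i = 0; i + 1 < tokens.length; i++) {
      if (tokens[i].equalsIgnoreCase("from")) {
        found.add(tokens[i + 1]);
      }
    }
    return new MetadataHandlerSketch(query, Collections.unmodifiableList(found));
  }

  String query() { return query; }

  List<String> tables() { return tables; }

  public static void main(String[] args) {
    MetadataHandlerSketch handler = MetadataHandlerSketch.load("select c1 from tbl");
    System.out.println(handler.tables());
  }
}
```

The trade-off is that callers can no longer hold an instance on which the
work has not yet been done, which may or may not fit the other steps.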


http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: AuthorizationFactory authzFactory =
             :     AuthorizationUtil.authzFactoryFrom(BackendConfig.INSTANCE);
> Does authorization take place in this step? Is it expected to work?
Yeah, authorization will happen earlier.  It's not implemented yet.  This 
probably will be done when we grab the table names in the MetadataLoader class, 
so as you said, at validation time.

This is just a placeholder in order to instantiate the Analyzer class, which, 
unfortunately, is still needed for now.  This will need to be refactored.  
Filed IMPALA-13011 for this.


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
> I don't mean to solve all type system related questions in this patch, but
Sigh, you caught me on something I haven't researched that much...

I have much of the code working (with tpcds and tpch) and the values worked 
as is.

I grabbed this code originally from Hive and matched their values, figuring 
that was the best thing to do.

Looking at the code you provided:  All upcasts are going to be done external to 
Calcite and based on Impala rules. So it might not even be necessary to have 
this method for float and double.

But I'm pretty sure we're gonna need it though for Decimal types since Max 
Scale and Max Precision are gonna need to be baked into the validation step 
(not verified, but this makes sense to me)

So now I'm in an awkward place.  I tried to match Hive. I don't necessarily 
have the right definition.  But I'd hate to leave it blank.  I suppose I could 
throw an exception if it's called for double or float, but that doesn't seem 
right either.

I also suppose I could just put in this explanation as I'm telling you now?  
And re-explore later (and file a Jira)?
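
For reference, one common rule of thumb relates the significand width of a
binary floating-point format to decimal digits: digits = floor(bits *
log10(2)). IEEE 754 single precision has a 24-bit significand and double
precision has 53 bits. The sketch below only illustrates that arithmetic; it
is not taken from the patch or from Hive, decimalDigits is a hypothetical
helper, and other conventions (for example C's FLT_DIG) give 6 rather than 7
for single precision.

```java
public class FloatPrecisionSketch {
  /** Decimal digits covered by a binary significand of the given width. */
  static int decimalDigits(int significandBits) {
    return (int) Math.floor(significandBits * Math.log10(2.0));
  }

  public static void main(String[] args) {
    System.out.println("float:  " + decimalDigits(24));  // prints "float:  7"
    System.out.println("double: " + decimalDigits(53));  // prints "double: 15"
  }
}
```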


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@48
PS20, Line 48:   private static final int DEFAULT_FLOAT_PRECISION= 7;
> This is still not clear to me in the float/double case. They have fixed byt
Addressed in comment above


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@47
PS20, Line 47: decimal_tbl
> This file was not updated in the last patch.
Whoops, missed this (only added files under the java directory).  Added now.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 20
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 17 Apr 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#23).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement reverts
to the current Impala planner.

In this iteration, any query other than the minimal ones listed above
throws an exception, which is caught; the query is then run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT), similar to how Hive represents its
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/23/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/23/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS23, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 23
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 17 Apr 2024 19:36:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15931/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 22
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 17 Apr 2024 18:48:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#22).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement reverts
to the current Impala planner.

In this iteration, any query other than the minimal ones listed above
throws an exception, which is caught; the query is then run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT), similar to how Hive represents its
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/22/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/22/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS22, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 22
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 17 Apr 2024 18:28:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 21:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@99
PS21, Line 99:   CalciteMetadataHandler mdHandler =
Optional: the other steps separate the constructor from doing the actual work,
so it would feel more consistent to me to also add a .loadTables() or similar
call here.


http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java@50
PS21, Line 50: AuthorizationFactory authzFactory =
 : 
AuthorizationUtil.authzFactoryFrom(BackendConfig.INSTANCE);
Does authorization take place in this step? Is it expected to work?

I think that authorization should happen earlier, e.g. during validate, as the 
optimizer may optimize out tables the user has no right to access, revealing 
information that the user shouldn't know (e.g. that there are 0 files in the 
table, so it can be omitted).


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@35
PS20, Line 35: ImpalaTypeSystemImpl
I don't mean to solve all type-system-related questions in this patch, but it 
would be nice to add a comment about the goals of this class: what is the aim 
of the type system in Calcite+Impala? Does it try to be as close to the old 
Impala planner as possible, does it want to move closer to Hive, or does it 
want to be more standard? Or could it be configurable?


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@48
PS20, Line 48:   // HiveDataTypeSystemImpl in the Hive github code base.
> Mentioned in above comment
This is still not clear to me in the float/double case. They have a fixed byte 
count, but the "number of digits" is a matter of formatting when converting 
them to strings.

Checked calcite and it uses getMaxPrecision() here: 
https://github.com/apache/calcite/blob/152801428fc28948d8f78753c258744f7c8e253a/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L527C32-L527C52

Is the goal of setting the precision to forbid assignments from double to 
float? Impala works this way, while Hive seems to allow it.


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@47
PS20, Line 47: decimal_tbl
> Done
This file was not updated in the last patch.



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 21
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 17 Apr 2024 15:41:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15915/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 21
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 16 Apr 2024 21:18:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/21/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS21, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 21
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 16 Apr 2024 20:54:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-16 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 21:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@40
PS20, Line 40: 15
> Can you add more comments about what precision means in different cases? Fo
I put in a better comment, hopefully this explains it better.

The values match the values existing in HiveDataTypeSystem.  The float value, 
as I mentioned in the comment, refers to the number of digits used, much like 
tinyint has 3 digits.
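
One way to read "number of digits" for these types is sketched below. This is
an illustration only and does not claim to be how Hive or Calcite derived
their constants: the class and method names are hypothetical, and the
floating-point estimate (significand bits times log10(2), rounded down) is
just one common approximation that happens to give 7 and 15.

```java
// Hypothetical helper, not part of ImpalaTypeSystemImpl.
final class PrecisionDigitsSketch {
  /** Decimal digits needed to print the largest value of an integer type. */
  static int maxDigits(long maxValue) {
    return Long.toString(maxValue).length();
  }

  /** Approximate decimal digits carried by a binary significand of this width. */
  static int significandDigits(int significandBits) {
    return (int) Math.floor(significandBits * Math.log10(2));
  }

  public static void main(String[] args) {
    System.out.println("tinyint digits: " + maxDigits(Byte.MAX_VALUE));
    System.out.println("int digits:     " + maxDigits(Integer.MAX_VALUE));
    // IEEE 754 single precision has a 24-bit significand, double has 53 bits.
    System.out.println("float digits:   " + significandDigits(24));
    System.out.println("double digits:  " + significandDigits(53));
  }
}
```

As the review points out, for float and double this figure describes how many
decimal digits are meaningful when printing, not a storage property.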


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@48
PS20, Line 48:   // HiveDataTypeSystemImpl in the Hive github code base.
> RelDataTypeSystemImpl uses 15 for float too. What is the difference behind
Mentioned in above comment


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@86
PS20, Line 86: INARY:
> It looks scary that for some types getDefaultPrecision calls getMaxPrecisio
Makes sense.


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@87
PS20, Line 87:   return RelDataType.PRECISION_NOT_SPECIFIED;
> Shouldn't this return -1, as the user is expected to provide the precision?
Done


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@89
PS20, Line 89:   return RelDataType.PRECISION_NOT_SPECIFIED;
> same as for CHAR
Done


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@121
PS20, Line 121:   @Override public boolean allowGeometry() {
  : return false;
  :   }
> This sounds great, but I think that it is not true yet. Even HMS doesn't ha
Yeah, makes sense.  I didn't look at the details under the hood as to how 
Calcite uses this.

Looking under the hood in Calcite: this flag is used by the parser. It 
probably doesn't matter much whether it is on or off, but having it off 
would be better for our purposes.

If we left it on, there would be some parsing error messages, but if the sql 
parsed correctly, it would still fail at validation time.  Best that all the 
error messages are consistent though (at validation time), so I'm changing this 
flag, thanks!

As for Hive:  Hive doesn't have a conformance file because Hive doesn't use the 
Calcite parser or validator.  This causes some major issues in the Hive code 
base and lots of extra code.


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@3
PS20, Line 3: alltypes
> a smaller table like alltypestiny could be queried so that the results coul
I just removed this.  It's not really adding any value.


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@15
PS20, Line 15: drop table if exists calcite_alltypes;
> This shouldn't be needed with unique_database
Done


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@47
PS20, Line 47: decimal_tbl
> Can you also check the other missing types?
Done



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 21
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 16 Apr 2024 20:53:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-16 Thread Steve Carlin (Code Review)
Steve Carlin has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement falls back to
the current Impala planner.

In this iteration, any query other than the minimal ones listed above will
result in a caught exception, after which the query is run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT), similar to how Hive represents its
STRING type.
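
The STRING mapping can be illustrated with a small sketch. It is not the
ImpalaTypeConverter code: the class and method are hypothetical, types are
modelled as plain strings, and MAXINT is assumed to mean Java's
Integer.MAX_VALUE.

```java
import java.util.Locale;

// Hypothetical sketch of representing STRING as an unbounded VARCHAR.
final class StringTypeSketch {
  // Assumption: "MAXINT" in the commit message corresponds to Integer.MAX_VALUE.
  static final int MAX_VARCHAR_PRECISION = Integer.MAX_VALUE;

  /** Returns a Calcite-style type description for an Impala-style type name. */
  static String toCalciteLikeType(String impalaType) {
    String upper = impalaType.toUpperCase(Locale.ROOT);
    if (upper.equals("STRING")) {
      // No STRING type on the Calcite side, so use the widest VARCHAR.
      return "VARCHAR(" + MAX_VARCHAR_PRECISION + ")";
    }
    // Every other type is passed through unchanged in this sketch.
    return upper;
  }

  public static void main(String[] args) {
    System.out.println(toCalciteLikeType("string"));
    System.out.println(toCalciteLikeType("int"));
  }
}
```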

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-16 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 20:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/21109/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21109/20//COMMIT_MSG@44
PS20, Line 44:
Can you add a paragraph about type support? E.g.
- all Impala types are supported with the exception of:
  - STRING is represented as VARCHAR(INT_MAX)
  - complex types are not supported
- implemented in ImpalaTypeSystemImpl and ImpalaTypeConverter


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@40
PS20, Line 40: 15
Can you add more comments about what precision means in different cases? For 
example I am pretty confused about float/double.


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@48
PS20, Line 48:   private static final int DEFAULT_FLOAT_PRECISION= 7;
RelDataTypeSystemImpl uses 15 for float too. What is the reason for 
overriding this? 
https://github.com/apache/calcite/blob/cc1d46a4c4f88962c059e4ad0689ddfbb784ea96/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeSystemImpl.java#L101


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@86
PS20, Line 86: getMaxPrecision
It looks scary that for some types getDefaultPrecision calls getMaxPrecision, 
while for others it is the other way around. I think the clearest approach 
would be to only call getMaxPrecision from getDefaultPrecision.
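
The one-directional delegation suggested here could look roughly like the
sketch below. It is not the ImpalaTypeSystemImpl code: the enum, the class,
and the concrete numbers are placeholders chosen for illustration, and
PRECISION_NOT_SPECIFIED is a local stand-in for the -1 constant mentioned in
this review.

```java
// Hypothetical sketch: getDefaultPrecision may call getMaxPrecision,
// but getMaxPrecision never calls back.
final class PrecisionDelegationSketch {
  enum TypeName { TINYINT, FLOAT, DOUBLE, CHAR, VARCHAR }

  // Local stand-in for the "precision not specified" marker (-1).
  static final int PRECISION_NOT_SPECIFIED = -1;

  /** Self-contained: every supported type has an explicit maximum. */
  static int getMaxPrecision(TypeName type) {
    switch (type) {
      case TINYINT: return 3;
      case FLOAT: return 7;
      case DOUBLE: return 15;
      case CHAR: return 255;       // placeholder limit
      case VARCHAR: return 65535;  // placeholder limit
      default: return PRECISION_NOT_SPECIFIED;
    }
  }

  /** Delegates to the maximum unless the user is expected to give a length. */
  static int getDefaultPrecision(TypeName type) {
    switch (type) {
      case CHAR:
      case VARCHAR:
        return PRECISION_NOT_SPECIFIED;
      default:
        return getMaxPrecision(type);
    }
  }

  public static void main(String[] args) {
    for (TypeName t : TypeName.values()) {
      System.out.println(t + " default=" + getDefaultPrecision(t)
          + " max=" + getMaxPrecision(t));
    }
  }
}
```

Keeping the calls in one direction means the maximum for a type can be read
off a single switch without tracing back through the default.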


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@87
PS20, Line 87: case CHAR:
Shouldn't this return -1, as the user is expected to provide the precision?
https://github.com/apache/calcite/blob/cc1d46a4c4f88962c059e4ad0689ddfbb784ea96/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeSystem.java#L43


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@89
PS20, Line 89: case VARCHAR:
same as for CHAR


http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@121
PS20, Line 121:   @Override public boolean allowGeometry() {
  : return true;
  :   }
This sounds great, but I think that it is not true yet. Even HMS doesn't have 
geometry columns AFAIK.

Btw does Hive have a similar Conformance class? I couldn't find it.


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test
File testdata/workloads/functional-query/queries/QueryTest/calcite.test:

http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@3
PS20, Line 3: alltypes
a smaller table like alltypestiny could be queried so that the results could be 
checked


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@15
PS20, Line 15: drop table if exists calcite_alltypes;
This shouldn't be needed with unique_database


http://gerrit.cloudera.org:8080/#/c/21109/20/testdata/workloads/functional-query/queries/QueryTest/calcite.test@47
PS20, Line 47: decimal_tbl
Can you also check the other missing types?
For binary: binary_tbl
For date: date_tbl
For char, varchar see 
https://github.com/apache/impala/blob/61ceb16d880a7be07241f682138bfb286ec2a80e/testdata/workloads/functional-query/queries/QueryTest/chars.test#L19



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 20
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 16 Apr 2024 14:48:02 +
Gerrit-HasComments: 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15862/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 20
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 10 Apr 2024 22:36:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#20).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.
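
The sequence of steps listed above can be sketched as a simple pipeline. This
is only an illustration under stated assumptions: the PipelineSketch and Step
classes are hypothetical, and strings stand in for the SqlNode, RelNode,
PlanNode, and TExecRequest objects that the real steps exchange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of the driver walking through each step in order.
final class PipelineSketch {
  /** One named stage; the string payload is a stand-in for the real objects. */
  static final class Step {
    final String name;
    final UnaryOperator<String> work;

    Step(String name, UnaryOperator<String> work) {
      this.name = name;
      this.work = work;
    }
  }

  static List<Step> steps() {
    List<Step> steps = new ArrayList<>();
    steps.add(new Step("CalciteQueryParser", in -> "SqlNode(" + in + ")"));
    // Loads table metadata; the tree itself passes through unchanged.
    steps.add(new Step("CalciteMetadataHandler", in -> in));
    steps.add(new Step("CalciteValidator", in -> "Validated(" + in + ")"));
    steps.add(new Step("CalciteRelNodeConverter", in -> "Logical(" + in + ")"));
    // A nop in this first commit.
    steps.add(new Step("CalciteOptimizer", in -> in));
    steps.add(new Step("CalcitePhysPlanCreator", in -> "PlanNode(" + in + ")"));
    steps.add(new Step("ExecRequestCreator", in -> "TExecRequest(" + in + ")"));
    return steps;
  }

  /** Feeds the output of each step into the next one. */
  static String run(String sql) {
    String current = sql;
    for (Step step : steps()) {
      current = step.work.apply(current);
    }
    return current;
  }

  public static void main(String[] args) {
    System.out.println(run("select c1 from tbl"));
  }
}
```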

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement falls back to
the current Impala planner.

In this iteration, any query other than the minimal ones listed above will
result in a caught exception, after which the query is run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/20/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS20, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 20
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 10 Apr 2024 22:13:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15853/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 19
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 10 Apr 2024 14:58:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 18:

(7 comments)

Ok, great points!

The eventual goal is to make this work with all existing tests, so it probably
makes more sense to add these fallback tests when changes are made to make
things work for the test framework. At that point, all Decimal V1 tests will
fail, so I'll add the fallback test then.

So I went ahead and removed most of the code.

One thing that has to remain is the canStmtBePlannedThroughCalcite.  I need to 
make sure that DDL statements go back to the old path.

I was hoping to use the Calcite parser to do this, but alas, I cannot. I need a 
way to distinguish between queries that fail due to syntax issues and queries 
that won't work because they're DDL.

An earlier code review comment mentioned that the current code doesn't handle 
/* */ comments.  A Jira has been filed for that.

I also added the RUNTIME_PROFILE changes you suggested.  That is definitely 
needed and was a great suggestion. It should be able to detect both the 
original planner and the new planner.

I did leave in query checks for statements that begin with "values" and "with".
Those aren't tested here, but I'd prefer to leave them in at this point
because there is an abundance of tests in the main framework that use them,
and the RUNTIME_PROFILE changes I made (from your comments) will ensure we
don't miss this in the future.

http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh
File bin/set-classpath.sh:

http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@41
PS18, Line 41:
> Nit: stray line
Done


http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@56
PS18, Line 56: echo "USE_CALCITE_PLANNER"
 : echo $USE_CALCITE_PLANNER
> Can we skip this debug statement?
Done


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@144
PS18, Line 144:  catch (Exception e) {
              :       LOG.info("Calcite planner failed.");
              :       LOG.info("Exception: " + e);
              :       if (e != null) {
              :         LOG.info("Stack Trace:" + ExceptionUtils.getStackTrace(e));
              :         throw new InternalException(e.getMessage());
              :       }
              :       throw new RuntimeException(e);
              :     }
> For complete fallback support, this case would need to fall back. In practi
Answered in the main reply


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@159
PS18, Line 159:   private boolean canStmtBePlannedThroughCalcite(QueryContext queryCtx) {
              :     String stringWithFirstRealWord = queryCtx.getStmt();
              :     String[] lines = stringWithFirstRealWord.split("\n");
              :     // Get rid of comments and blank lines which start the query. We need to find
              :     // the first real word.
              :     // TODO: IMPALA-12976: need to make this more generic. Certain patterns aren't caught
              :     // here like /* */
              :     for (String line : lines) {
              :       if (line.trim().startsWith("--") || line.trim().equals("")) {
              :         stringWithFirstRealWord = stringWithFirstRealWord.replaceFirst(line + "\n", "");
              :       } else {
              :         break;
              :       }
              :     }
              :     stringWithFirstRealWord = stringWithFirstRealWord.trim();
              :     String beforeStripString;
              :     do {
              :       beforeStripString = stringWithFirstRealWord;
              :       stringWithFirstRealWord = StringUtils.stripStart(stringWithFirstRealWord, "(");
              :       stringWithFirstRealWord = StringUtils.stripStart(stringWithFirstRealWord, null);
              :     } while (!stringWithFirstRealWord.equals(beforeStripString));
              :     return StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "select") ||
              :         StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "values") ||
              :         StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "with");
              :   }
              :
              :   private void checkUnsupportedFeatures(QueryContext queryCtx)
              :       throws CalcitePlannerUnsupportedException {
              :     if

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#19).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.
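
As a rough illustration of how the steps above chain together, here is a
minimal sketch. The Stage class, the stage names, and the strings standing in
for SqlNode, RelNode, PlanNode and TExecRequest are all hypothetical and are
not taken from the patch; the real CompilerStep interface may look quite
different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch only: strings stand in for the real intermediate objects.
final class CompilerPipelineSketch {
  private CompilerPipelineSketch() {}

  // One named step that turns one intermediate form into the next.
  static final class Stage {
    final String name;
    final UnaryOperator<String> transform;

    Stage(String name, UnaryOperator<String> transform) {
      this.name = name;
      this.transform = transform;
    }
  }

  // Mirrors the order of the steps listed in the commit message.
  static List<Stage> defaultStages() {
    return Arrays.asList(
        new Stage("parse", q -> "SqlNode(" + q + ")"),
        new Stage("loadMetadata", ast -> ast),       // only fetches metadata
        new Stage("validate", ast -> ast),           // checks, does not rewrite
        new Stage("toLogicalPlan", ast -> "RelNode(" + ast + ")"),
        new Stage("optimize", UnaryOperator.identity()),  // a no-op in this cut
        new Stage("toPhysicalPlan", rel -> "PlanNode(" + rel + ")"),
        new Stage("createExecRequest", plan -> "TExecRequest(" + plan + ")"));
  }

  // Runs the stages in order and records each stage name once it completes.
  static String run(List<Stage> stages, String query, List<String> completed) {
    String current = query;
    for (Stage stage : stages) {
      current = stage.transform.apply(current);
      completed.add(stage.name);
    }
    return current;
  }

  public static void main(String[] args) {
    List<String> completed = new ArrayList<>();
    System.out.println(run(defaultStages(), "select c1 from tbl", completed));
    System.out.println(completed);
  }
}
```

Recording the completed stage names is only meant to hint at how a per-step
timeline could be kept; it is not how the patch reports timing.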

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/19/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/19/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS19, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 19
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 10 Apr 2024 14:23:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-08 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 18:

(7 comments)

This is looking good, but I have a couple of points.
First, if it doesn't have a test, then it doesn't exist. Maybe that's a bit
strict for something this early, but it is the only thing that I know works. If
we accumulate code without tests, it's a pain to go back and fix that. In
that vein, it would be good to have tests for the cases that bail out early from
the Calcite planner. Some examples:
1. Testing semi join / anti join detection
2. Testing with decimal v1 / appx count distinct
3. Testing with Kudu/HBase table
4. Testing with complex types
5. Testing with a view
6. The logic in canStmtBePlannedThroughCalcite()
Some of these will go away over time, and that's ok.

Second, we have the first piece of fallback, but fallback is going to need more 
work. We're going to want to have reasonable information in the profiles and 
we'll want test cases. We are probably going to need to find a way to merge the 
timelines / account for time before Calcite falls back. Storing the error 
message that caused fallback into the profile would be useful. All of this is a 
long way of saying: I would be ok if this change didn't have fallback.

http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh
File bin/set-classpath.sh:

http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@41
PS18, Line 41:
Nit: stray line


http://gerrit.cloudera.org:8080/#/c/21109/18/bin/set-classpath.sh@56
PS18, Line 56: echo "USE_CALCITE_PLANNER"
 : echo $USE_CALCITE_PLANNER
Can we skip this debug statement?


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@144
PS18, Line 144:  catch (Exception e) {
              :       LOG.info("Calcite planner failed.");
              :       LOG.info("Exception: " + e);
              :       if (e != null) {
              :         LOG.info("Stack Trace:" + ExceptionUtils.getStackTrace(e));
              :         throw new InternalException(e.getMessage());
              :       }
              :       throw new RuntimeException(e);
              :     }
For complete fallback support, this case would need to fall back. In practice, 
most exceptions are not CalcitePlannerUnsupportedExceptions. e.g. "select 
count(*) from functional.alltypes" doesn't fall back.

There are a variety of test cases that I would want to see for fallback
functionality (e.g. what does the profile show for fallback, which things
fall back instantly versus fall back as a result of a Calcite error, etc). Should
we skip fallback in this initial change?
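
To make the distinction above concrete, here is a minimal sketch of what
"complete" fallback could look like. Every name in it (PlannerFallbackSketch,
UnsupportedFeatureException, PlanOutcome, the two Function parameters) is
hypothetical and not from the patch. It only illustrates falling back on both
an explicit "unsupported" signal and an unexpected planner error, while
keeping the reason so it could be surfaced in a profile.

```java
import java.util.function.Function;

// Hypothetical sketch only; none of these names come from the patch.
final class PlannerFallbackSketch {
  private PlannerFallbackSketch() {}

  // Stand-in for an explicit "unsupported feature" signal.
  static final class UnsupportedFeatureException extends RuntimeException {
    UnsupportedFeatureException(String message) {
      super(message);
    }
  }

  // What a profile would need: which planner ran, and why fallback happened.
  static final class PlanOutcome {
    final String plan;
    final String planner;
    final String fallbackReason;  // null when the new planner succeeded

    PlanOutcome(String plan, String planner, String fallbackReason) {
      this.plan = plan;
      this.planner = planner;
      this.fallbackReason = fallbackReason;
    }
  }

  static PlanOutcome plan(String sql, Function<String, String> newPlanner,
      Function<String, String> originalPlanner) {
    try {
      return new PlanOutcome(newPlanner.apply(sql), "calcite", null);
    } catch (UnsupportedFeatureException e) {
      // Expected, early bail-out: the feature is known to be unsupported.
      return new PlanOutcome(originalPlanner.apply(sql), "original",
          "unsupported: " + e.getMessage());
    } catch (RuntimeException e) {
      // Unexpected failure inside the new planner; falls back as well.
      return new PlanOutcome(originalPlanner.apply(sql), "original",
          "error: " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    PlanOutcome outcome = plan("select count(*) from functional.alltypes",
        sql -> { throw new IllegalStateException("no aggregate support"); },
        sql -> "plan from original planner");
    System.out.println(outcome.planner + " / " + outcome.fallbackReason);
  }
}
```

Keeping the two catch clauses separate is what would let a test tell an
instant fallback apart from a fallback caused by a planner error.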


http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@159
PS18, Line 159:   private boolean canStmtBePlannedThroughCalcite(QueryContext queryCtx) {
              :     String stringWithFirstRealWord = queryCtx.getStmt();
              :     String[] lines = stringWithFirstRealWord.split("\n");
              :     // Get rid of comments and blank lines which start the query. We need to find
              :     // the first real word.
              :     // TODO: IMPALA-12976: need to make this more generic. Certain patterns aren't caught
              :     // here like /* */
              :     for (String line : lines) {
              :       if (line.trim().startsWith("--") || line.trim().equals("")) {
              :         stringWithFirstRealWord = stringWithFirstRealWord.replaceFirst(line + "\n", "");
              :       } else {
              :         break;
              :       }
              :     }
              :     stringWithFirstRealWord = stringWithFirstRealWord.trim();
              :     String beforeStripString;
              :     do {
              :       beforeStripString = stringWithFirstRealWord;
              :       stringWithFirstRealWord = StringUtils.stripStart(stringWithFirstRealWord, "(");
              :       stringWithFirstRealWord = StringUtils.stripStart(stringWithFirstRealWord, null);
              :     } while (!stringWithFirstRealWord.equals(beforeStripString));
              :     return StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "select") ||
              :         StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "values") ||
              :         StringUtils.startsWithIgnoreCase(stringWithFirstRealWord, "with");
              :   }
              :
              :   private void

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15794/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 18
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sun, 07 Apr 2024 02:31:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-06 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 16:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java@1
PS17, Line 1: /*
> In some files, the License header is enclosed in the multi statement syntax
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@30
PS17, Line 30:  queryCtx;
> Pls add underscore
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@31
PS17, Line 31: stmtTableCache
> Add underscore
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@94
PS16, Line 94: :
> nit: add space or newline after :  so that the query string is separated ou
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148
PS16, Line 148:   if (e == null) {
> I wanted the ability to print the stack trace here.  But I also throw an In
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@209
PS16, Line 209: LOG.info("Using Impala planner");
> Can you make this consistent with the corresponding message for Calcite pla
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@116
PS17, Line 116: logical plan
> nit: to be precise, can we say 'initial logical plan'
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@121
PS17, Line 121: optimized plan
> nit: to be precise, can we say 'optimized logical plan' .
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@234
PS17, Line 234:   // hack to match the FrontEnd code
> Can you add some more context to this .. which part of the FE code is this
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java@53
PS17, Line 53: Finished parse step, but unknown result:
> nit: The two part sentence could be just a single one : "Parser produced an
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java@91
PS17, Line 91: Finished RelNodeConverter step, but unknown result:
> nit: similar to above.
Done


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java@26
PS17, Line 26:   public static String VIEWS = "Views are not yet supported.";
> The not supported messages should be consistent.
Done



--
To view, visit 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-06 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#18).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/18/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@26
PS18, Line 26:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 18
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sun, 07 Apr 2024 02:09:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-05 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 17:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java@1
PS17, Line 1: /*
In some files, the license header is enclosed in the multi-line comment syntax
/* ... */, whereas in others it uses single-line comments.
Please make it consistent in all the files and use whatever is being used in
existing Impala files.
Impala uses the single-line comment style: '// '


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@30
PS17, Line 30:  queryCtx;
Pls add underscore


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@31
PS17, Line 31: stmtTableCache
Add underscore


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@94
PS16, Line 94: :
nit: add space or newline after :  so that the query string is separated out.


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148
PS16, Line 148:   if (e == null) {
> I wanted the ability to print the stack trace here.  But I also throw an In
So this should be e != null then.
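
For illustration only (this is not the code from the patch): a minimal sketch of the guard being asked for, where the exception is only dereferenced inside an `e != null` branch. The class `ExceptionLogSketch` and the helper `describe` are hypothetical names.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper, not part of the patch under review.
final class ExceptionLogSketch {
  // Returns the message plus stack trace of e, or a fixed marker when e is
  // null, so that e is never dereferenced without a prior null check.
  static String describe(Exception e) {
    if (e != null) {
      StringWriter buffer = new StringWriter();
      PrintWriter writer = new PrintWriter(buffer);
      e.printStackTrace(writer);
      writer.flush();
      return e.getMessage() + "\n" + buffer;
    }
    return "<no exception>";
  }
}
```

With this shape, both the stack trace printing and the later e.getMessage() call sit behind the same check.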


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@166
PS16, Line 166:   if (line.trim().startsWith("--") || 
line.trim().equals("")) {
Comments can also span multiple lines, e.g.
 /*  ignore
   this
*/
SELECT a, b, c.
There could be other patterns not handled here.  Can we not use the parser
methods directly?  If this is supposed to be a temporary function, or if it is
expected to be revised later, please add the relevant comments.
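
As a rough illustration of the gap (a sketch under assumptions, not the patch's code and not a substitute for the parser): the helper below skips leading whitespace, `--` line comments, and `/* ... */` block comments before checking the first keyword. `LeadingCommentSketch`, `stripLeadingComments`, and `looksLikeSelect` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper, not part of the patch under review.
final class LeadingCommentSketch {
  // Drops whitespace and comments that appear before the first SQL token.
  // An unterminated block comment consumes the rest of the input.
  static String stripLeadingComments(String sql) {
    int i = 0;
    int n = sql.length();
    while (i < n) {
      if (Character.isWhitespace(sql.charAt(i))) {
        i++;
      } else if (sql.startsWith("--", i)) {
        int eol = sql.indexOf('\n', i);
        i = (eol < 0) ? n : eol + 1;
      } else if (sql.startsWith("/*", i)) {
        int end = sql.indexOf("*/", i + 2);
        i = (end < 0) ? n : end + 2;
      } else {
        break;
      }
    }
    return sql.substring(i);
  }

  // Coarse check only: a prefix match on the first keyword.
  static boolean looksLikeSelect(String sql) {
    return stripLeadingComments(sql).toLowerCase(Locale.ROOT).startsWith("select");
  }
}
```

This is still only a prefix test: it does not recognize queries that begin with WITH, for example, which is one reason delegating to the parser would be more robust.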


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@209
PS16, Line 209: LOG.info("Using Impala planner");
Can you make this consistent with the corresponding message for Calcite planner 
on line 94.  Something like:
LOG.info("Using Impala Planner for the following query: " + queryCtx.getStmt());


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@245
PS16, Line 245: Frontend Timeline (Calcite Planner)")
This QueryContext is used in both cases, right?  i.e. if the Calcite planner
fails, we fall back to the Impala planner.  So this message should be modified
if the fallback occurs.


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@116
PS17, Line 116: logical plan
nit: to be precise, can we say 'initial logical plan'


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@121
PS17, Line 121: optimized plan
nit: to be precise, can we say 'optimized logical plan' .


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@234
PS17, Line 234:   // hack to match the FrontEnd code
Can you add some more context to this?  Which part of the FE code is this
referring to?


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java:


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15744/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 17
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sun, 31 Mar 2024 16:30:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-31 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#17).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.
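
The sequence of steps above can be pictured with the following toy sketch. Only the step names come from the commit message; the class `CompilerStepsSketch`, the `Step` type, and the string "artifacts" passed between steps are stand-ins invented for illustration and do not reflect the real Calcite or Impala types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of the step ordering; bodies are placeholders.
final class CompilerStepsSketch {
  static final class Step {
    final String name;
    final UnaryOperator<String> body;

    Step(String name, UnaryOperator<String> body) {
      this.name = name;
      this.body = body;
    }
  }

  // Steps in the order listed in the commit message.
  static final List<Step> STEPS = Arrays.asList(
      new Step("CalciteQueryParser", q -> "SqlNode(" + q + ")"),
      new Step("CalciteMetadataHandler", ast -> ast),        // loads metadata, AST unchanged
      new Step("CalciteValidator", ast -> "validated " + ast),
      new Step("CalciteRelNodeConverter", ast -> "RelNode[" + ast + "]"),
      new Step("CalciteOptimizer", rel -> rel),              // a nop in this first cut
      new Step("CalcitePhysPlanCreator", rel -> "PlanNode[" + rel + "]"),
      new Step("ExecRequestCreator", plan -> "TExecRequest[" + plan + "]"));

  // Runs every step in order and records the step names in trace.
  static String compile(String query, List<String> trace) {
    String artifact = query;
    for (Step step : STEPS) {
      artifact = step.body.apply(artifact);
      trace.add(step.name);
    }
    return artifact;
  }
}
```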

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py
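
The fallback behavior described in the last two paragraphs can be sketched as follows. This is a simplified model, not the code in CalciteJniFrontend: the planners are represented as plain functions, and `PlannerFallbackSketch` and `plan` are hypothetical names.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical model of "try Calcite, fall back to the Impala planner".
final class PlannerFallbackSketch {
  static String plan(String query,
      Function<String, String> calcitePlanner,
      Function<String, String> impalaPlanner) {
    // Non-select statements go straight to the existing planner.
    if (!query.trim().toLowerCase(Locale.ROOT).startsWith("select")) {
      return impalaPlanner.apply(query);
    }
    try {
      return calcitePlanner.apply(query);
    } catch (RuntimeException e) {
      // Anything the new planner cannot handle yet is retried on the old one.
      return impalaPlanner.apply(query);
    }
  }
}
```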

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-31 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21109/16/bin/set-classpath.sh
File bin/set-classpath.sh:

http://gerrit.cloudera.org:8080/#/c/21109/16/bin/set-classpath.sh@62
PS16, Line 62: FE
> nit: use expanded form, not acronym
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py@182
PS16, Line 182: U
> nit: lowercase 'u'.
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py@183
PS16, Line 183:   "instead of JniFrontend.")
> JniFrontend is an internal class, shouldn't be mentioned in a user message.
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/fe/src/main/java/org/apache/impala/planner/PlannerContext.java
File fe/src/main/java/org/apache/impala/planner/PlannerContext.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@97
PS16, Line 97:   // Constructor useful for an external planner module
> nit: this comment can be removed since the purpose of this patch is to make
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@95
PS16, Line 95: List scanOutputExprs = new 
ArrayList<>(Collections.nCopies(totalCols, null));
> For wide tables where we are only needing a few columns projected, we will
Filed IMPALA-12961


http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@52
PS15, Line 52: an experimental
> nit: remove experimental
Done


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148
PS16, Line 148:   if (e == null) {
> Why is this checking for null here ? e is already being referenced above.
I wanted the ability to print the stack trace here.  But I also throw an 
Internal Exception which dereferences "e" in e.getMessage().  It isn't 
dereferenced until this statement.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 16
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sun, 31 Mar 2024 16:06:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS17, Line 25:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 17
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sun, 31 Mar 2024 16:06:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-29 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 16:

(7 comments)

A few more comments based on a second pass.

http://gerrit.cloudera.org:8080/#/c/21109/16/bin/set-classpath.sh
File bin/set-classpath.sh:

http://gerrit.cloudera.org:8080/#/c/21109/16/bin/set-classpath.sh@62
PS16, Line 62: FE
nit: use expanded form, not acronym


http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py@182
PS16, Line 182: U
nit: lowercase 'u'.


http://gerrit.cloudera.org:8080/#/c/21109/16/bin/start-impala-cluster.py@183
PS16, Line 183:   "instead of JniFrontend.")
JniFrontend is an internal class, shouldn't be mentioned in a user message.  
How about 'If true, use the Calcite planner for query optimization instead of 
Impala planner'


http://gerrit.cloudera.org:8080/#/c/21109/16/fe/src/main/java/org/apache/impala/planner/PlannerContext.java
File fe/src/main/java/org/apache/impala/planner/PlannerContext.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@97
PS16, Line 97:   // Constructor useful for an external planner module
nit: this comment can be removed since the purpose of this patch is to make 
Calcite planner an internal module.


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@95
PS16, Line 95: List scanOutputExprs = new 
ArrayList<>(Collections.nCopies(totalCols, null));
For wide tables where only a few columns need to be projected, we will end
up with a long list of mostly nulls.
A LinkedHashMap (which preserves insertion order), where the key is the
position and the value is the SlotRef, would be better suited despite the CPU
cost of hashing.  In general, in a query planner, memory is the most precious
commodity since the plan search space can be large, so anything we can do to
reduce the memory footprint would be preferred.
That said, I would be OK if this is done in a subsequent patch; just keep track
of it through a Jira.
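
A minimal sketch of the suggested structure, for illustration only: the map holds one entry per projected column instead of one list slot per table column. Strings stand in for SlotRef objects, and `SparseScanOutputSketch` and `projectedSlots` are hypothetical names that do not appear in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; a String stands in for Impala's SlotRef.
final class SparseScanOutputSketch {
  // Maps each projected column position to its slot, in projection order.
  // Unprojected columns take no space, unlike a list pre-filled with nulls.
  static Map<Integer, String> projectedSlots(int[] positions, String[] columnNames) {
    Map<Integer, String> slots = new LinkedHashMap<>();
    for (int pos : positions) {
      slots.put(pos, "slot(" + columnNames[pos] + ")");
    }
    return slots;
  }
}
```

For a 1000-column table with two projected columns, the map has two entries, whereas the Collections.nCopies approach allocates a 1000-element list.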


http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@52
PS15, Line 52: an experimental
nit: remove experimental


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148
PS16, Line 148:   if (e == null) {
Why is this checking for null here ? e is already being referenced above.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 16
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Sat, 30 Mar 2024 01:56:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15739/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 16
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Fri, 29 Mar 2024 22:57:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-29 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Quanlong Huang, Joe McDonnell, Csaba Ringhofer, Michael 
Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#16).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS16, Line 25:  * 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 16
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Fri, 29 Mar 2024 22:34:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-27 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@56
PS15, Line 56: public class CalciteJniFrontend extends JniFrontend {
> Future: One path forward is that as the Calcite planner gains functionality
This totally makes sense.

The nice thing (as you probably are aware) about keeping it in its own 
JniFrontend right now is that it avoids adding any dependency to the regular 
frontend while this is not near production-ready.

But yeah, once we are ready to move this over, then there should only be one 
frontend.

It also does make sense to have a query option to run on the current planner in 
production even before the planners get combined.  I've filed a subtask for 
this: IMPALA-12946


http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java@93
PS15, Line 93: // load the relevant tables in the query from catalogd
 : this.stmtTableCache_ = 
stmtMetadataLoader.loadTables(tableVisitor.tableNames_);
> Future: When this supports views, are we thinking that StmtMetadataLoader w
Admittedly, I haven't looked very closely at how views are going to be handled 
yet. And I'm not sure I understand what you mean here and where StatementBase 
is being used.

Perhaps you are suggesting a refactor somewhere? Yeah, currently, the purpose 
behind the code here is to make sure impalad has retrieved all the metadata 
information from catalogd. The stmtTableCache_ object contains this metadata 
and will be used by the Calcite Schema to get various information (e.g. 
statistics for the table).

I'm presuming views fall somewhere under this umbrella, though, in terms of 
metadata from catalogd.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 15
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:31:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@56
PS15, Line 56: public class CalciteJniFrontend extends JniFrontend {
Future: One path forward is that as the Calcite planner gains functionality, it 
eventually gets pulled into the regular frontend. In that world, Calcite 
planning would be controlled by a query option and wouldn't need to be its own 
subclass of JniFrontend. The normal JniFrontend would call out to the right 
planner (while trying to share lots of code). How does that line up with your 
own idea of what the future looks like?

We have a lot of logic in various parts of the existing frontend that we want 
to reuse if we can. For example, Frontend's getTExecRequest() has a lot of logic 
for workload-aware autoscaling and considering plans for different sizes of 
executor groups. The Calcite planner implements something similar to Frontend's 
doCreateExecRequest().
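
To make the idea concrete, here is a rough sketch of a single frontend that picks the planner from an option and keeps the surrounding logic shared. Everything in it is hypothetical (the `PlannerKind` enum, the `Planner` interface, and the string results); it is not how JniFrontend or Frontend is structured today.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of option-driven planner selection.
final class PlannerDispatchSketch {
  enum PlannerKind { IMPALA, CALCITE }

  interface Planner {
    String createPlan(String query);
  }

  private final Map<PlannerKind, Planner> planners = new EnumMap<>(PlannerKind.class);

  PlannerDispatchSketch(Planner impala, Planner calcite) {
    planners.put(PlannerKind.IMPALA, impala);
    planners.put(PlannerKind.CALCITE, calcite);
  }

  // Only the planning call differs per option; whatever wraps it (standing in
  // for shared logic such as executor group handling) is written once.
  String getExecRequest(String query, PlannerKind option) {
    String plan = planners.get(option).createPlan(query);
    return "request{" + plan + "}";
  }
}
```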


http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java@93
PS15, Line 93: // load the relevant tables in the query from catalogd
 : this.stmtTableCache_ = 
stmtMetadataLoader.loadTables(tableVisitor.tableNames_);
Future: When this supports views, are we thinking that StmtMetadataLoader will 
do that under the covers? It feels like StmtMetadataLoader only really needs 
the StatementBase for limited things.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 15
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Tue, 26 Mar 2024 02:03:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15663/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 15
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 22:25:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15662/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 14
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 22:24:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15661/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 13
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 22:19:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Csaba Ringhofer, Michael Smith, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#15).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut it is a no-op, but in
later versions it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any query besides the minimal ones listed above will
throw an exception that is caught, and the query is then run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.
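
The try-then-fall-back control flow described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, method, and exception names are hypothetical, the string checks stand in for the real parse/validate steps, and the returned strings stand in for a TExecRequest.

```java
import java.util.Locale;

// Hypothetical sketch: class and method names are illustrative, not Impala's.
final class CalciteFallbackSketch {
  /** Signals that a query is outside the subset the new planner supports. */
  static final class UnsupportedQueryException extends RuntimeException {
    UnsupportedQueryException(String message) {
      super(message);
    }
  }

  /** Stand-in for the Calcite path: accepts only "select <cols> from <tbl>". */
  static String planWithCalcite(String sql) {
    String q = sql.trim().toLowerCase(Locale.ROOT);
    // Basic check: only select statements are processed.
    if (!q.startsWith("select ")) {
      throw new UnsupportedQueryException("not a query: " + sql);
    }
    // Anything beyond a scan plus projection is rejected in this sketch.
    for (String clause : new String[] {" where ", " join ", " group by ", " order by "}) {
      if (q.contains(clause)) {
        throw new UnsupportedQueryException("unsupported clause:" + clause);
      }
    }
    // parse -> load metadata -> validate -> logical plan -> optimize (no-op)
    // -> physical plan -> exec request would happen here, in that order.
    return "calcite:" + sql;
  }

  /** Stand-in for the existing planner, which handles every statement. */
  static String planWithOriginal(String sql) {
    return "original:" + sql;
  }

  /** Try the Calcite path first and fall back when it rejects the statement. */
  static String plan(String sql) {
    try {
      return planWithCalcite(sql);
    } catch (UnsupportedQueryException e) {
      return planWithOriginal(sql);
    }
  }

  public static void main(String[] args) {
    System.out.println(plan("select c1 from tbl"));
    System.out.println(plan("select c1 from tbl where c1 > 1"));
    System.out.println(plan("insert into tbl values (1)"));
  }
}
```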

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Csaba Ringhofer, Michael Smith, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#14).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/15/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS15, Line 25:  * https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 15
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 22:02:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 12:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@70
PS12, Line 70: // O BENEVOLENT REVIEWER AND CODE INSPECTOR...
             : // TODO: Please hold off on reviewing this file.  I held off on cleaning this up until
             : // this gets past the experimental stage. Some of the code in SingleNodePlanner
             : // is duplicated here, so this will involve a general rewrite. After more Calcite
             : // code gets committed and the planner works for a good portion of the queries, this
             : // will get rewritten into its final form.
> Let's rework this comment since we're looking to merge this.
Heh, didn't like my comment, eh?  :). I sometimes get a tad bit whimsical.

Changed the comment, hopefully it looks better now.


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@101
PS12, Line 101: * Create an exec request for Impala to execute based on the supplied plan.
> Can you add a comment that this is similar to Frontend's createExecRequest(
Done


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@168
PS12, Line 168: Hive CBO
> Nit: Since we're using Calcite, let's update locations that reference Hive
Done


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@196
PS12, Line 196: * @param planNodeRoot root node of the Impala physical plan
              : * @param destination path to the target table if the table is not null
              : * @param isOverwrite true if it is an INSERT OVERWRITE statement
              : * @param writeId write ID of the target table if the table is not null
              : * @return list of plan fragments in the order [root fragment, child of root ...
              : * leaf fragment]
              : * @throws ImpalaException
              : * @throws HiveException
> Update this to match the new signature or remove it
Done


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@335
PS12, Line 335: * Return true if any join in the plan rooted at 'root' was inverted.
              : *
              : * TODO: This should be replaced once we conclude the changes contained in this method
              : *   are safe to be pushed to Planner.invertJoins, i.e., they do not cause any
              : *   performance regressions with Impala FE.
> Could you add a sentence or two about what is different? From looking at th
Done


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS12, Line 25: https://javadoc.io/doc/org.apache.calcite/calcite-core/latest/index.html
> This might be a better link:
Done, but the line is > 90 characters, not sure what to do about that.  Should 
I just ignore the warning?



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 12
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 21:57:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/14/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/14/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS14, Line 25:  * https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 14
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 21:59:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#13).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21109/13/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/13/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@351
PS13, Line 351:*
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/21109/13/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/13/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS13, Line 25:  * https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html
line too long (98 > 90)



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 13
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 21:54:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 12:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@70
PS12, Line 70: // O BENEVOLENT REVIEWER AND CODE INSPECTOR...
             : // TODO: Please hold off on reviewing this file.  I held off on cleaning this up until
             : // this gets past the experimental stage. Some of the code in SingleNodePlanner
             : // is duplicated here, so this will involve a general rewrite. After more Calcite
             : // code gets committed and the planner works for a good portion of the queries, this
             : // will get rewritten into its final form.
Let's rework this comment since we're looking to merge this.

I would describe what this thing does (i.e. it takes a single-node PlanNode 
tree from Calcite and produces a TExecRequest) and name the corresponding 
code in the regular Impala planner.

Then, I would say that this needs refactoring as we get farther along and maybe 
give some general idea of where we want to end up.


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@101
PS12, Line 101: * Create an exec request for Impala to execute based on the supplied plan.
Can you add a comment that this is similar to Frontend's createExecRequest()?


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@168
PS12, Line 168: Hive CBO
Nit: Since we're using Calcite, let's update locations that reference Hive CBO 
to say Calcite.


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@196
PS12, Line 196: * @param planNodeRoot root node of the Impala physical plan
              : * @param destination path to the target table if the table is not null
              : * @param isOverwrite true if it is an INSERT OVERWRITE statement
              : * @param writeId write ID of the target table if the table is not null
              : * @return list of plan fragments in the order [root fragment, child of root ...
              : * leaf fragment]
              : * @throws ImpalaException
              : * @throws HiveException
Update this to match the new signature or remove it


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@335
PS12, Line 335: * Return true if any join in the plan rooted at 'root' was inverted.
              : *
              : * TODO: This should be replaced once we conclude the changes contained in this method
              : *   are safe to be pushed to Planner.invertJoins, i.e., they do not cause any
              : *   performance regressions with Impala FE.
Could you add a sentence or two about what is different? From looking at this, 
it has a slightly different signature (returning bool, taking Analyzer) and 
calls computeStats() in a couple of places where we currently don't.
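
For readers unfamiliar with the pattern being discussed, a join-inversion pass that reports whether it changed anything might look roughly like the sketch below. This is not the code under review: the `Node` type, the helper names, and the swap criterion (put the larger input on the left) are all assumptions made for illustration, and the sketch does not model the computeStats() calls mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the node type and the swap criterion are assumptions.
final class InvertJoinsSketch {
  static final class Node {
    final String name;
    final long rows; // estimated cardinality
    final boolean isJoin;
    final List<Node> children = new ArrayList<>();

    Node(String name, long rows, boolean isJoin) {
      this.name = name;
      this.rows = rows;
      this.isJoin = isJoin;
    }
  }

  static Node scan(String name, long rows) {
    return new Node(name, rows, false);
  }

  static Node join(Node left, Node right) {
    Node n = new Node("join", Math.max(left.rows, right.rows), true);
    n.children.add(left);
    n.children.add(right);
    return n;
  }

  /** Post-order walk; returns true if any join under 'root' had its inputs swapped. */
  static boolean invertJoins(Node root) {
    boolean inverted = false;
    for (Node child : root.children) {
      // '|=' on booleans does not short-circuit, so every child is visited.
      inverted |= invertJoins(child);
    }
    if (root.isJoin && root.children.get(0).rows < root.children.get(1).rows) {
      // Assumed criterion: keep the larger input on the left.
      Node left = root.children.get(0);
      root.children.set(0, root.children.get(1));
      root.children.set(1, left);
      inverted = true;
    }
    return inverted;
  }

  public static void main(String[] args) {
    Node plan = join(scan("small", 10), scan("big", 1000));
    System.out.println(invertJoins(plan));
    System.out.println(plan.children.get(0).name);
  }
}
```

Returning a boolean lets the caller decide whether follow-up work, such as recomputing statistics, is needed at all.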


http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java:

http://gerrit.cloudera.org:8080/#/c/21109/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java@25
PS12, Line 25: https://javadoc.io/doc/org.apache.calcite/calcite-core/latest/index.html
This might be a better link:
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlConformance.html



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 12
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Mon, 25 Mar 2024 20:48:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15638/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 12
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Fri, 22 Mar 2024 21:21:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-22 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#12).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements will get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py
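
For readers following the thread, the fallback behavior described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: PlannerFallbackSketch, planWithCalcite, planWithOriginalPlanner,
and UnsupportedQueryException are hypothetical names, not classes from this
patch, and the real checks in CalciteJniFrontend may well differ.

```java
import java.util.Locale;

// Hypothetical sketch: try the experimental planner, fall back on failure.
final class PlannerFallbackSketch {

  /** Hypothetical exception for queries the experimental path cannot handle. */
  static final class UnsupportedQueryException extends RuntimeException {
    UnsupportedQueryException(String msg) { super(msg); }
  }

  /** Basic check (assumption): only statements starting with "select" qualify. */
  static boolean looksLikeSelect(String sql) {
    return sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
  }

  /** Stand-in for the Calcite path; accepts only scans and simple projections. */
  static String planWithCalcite(String sql) {
    String lower = sql.toLowerCase(Locale.ROOT);
    if (lower.contains(" where ") || lower.contains(" join ")) {
      throw new UnsupportedQueryException("not supported yet: " + sql);
    }
    return "calcite";
  }

  /** Stand-in for the existing Impala planner. */
  static String planWithOriginalPlanner(String sql) {
    return "original";
  }

  /** Non-queries skip the new path; unsupported queries are caught and rerouted. */
  static String plan(String sql) {
    if (!looksLikeSelect(sql)) {
      return planWithOriginalPlanner(sql);
    }
    try {
      return planWithCalcite(sql);
    } catch (UnsupportedQueryException e) {
      return planWithOriginalPlanner(sql);
    }
  }

  public static void main(String[] args) {
    System.out.println(plan("select c1 from tbl"));
    System.out.println(plan("select c1 from tbl where c1 > 1"));
    System.out.println(plan("insert into tbl values (1)"));
  }
}
```

The point of the sketch is that both kinds of rejection (the up-front statement
check and the caught exception) end in the same place: the existing planner.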

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15637/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 11
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Fri, 22 Mar 2024 20:15:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java:

http://gerrit.cloudera.org:8080/#/c/21109/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java@200
PS11, Line 200:   localTableNames.add(new 
TableName(parts.get(0).toLowerCase(), parts.get(1).toLowerCase()));
line too long (101 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 11
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Fri, 22 Mar 2024 19:46:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-22 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@69
PS10, Line 69: nor or partitions
> Nit: "nor are partitions"
Done


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@97
PS10, Line 97: int totalCols = table.getColumns().size();
> For my own understanding: Is it true that totalCols == numFields?
Yeah, I think so. This *might* be an issue when we deal with Acid tables that 
have virtual columns? But since we're not doing that right now and I haven't 
tested with that yet, it doesn't make sense to have 2 separate variables.

I'm not sure I changed this as you desired, but I did get rid of the extra 
variable.


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@70
PS10, Line 70: // O BENEVOLENT REVIEWER AND CODE INSPECTOR...
 : // TODO: Please hold off on reviewing this file.  I held off on 
cleaning this up until
 : // this gets past the experimental stage. Some of the code in 
SingleNodePlanner
 : // is duplicated here, so this will involve a general rewrite. 
After more Calcite
 : // code gets committed and the planner works for a good portion 
of the queries, this
 : // will get rewritten into its final form.
> Is this comment still true? Are there rewrites to come for this file?
Sigh, unfortunately, yes, it's still true.

My goal with this commit was to get a first pass Calcite commit in and make as 
few changes to Impala under "fe" as I possibly can.  This allows the code 
review to be a bit simpler.

The code in here is mostly common code with existing Impala code as all this 
happens after we have done the conversion into PlanNode.  So to do this right, 
the code under fe/.../planner/* should be refactored.

I do want to do this later, which is why I left this comment.

I'm open to doing this sooner rather than later if you think that would be 
better.


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java@107
PS10, Line 107:   case DECIMAL:
  : RelDataType decimalDefinedRetType = 
factory.createSqlType(SqlTypeName.DECIMAL,
  : scalarType.decimalPrecision(), 
scalarType.decimalScale());
  : return 
factory.createTypeWithNullability(decimalDefinedRetType, true);
  :   case VARCHAR:
  : return createCharType(factory, SqlTypeName.VARCHAR, 
scalarType.getLength());
  :   case CHAR:
  : return createCharType(factory, SqlTypeName.CHAR, 
scalarType.getLength());
> If I understand this right, we could omit DECIMAL, VARCHAR, and CHAR from t
In this commit, you are correct, so I shall remove it.

I think I need to put this back in future commits, but I'll do that when the 
time comes.
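
To make the distinction in this exchange concrete, here is a minimal sketch of
why parameterized types tend to sit outside a static lookup map: a map can hold
one prebuilt value per type, while DECIMAL, VARCHAR, and CHAR need per-column
arguments. Everything here is a hypothetical stand-in (TypeMappingSketch,
ImpalaKind, and plain strings in place of Calcite's RelDataType); it is not the
code in ImpalaTypeConverter.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: fixed types come from a map, parameterized types do not.
final class TypeMappingSketch {

  /** Hypothetical subset of Impala's primitive types. */
  enum ImpalaKind { INT, BIGINT, DOUBLE, DECIMAL, VARCHAR, CHAR }

  /** Types with no parameters can be precomputed once and looked up. */
  private static final Map<ImpalaKind, String> FIXED = new EnumMap<>(ImpalaKind.class);
  static {
    FIXED.put(ImpalaKind.INT, "INTEGER");
    FIXED.put(ImpalaKind.BIGINT, "BIGINT");
    FIXED.put(ImpalaKind.DOUBLE, "DOUBLE");
  }

  /**
   * Returns a type name. precisionOrLength and scale are used only by the
   * parameterized kinds, which is why a static map entry for them is unused.
   */
  static String toCalciteTypeName(ImpalaKind kind, int precisionOrLength, int scale) {
    switch (kind) {
      case DECIMAL:
        return "DECIMAL(" + precisionOrLength + ", " + scale + ")";
      case VARCHAR:
        return "VARCHAR(" + precisionOrLength + ")";
      case CHAR:
        return "CHAR(" + precisionOrLength + ")";
      default:
        String name = FIXED.get(kind);
        if (name == null) {
          throw new IllegalArgumentException("no mapping for " + kind);
        }
        return name;
    }
  }

  public static void main(String[] args) {
    System.out.println(toCalciteTypeName(ImpalaKind.INT, 0, 0));
    System.out.println(toCalciteTypeName(ImpalaKind.DECIMAL, 10, 2));
    System.out.println(toCalciteTypeName(ImpalaKind.VARCHAR, 32, 0));
  }
}
```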


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java@169
PS10, Line 169: Charset charSetName = 
Charset.forName(ConversionUtil.NATIVE_UTF16_CHARSET_NAME);
> Does this charset do anything for execution?
Nah, prolly not.  I put this in because I saw this used in another project that 
used Calcite, but I'm gonna delete this unless we see a need for it in the 
future.


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@40
PS10, Line 40:   private static final int MAX_BINARY_PRECISION  = 
Integer.MAX_VALUE;
 :   

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-22 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#11).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@69
PS10, Line 69: nor or partitions
Nit: "nor are partitions"


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java@97
PS10, Line 97: int totalCols = table.getColumns().size();
For my own understanding: Is it true that totalCols == numFields?

It seems like it would be, because we take the modulus by totalCols when 
indexing into an array of size numFields.

If that is true, then I would size the ArrayList and do the modulus with the 
same variable (maybe totalCols) and have a precondition that totalCols == 
getRowType().getFieldNames().size().
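
A minimal sketch of that suggestion, assuming the two sizes really are always
equal. The names (SingleSizeVariableSketch, placeByModulus, the offset
parameter) are hypothetical, and the rotation itself is only a placeholder for
whatever the real loop in ImpalaHdfsScanRel does. Plain Java is used here; in
Impala the check would presumably be a Guava Preconditions call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one size variable for both the list and the modulus.
final class SingleSizeVariableSketch {

  /**
   * Places each column at (i + offset) % totalCols. The same totalCols sizes the
   * output list and drives the modulus, so an index can never fall outside it.
   * Assumes offset is non-negative.
   */
  static List<String> placeByModulus(
      List<String> tableColumns, List<String> rowTypeFieldNames, int offset) {
    int totalCols = tableColumns.size();
    // Precondition from the review comment: both views agree on the column count.
    if (totalCols != rowTypeFieldNames.size()) {
      throw new IllegalStateException(
          "column count " + totalCols + " != field count " + rowTypeFieldNames.size());
    }
    List<String> out = new ArrayList<>(Collections.<String>nCopies(totalCols, null));
    for (int i = 0; i < totalCols; i++) {
      out.set((i + offset) % totalCols, tableColumns.get(i));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> cols = Arrays.asList("p", "a", "b");
    List<String> fields = Arrays.asList("a", "b", "p");
    System.out.println(placeByModulus(cols, fields, 2));
  }
}
```

If the sizes ever diverge (for example with virtual columns), the check fails
loudly instead of silently wrapping an index.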


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java@70
PS10, Line 70: // O BENEVOLENT REVIEWER AND CODE INSPECTOR...
 : // TODO: Please hold off on reviewing this file.  I held off on 
cleaning this up until
 : // this gets past the experimental stage. Some of the code in 
SingleNodePlanner
 : // is duplicated here, so this will involve a general rewrite. 
After more Calcite
 : // code gets committed and the planner works for a good portion 
of the queries, this
 : // will get rewritten into its final form.
Is this comment still true? Are there rewrites to come for this file?


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java@107
PS10, Line 107:   case DECIMAL:
  : RelDataType decimalDefinedRetType = 
factory.createSqlType(SqlTypeName.DECIMAL,
  : scalarType.decimalPrecision(), 
scalarType.decimalScale());
  : return 
factory.createTypeWithNullability(decimalDefinedRetType, true);
  :   case VARCHAR:
  : return createCharType(factory, SqlTypeName.VARCHAR, 
scalarType.getLength());
  :   case CHAR:
  : return createCharType(factory, SqlTypeName.CHAR, 
scalarType.getLength());
If I understand this right, we could omit DECIMAL, VARCHAR, and CHAR from the 
impalaToCalciteMap? Is that right or do we use it for something else?


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java@169
PS10, Line 169: Charset charSetName = 
Charset.forName(ConversionUtil.NATIVE_UTF16_CHARSET_NAME);
Does this charset do anything for execution?


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java:

http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java@40
PS10, Line 40:   private static final int MAX_BINARY_PRECISION  = 
Integer.MAX_VALUE;
 :   private static final int MAX_TIMESTAMP_PRECISION   = 15;
 :   private static final int 
MAX_TIMESTAMP_WITH_LOCAL_TIME_ZONE_PRECISION = 15; // nanos
 :   private static final int DEFAULT_TINYINT_PRECISION  = 3;
 :   private static final int DEFAULT_SMALLINT_PRECISION = 5;
 :   private static final int DEFAULT_INTEGER_PRECISION  = 10;
 :   private static final int DEFAULT_BIGINT_PRECISION   = 19;
 :   private static final int DEFAULT_FLOAT_PRECISION= 7;
 :   private static final int DEFAULT_DOUBLE_PRECISION   = 15;
Could you include a comment that provides a quick definition of precision and 
how Calcite uses this?
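
As a rough illustration of what such a comment could say: for exact integer
types, precision is usually taken to mean the maximum number of decimal digits a
value can have, and for floating-point types the number of decimal digits that
are reliably significant. The sketch below derives the integer and
floating-point defaults quoted above from the Java ranges. PrecisionSketch and
its methods are hypothetical, and how Calcite consumes these values is not
covered here.

```java
// Hypothetical sketch: where the DEFAULT_*_PRECISION numbers plausibly come from.
final class PrecisionSketch {

  /** Number of decimal digits in the largest value of an integer type. */
  static int decimalDigits(long maxValue) {
    return Long.toString(maxValue).length();
  }

  /** Whole decimal digits representable by a binary significand of the given width. */
  static int significantDigits(int significandBits) {
    return (int) Math.floor(significandBits * Math.log10(2.0));
  }

  public static void main(String[] args) {
    System.out.println("TINYINT  " + decimalDigits(Byte.MAX_VALUE));
    System.out.println("SMALLINT " + decimalDigits(Short.MAX_VALUE));
    System.out.println("INTEGER  " + decimalDigits(Integer.MAX_VALUE));
    System.out.println("BIGINT   " + decimalDigits(Long.MAX_VALUE));
    System.out.println("FLOAT    " + significantDigits(24)); // IEEE 754 single
    System.out.println("DOUBLE   " + significantDigits(53)); // IEEE 754 double
  }
}
```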


http://gerrit.cloudera.org:8080/#/c/21109/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/validate/ImpalaConformance.java
File 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/5/fe/pom.xml
File fe/pom.xml:

http://gerrit.cloudera.org:8080/#/c/21109/5/fe/pom.xml@616
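PS5, Line 616: <dependency>
             :   <groupId>org.apache.calcite</groupId>
             :   <artifactId>calcite-core</artifactId>
             :   <version>1.36.0</version>
             : </dependency>
             : <dependency>
             :   <groupId>org.apache.calcite.avatica</groupId>
             :   <artifactId>avatica-core</artifactId>
             :   <version>1.23.0</version>
             : </dependency>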
> Yeah.  This was an interesting dilemma for me
Ok, I think I made the changes that make this better.

The dependencies are now in the java/calcite-planner/target directory and the 
path is set up via set-classpath through an environment variable.  The 
CLASSPATH is set up similarly to how fe sets up the CLASSPATH.


The custom cluster unit test is now changed too.



--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 5
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 21 Mar 2024 00:31:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15593/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 10
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Thu, 21 Mar 2024 00:13:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#10).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15592/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 9
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 23:03:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15591/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 8
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 22:59:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#9).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterates through the SqlNode from the previous step
and makes sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Changes the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut, it is a no-op, but in
later versions, it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implements the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It also creates the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement falls back to the
current Impala planner.

In this iteration, any query besides the minimal ones listed above will
raise an exception that is caught, after which the query is run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/impala-config.sh
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 
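
For readers following the thread, the fallback behaviour described in the commit message above (non-query statements skip Calcite, and unsupported queries raise an exception that is caught before the query goes to the current planner) might look roughly like the sketch below. Everything in it is hypothetical: PlannerFallbackSketch, UnsupportedQueryException, and the regex stand-in for the Calcite steps are illustration only, not the code in CalciteJniFrontend.

```java
/** Hypothetical sketch of the fallback flow; none of these names come from the patch. */
final class PlannerFallbackSketch {

  /** Assumed exception type for a query the new path cannot handle yet. */
  static class UnsupportedQueryException extends RuntimeException {
    UnsupportedQueryException(String msg) { super(msg); }
  }

  /** Cheap up-front check: only select statements are tried on the Calcite path. */
  static boolean isQuery(String sql) {
    return sql.trim().toLowerCase().startsWith("select");
  }

  /**
   * Stand-in for the whole parse/validate/convert/optimize chain. It accepts
   * only "select <*|cols> from <tbl>" and throws for anything else.
   */
  static String planWithCalcite(String sql) {
    String s = sql.trim().toLowerCase();
    if (!s.matches("select\\s+(\\*|\\w+(\\s*,\\s*\\w+)*)\\s+from\\s+\\w+")) {
      throw new UnsupportedQueryException("not supported yet: " + sql);
    }
    return "calcite";
  }

  /** Stand-in for the existing planner, which handles everything. */
  static String planWithOriginalPlanner(String sql) {
    return "original";
  }

  /** Returns which planner ended up handling the statement. */
  static String plan(String sql) {
    if (!isQuery(sql)) {
      return planWithOriginalPlanner(sql);
    }
    try {
      return planWithCalcite(sql);
    } catch (UnsupportedQueryException e) {
      // Caught exception: rerun the same query through the current planner.
      return planWithOriginalPlanner(sql);
    }
  }
}
```

In this sketch the two queries listed in the commit message stay on the new path, while a query with a WHERE clause or a DDL statement ends up on the original planner.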

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#8).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterates through the SqlNode from the previous step
and makes sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Changes the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut, it is a no-op, but in
later versions, it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implements the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It also creates the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement falls back to the
current Impala planner.

In this iteration, any query besides the minimal ones listed above will
raise an exception that is caught, after which the query is run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/impala-config.sh
M bin/set-classpath.sh
M bin/start-impala-cluster.py
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21109/8/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/21109/8/bin/start-impala-cluster.py@629
PS8, Line 629: =
flake8: E225 missing whitespace around operator


http://gerrit.cloudera.org:8080/#/c/21109/8/tests/custom_cluster/test_calcite_planner.py
File tests/custom_cluster/test_calcite_planner.py:

http://gerrit.cloudera.org:8080/#/c/21109/8/tests/custom_cluster/test_calcite_planner.py@21
PS8, Line 21: import os
flake8: F401 'os' imported but unused



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 8
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 22:34:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15589/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 7
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 20:13:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15588/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 6
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 20:13:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21109/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21109/5//COMMIT_MSG@7
PS5, Line 7: IMPALA-12872: Use Calcite for ...
> nit: combine the two lines. it seems short enough.  How about 'Use Calcite
Done


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
File java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java:

http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java@33
PS5, Line 33: analyzer
> nit:use underscore to be consistent
Removed this since it is not needed.  The analyzer is in the PlannerContext


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java@36
PS5, Line 36: ctx
> nit: use underscore to be consistent with other variable names.
Done


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java@39
PS5, Line 39:   public final Class parentClass_;
> We should use the actual base class name (RelNode/ImpalaPlanRel) instead of
Removed this since it is not needed in this cut


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
File java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java:

http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java@31
PS5, Line 31: tableMap
> nit: use underscore
Done


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
File java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java:

http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java@78
PS5, Line 78:   private List foreignKeys;
> This field is not used in this implementation.  Is it for future use ?
Removed it since not used in this cut


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
File java/experimental-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java:

http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@87
PS5, Line 87: LOG.debug(getDebugString(resultObject));
> nit: should this logging be in an else block ?
Good catch

I put a return in the if block.  Also did this in other Calcite* files
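
For context, the difference between the two fixes discussed here (wrapping the logging in an else block versus returning from the if block) can be illustrated with a small sketch. The condition and messages below are assumptions, since the thread does not show the surrounding code in CalciteOptimizer; EarlyReturnLoggingSketch and its in-memory log list are hypothetical stand-ins for the real LOG calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the real condition in CalciteOptimizer is not shown in the thread. */
final class EarlyReturnLoggingSketch {
  /** Collects messages in memory so the behaviour is easy to inspect. */
  final List<String> logged = new ArrayList<>();

  void logDebug(Object resultObject) {
    if (resultObject == null) {
      logged.add("result: <none>");
      // Returning here has the same effect as putting the line below in an
      // else block: exactly one message is logged per call.
      return;
    }
    logged.add("result: " + resultObject);
  }
}
```

Without the return, a null result in this sketch would be logged twice, once by each statement.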


http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java
File java/experimental-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java:

http://gerrit.cloudera.org:8080/#/c/21109/5/java/experimental-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java@33
PS5, Line 33:
> nit: missing 'are' .  Here and subsequent messages.
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 5
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 19:50:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#7).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterates through the SqlNode from the previous step
and makes sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Changes the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut, it is a no-op, but in
later versions, it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implements the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It also creates the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement falls back to the
current Impala planner.

In this iteration, any query besides the minimal ones listed above will
raise an exception that is caught, after which the query is run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Steve Carlin (Code Review)
Hello Aman Sinha, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21109

to look at the new patch set (#6).

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..

IMPALA-12872: Use Calcite for optimization - part 1: simple queries

This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterates through the SqlNode from the previous step
and makes sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Changes the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: Optimizes the query. In this cut, it is a no-op, but in
later versions, it will perform logical optimizations via Calcite's rule
mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.

ExecRequestCreator: Implements the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It also creates the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement falls back to the
current Impala planner.

In this iteration, any query besides the minimal ones listed above will
raise an exception that is caught, after which the query is run through the
current Impala planner. The tests that do work can be found in calcite.test
and are run through the custom cluster test test_experimental_planner.py.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
---
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A java/calcite-planner/pom.xml
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ConvertToImpalaRelRules.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaHdfsScanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaPlanRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaProjectRel.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/NodeWithExprs.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ParentPlanRelContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/phys/ImpalaHdfsScanNode.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteDb.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteMetadataHandler.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalcitePhysPlanCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteValidator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CompilerStep.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/service/ExecRequestCreator.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeConverter.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/type/ImpalaTypeSystemImpl.java
A 

[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-03-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21109/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java
File java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java:

http://gerrit.cloudera.org:8080/#/c/21109/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/util/NotSupported.java@39
PS6, Line 39:   public static String APPX_COUNT_DISTINCT = "Approximate count distinct is not supported.";
line too long (92 > 90)
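
One common way to satisfy a 90-column limit for a declaration like this is to break the line after the assignment operator. The sketch below is only an illustration under that assumption; NotSupportedSketch is a hypothetical class name, and the thread does not show how the line was actually shortened in the patch.

```java
/** Hypothetical sketch of wrapping the flagged declaration under 90 columns. */
final class NotSupportedSketch {
  // Breaking after '=' keeps both lines short without changing the string value.
  public static String APPX_COUNT_DISTINCT =
      "Approximate count distinct is not supported.";
}
```

The message text is unchanged, so any caller comparing against it sees the same value.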



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Gerrit-Change-Number: 21109
Gerrit-PatchSet: 6
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Steve Carlin 
Gerrit-Comment-Date: Wed, 20 Mar 2024 19:47:48 +
Gerrit-HasComments: Yes