[Impala-ASF-CR] IMPALA-13221: Calcite: Enable tpcds and tpch queries

Steve Carlin (Code Review) Fri, 11 Oct 2024 14:45:08 -0700

Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21575 )

Change subject: IMPALA-13221: Calcite: Enable tpcds and tpch queries
......................................................................

Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21575/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/ImpalaBaseTableRef.java
File
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/ImpalaBaseTableRef.java:

http://gerrit.cloudera.org:8080/#/c/21575/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/ImpalaBaseTableRef.java@33
PS12, Line 33: public ImpalaBaseTableRef(TableRef tableRef, Path resolvedPath,
: SimplifiedAnalyzer basicAnalyzer) throws ImpalaException {
: super(tableRef, resolvedPath);
: // Impala's table uniqueAlias is within the scope of each Analyzer.
: // Since Impala uses a separate Analyzer instance for each query block
: // it can maintain the uniqueness. However, since the Calcite planner
: // uses a single SimplifiedAnalyzer for entire query and there are no
: // longer separate query blocks (they have already been unnested), it needs
: // to make the alias globally unique.
: Preconditions.checkState(aliases_.length > 0);
: aliases_[0] = basicAnalyzer.getUniqueTableAlias(getUniqueAlias());
: }
> Ah, ok. I think I know how to fix this. Will hopefully post shortly.
Sigh, maybe I don't understand Calcite well enough, but this might be hard to
track. Here's how it works in our implementation:

When the CalciteValidator runs to validate the tables, it already needs the
whole "Schema" populated, so that step is done before the validator runs. A scan
walks the SqlNode tree. When it hits an SqlNode that should be a table, it calls
into Impala and loads the FeFsTable from catalogd. It also populates the Calcite
Schema map, which maps a String to a CalciteTable.
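
To make the pre-validation scan concrete, here is a minimal sketch in plain
Java. It is not the patch's code and does not use Calcite or Impala classes:
Node, LoadedTable and populateSchema are hypothetical stand-ins, and keying the
map by table name is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaPrepassSketch {
  /** Hypothetical stand-in for a parsed SQL node; not Calcite's SqlNode. */
  static final class Node {
    final String tableName;  // non-null only for nodes that name a table
    final List<Node> children = new ArrayList<>();
    Node(String tableName) { this.tableName = tableName; }
    Node add(Node child) { children.add(child); return this; }
  }

  /** Hypothetical stand-in for the table object stored in the schema map. */
  static final class LoadedTable {
    final String name;
    LoadedTable(String name) { this.name = name; }
  }

  /** Walks the tree and loads each referenced table once, keyed by name. */
  static Map<String, LoadedTable> populateSchema(Node root) {
    Map<String, LoadedTable> schema = new LinkedHashMap<>();
    collect(root, schema);
    return schema;
  }

  private static void collect(Node node, Map<String, LoadedTable> schema) {
    if (node.tableName != null) {
      // Assumption: this is where the real flow would fetch from catalogd.
      schema.computeIfAbsent(node.tableName, LoadedTable::new);
    }
    for (Node child : node.children) {
      collect(child, schema);
    }
  }
}
```

In this sketch, a table referenced twice in the tree ends up as a single map
entry, so nothing in the entry distinguishes one reference from the other.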

At this point, we have access to the alias. My original thought was that we
could use the alias as the key in the table map. But that's not really going to
work. If there are subqueries and the subqueries use the same alias on 2
different tables, we'll have problems. And I'm not even sure it would
validate if the key was an alias anyway.
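
The alias collision can be sketched with an ordinary Java map. This is only an
illustration under assumptions: AliasKeySketch and register are hypothetical
names, table names are plain Strings here rather than CalciteTable objects, and
the two FROM clauses in the comments are made-up examples.

```java
import java.util.HashMap;
import java.util.Map;

public class AliasKeySketch {
  /**
   * Registers a table under its alias. Returns the table name previously
   * stored under that alias, or null if the alias was unused.
   */
  static String register(Map<String, String> byAlias, String alias,
      String tableName) {
    return byAlias.put(alias, tableName);
  }

  public static void main(String[] args) {
    Map<String, String> byAlias = new HashMap<>();
    // Outer query block: ... FROM store_sales t
    register(byAlias, "t", "store_sales");
    // Subquery: ... FROM item t  (same alias, different table)
    String displaced = register(byAlias, "t", "item");
    System.out.println("displaced=" + displaced
        + ", t now maps to " + byAlias.get("t"));
  }
}
```

With a single map for the whole query, the second registration silently
replaces the first, which is the problem described above.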

So if there are 2 references to the same table, I can't see any way right now
to get the alias information into the relevant LogicalTableScan.

I suppose we can try to fix this upstream in Calcite, but we'll probably just
need to file a Jira for now.

--
To view, visit http://gerrit.cloudera.org:8080/21575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3107d336ac07ecd89530b640165798ec6a574f41
Gerrit-Change-Number: 21575
Gerrit-PatchSet: 15
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Anonymous Coward (816)
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Fri, 11 Oct 2024 21:43:40 +0000
Gerrit-HasComments: Yes
