DaZuiZui commented on issue #17797:
URL: https://github.com/apache/iotdb/issues/17797#issuecomment-4668344244
I would like to propose the implementation plan for Part 2: supporting
Lateral Column Alias (LCA) in the table-model `SELECT` list.
LCA allows a later `SELECT` item to reference an explicit alias defined by
an earlier `SELECT` item:
```sql
SELECT s1 AS x, x + 1 AS y
FROM table1;
```
This should be analyzed as:
```sql
SELECT s1 AS x, s1 + 1 AS y
FROM table1;
```
No new keywords or grammar changes are required. The existing `SELECT`
syntax can be reused, and LCA can be resolved during analysis by rewriting
expressions before type analysis, aggregation analysis, output scope
computation, and planning.
### Proposed semantics
LCA is resolved from left to right inside the `SELECT` list.
```sql
SELECT s1 AS x, x + 1 AS y, y * 2 AS z
FROM table1;
```
is equivalent to:
```sql
SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
FROM table1;
```
Forward references are not supported, and the alias of the current `SELECT`
item is not visible to its own expression.
Only unqualified identifiers are considered alias references. Qualified
expressions such as `t.x`, `table1.x`, or `x.y` should continue to use the
existing qualified column resolution rules.
The recommended resolution priority is:
```text
local source column > visible previous aliases > existing analyzer resolution
```
Duplicate aliases should not overwrite each other. If multiple previous
aliases have the same canonical name and there is no same-name local source
column, the analyzer should report an ambiguity error.
### Non-goals
This change should not alter `WHERE` or `HAVING` semantics:
```sql
SELECT s1 AS x FROM table1 WHERE x > 1;
SELECT avg(s1) AS a FROM table1 HAVING a > 1;
```
These should still resolve `x` or `a` only through the input scope, not
through `SELECT` aliases.
LCA should also not enter subquery scopes:
```sql
SELECT s1 AS x, (SELECT x FROM table1) AS y
FROM table1;
```
The `x` inside the subquery should not be rewritten using the outer `SELECT`
alias.
### Analyzer changes
The entry point should be `StatementAnalyzer.analyzeSelect`.
The `SELECT` list can be processed left to right. For each
non-`COLUMNS(...)` `SingleColumn`:
1. Rewrite the original expression using visible previous aliases.
2. Register window metadata for any newly copied window functions.
3. Analyze the rewritten expression.
4. Record the rewritten expression in `SelectAnalysis` output expressions.
5. Record `SingleColumn -> rewritten expression` mapping.
6. Add the current explicit alias to visible aliases only after its
expression is rewritten and analyzed.
`SelectAlias` can be extended to keep:
```text
canonicalName
position
rewrittenExpression
```
`SelectAnalysis` should keep semantic output expressions per `SingleColumn`,
for example:
```text
NodeRef<SingleColumn> -> List<Expression>
```
The original AST should remain unchanged because `SingleColumn.expression`
is final.
### Expression rewriting
LCA rewriting should replace only unqualified `Identifier`s.
When an identifier is encountered:
1. If it resolves to a local input column, keep it unchanged.
2. Otherwise, look up visible previous aliases.
3. If exactly one alias matches, replace the identifier with a deep copy of
that alias's rewritten expression.
4. If multiple aliases match, report alias ambiguity.
5. If no alias matches, keep the identifier unchanged and let existing
analysis handle it.
The rewriter should not traverse into `SubqueryExpression`.
It should also handle `DereferenceExpression` carefully so that expressions
such as `t.x`, `table1.x`, and `x.y` are not rewritten as alias references.
Each alias replacement must create a deep copy of the defining expression.
Reusing the same expression instance is unsafe because IoTDB uses `NodeRef` as
identity-based keys in analysis maps.
### AllColumns and COLUMNS(...)
`AllColumns` should not register aliases for LCA.
A `SingleColumn` containing `COLUMNS(...)` should be expanded and analyzed
as today, but its alias should not be registered as a reusable LCA alias
because it may expand to multiple output columns.
```sql
SELECT COLUMNS('s.*') AS x, x + 1 AS y
FROM table1;
```
Here, `x` should not be treated as a unique alias definition from
`COLUMNS(...)`.
### Aggregation and window functions
LCA may reference previous aggregate expressions:
```sql
SELECT avg(s1) AS a, a + 1 AS b
FROM table1;
```
This should be rewritten to:
```sql
SELECT avg(s1) AS a, avg(s1) + 1 AS b
FROM table1;
```
Existing aggregation validity checks should still apply after rewriting.
For window functions, supporting alias references is preferred:
```sql
SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
FROM table1;
```
Since LCA deep copy creates new `FunctionCall` nodes, any copied window
functions need their resolved window metadata registered again in `Analysis`.
If this is not supported in the first implementation, such aliases should be
rejected explicitly with a clear error instead of failing silently.
### GROUP BY and ORDER BY compatibility
This should remain compatible with the existing `GROUP BY` / `ORDER BY`
select-alias reuse behavior.
```sql
SELECT date_bin(1h, time) AS hour_time, avg(s1)
FROM table1
GROUP BY hour_time
ORDER BY hour_time;
```
`GROUP BY <alias>` should continue to resolve to the corresponding select
expression, while `ORDER BY <alias>` can continue to resolve to the output
field reference.
Because `SELECT` output expressions will contain the LCA-rewritten
expressions, the existing `GROUP BY` / `ORDER BY` alias reuse logic should
compose naturally with this change.
For example:
```sql
SELECT s1 AS x, x + 1 AS y, count(*)
FROM table1
GROUP BY x, y;
```
`GROUP BY x` should resolve to:
```sql
s1
```
and `GROUP BY y` should resolve to:
```sql
s1 + 1
```
### Suggested tests
I plan to add or update tests in `SelectAliasReuseTest` for:
```sql
SELECT s1 AS x, x + 1 AS y FROM table1;
SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
SELECT s1 AS x, x + x AS y FROM table1;
SELECT y + 1 AS x, s1 AS y FROM table1;
SELECT x + 1 AS x FROM table1;
SELECT s1 AS x, x + 1 AS y FROM table_with_x;
SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
SELECT s1 AS x, table1.x + 1 AS y FROM table1;
SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
SELECT avg(s1) AS a, a + 1 AS b FROM table1;
SELECT s1 AS x, avg(s2) + x AS y FROM table1;
SELECT avg(s1) AS a FROM table1 HAVING a > 1;
SELECT s1 AS x FROM table1 WHERE x > 1;
SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
SELECT s1 AS x, x FROM table1;
```
Please let me know whether this direction looks reasonable, especially the
resolution priority, duplicate alias handling, and the preferred behavior for
window function aliases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]