DaZuiZui commented on issue #17797:
URL: https://github.com/apache/iotdb/issues/17797#issuecomment-4668344244

   I would like to propose the implementation plan for Part 2: supporting 
Lateral Column Alias (LCA) in the table-model `SELECT` list.
   
   LCA allows a later `SELECT` item to reference an explicit alias defined by 
an earlier `SELECT` item:
   
   ```sql
   SELECT s1 AS x, x + 1 AS y
   FROM table1;
   ```
   
   This should be analyzed as:
   
   ```sql
   SELECT s1 AS x, s1 + 1 AS y
   FROM table1;
   ```
   
   No new keywords or grammar changes are required. The existing `SELECT` 
syntax can be reused, and LCA can be resolved during analysis by rewriting 
expressions before type analysis, aggregation analysis, output scope 
computation, and planning.
   
   ### Proposed semantics
   
   LCA is resolved from left to right inside the `SELECT` list.
   
   ```sql
   SELECT s1 AS x, x + 1 AS y, y * 2 AS z
   FROM table1;
   ```
   
   is equivalent to:
   
   ```sql
   SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
   FROM table1;
   ```
   
   Forward references are not supported, and the alias of the current `SELECT` 
item is not visible to its own expression.
   
   Only unqualified identifiers are considered alias references. Qualified 
expressions such as `t.x`, `table1.x`, or `x.y` should continue to use the 
existing qualified column resolution rules.
   
   The recommended resolution priority is:
   
   ```text
   local source column > visible previous aliases > existing analyzer resolution
   ```
   
   Duplicate aliases should not overwrite each other. If multiple previous 
aliases have the same canonical name and there is no same-name local source 
column, the analyzer should report an ambiguity error.
   
   ### Non-goals
   
   This change should not alter `WHERE` or `HAVING` semantics:
   
   ```sql
   SELECT s1 AS x FROM table1 WHERE x > 1;
   SELECT avg(s1) AS a FROM table1 HAVING a > 1;
   ```
   
   These should still resolve `x` or `a` only through the input scope, not 
through `SELECT` aliases.
   
   LCA should also not enter subquery scopes:
   
   ```sql
   SELECT s1 AS x, (SELECT x FROM table1) AS y
   FROM table1;
   ```
   
   The `x` inside the subquery should not be rewritten using the outer `SELECT` 
alias.
   
   ### Analyzer changes
   
   The entry point should be `StatementAnalyzer.analyzeSelect`.
   
   The `SELECT` list can be processed left to right. For each 
non-`COLUMNS(...)` `SingleColumn`:
   
   1. Rewrite the original expression using visible previous aliases.
   2. Register window metadata for any newly copied window functions.
   3. Analyze the rewritten expression.
   4. Record the rewritten expression in `SelectAnalysis` output expressions.
   5. Record `SingleColumn -> rewritten expression` mapping.
   6. Add the current explicit alias to visible aliases only after its 
expression is rewritten and analyzed.
   
   `SelectAlias` can be extended to keep:
   
   ```text
   canonicalName
   position
   rewrittenExpression
   ```
   
   `SelectAnalysis` should keep semantic output expressions per `SingleColumn`, 
for example:
   
   ```text
   NodeRef<SingleColumn> -> List<Expression>
   ```
   
   The original AST should remain unchanged because `SingleColumn.expression` 
is final.
   
   ### Expression rewriting
   
   LCA rewriting should replace only unqualified `Identifier`s.
   
   When an identifier is encountered:
   
   1. If it resolves to a local input column, keep it unchanged.
   2. Otherwise, look up visible previous aliases.
   3. If exactly one alias matches, replace the identifier with a deep copy of 
that alias's rewritten expression.
   4. If multiple aliases match, report alias ambiguity.
   5. If no alias matches, keep the identifier unchanged and let existing 
analysis handle it.
   
   The rewriter should not traverse into `SubqueryExpression`.
   
   It should also handle `DereferenceExpression` carefully so that expressions 
such as `t.x`, `table1.x`, and `x.y` are not rewritten as alias references.
   
   Each alias replacement must create a deep copy of the defining expression. 
Reusing the same expression instance is unsafe because IoTDB uses `NodeRef` as 
identity-based keys in analysis maps.
   
   ### AllColumns and COLUMNS(...)
   
   `AllColumns` should not register aliases for LCA.
   
   A `SingleColumn` containing `COLUMNS(...)` should be expanded and analyzed 
as today, but its alias should not be registered as a reusable LCA alias 
because it may expand to multiple output columns.
   
   ```sql
   SELECT COLUMNS('s.*') AS x, x + 1 AS y
   FROM table1;
   ```
   
   Here, `x` should not be treated as a unique alias definition from 
`COLUMNS(...)`.
   
   ### Aggregation and window functions
   
   LCA may reference previous aggregate expressions:
   
   ```sql
   SELECT avg(s1) AS a, a + 1 AS b
   FROM table1;
   ```
   
   This should be rewritten to:
   
   ```sql
   SELECT avg(s1) AS a, avg(s1) + 1 AS b
   FROM table1;
   ```
   
   Existing aggregation validity checks should still apply after rewriting.
   
   For window functions, supporting alias references is preferred:
   
   ```sql
   SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
   FROM table1;
   ```
   
   Since LCA deep copy creates new `FunctionCall` nodes, any copied window 
functions need their resolved window metadata registered again in `Analysis`. 
If this is not supported in the first implementation, such aliases should be 
rejected explicitly with a clear error instead of failing silently.
   
   ### GROUP BY and ORDER BY compatibility
   
   This should remain compatible with the existing `GROUP BY` / `ORDER BY` 
select-alias reuse behavior.
   
   ```sql
   SELECT date_bin(1h, time) AS hour_time, avg(s1)
   FROM table1
   GROUP BY hour_time
   ORDER BY hour_time;
   ```
   
   `GROUP BY <alias>` should continue to resolve to the corresponding select 
expression, while `ORDER BY <alias>` can continue to resolve to the output 
field reference.
   
   Because `SELECT` output expressions will contain the LCA-rewritten 
expressions, the existing `GROUP BY` / `ORDER BY` alias reuse logic should 
compose naturally with this change.
   
   For example:
   
   ```sql
   SELECT s1 AS x, x + 1 AS y, count(*)
   FROM table1
   GROUP BY x, y;
   ```
   
   `GROUP BY x` should resolve to:
   
   ```sql
   s1
   ```
   
   and `GROUP BY y` should resolve to:
   
   ```sql
   s1 + 1
   ```
   
   ### Suggested tests
   
   I plan to add or update tests in `SelectAliasReuseTest` for:
   
   ```sql
   SELECT s1 AS x, x + 1 AS y FROM table1;
   SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
   SELECT s1 AS x, x + x AS y FROM table1;
   SELECT y + 1 AS x, s1 AS y FROM table1;
   SELECT x + 1 AS x FROM table1;
   SELECT s1 AS x, x + 1 AS y FROM table_with_x;
   SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
   SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
   SELECT s1 AS x, table1.x + 1 AS y FROM table1;
   SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
   SELECT avg(s1) AS a, a + 1 AS b FROM table1;
   SELECT s1 AS x, avg(s2) + x AS y FROM table1;
   SELECT avg(s1) AS a FROM table1 HAVING a > 1;
   SELECT s1 AS x FROM table1 WHERE x > 1;
   SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
   SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
   SELECT s1 AS x, x FROM table1;
   ```
   
   Please let me know whether this direction looks reasonable, especially the 
resolution priority, duplicate alias handling, and the preferred behavior for 
window function aliases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to