DaZuiZui commented on issue #17797:
URL: https://github.com/apache/iotdb/issues/17797#issuecomment-4608886769

   Hi!
   I would like to work on **Part 1** of this issue: allowing SELECT column 
aliases to be referenced in `GROUP BY` and `ORDER BY`.
   
   I will keep **Lateral Column Alias references inside the SELECT list** out 
of scope for this PR and handle that as a separate follow-up after Part 1 is 
merged.
   
   ## Feature Definition
   
   ### Scope
   
   This feature applies to the **table model SQL analyzer**.
   
   It allows an explicit alias defined by a SELECT item to be used as a 
grouping or ordering key.
   
   ### Supported Syntax
   
   ```sql
   SELECT <expression> [AS] <alias>, ...
   FROM <table>
   GROUP BY <alias>
   ORDER BY <alias> [ASC | DESC] [NULLS FIRST | NULLS LAST];
   ```
   Components
   <expression>: any SELECT expression already supported by the table model 
analyzer.
   <alias>: an explicit SELECT item alias, for example date_bin(1h, time) AS 
hour_time.
   GROUP BY <alias>: resolves the alias to the corresponding SELECT expression 
and uses that expression as the grouping key.
   ORDER BY <alias>: resolves the alias to the corresponding SELECT expression 
and uses that expression as the ordering key.
   ASC, DESC, NULLS FIRST, and NULLS LAST: keep the existing ORDER BY semantics.
   Type Rules
   The alias itself does not introduce a new data type. Its type is the type of 
the SELECT expression it refers to.
   
   For GROUP BY <alias>:
   
   The resolved expression must be comparable, following the existing GROUP BY 
type rules.
   The resolved expression must not contain aggregate, window, or grouping 
functions.
   Existing semantic checks should run after alias resolution.
   For ORDER BY <alias>:
   
   The resolved expression must be orderable, following the existing ORDER BY 
type rules.
   Existing semantic checks should run after alias resolution.
   In other words, supported data types are the same as the currently supported 
comparable/orderable types in the table model analyzer. Non-comparable or 
non-orderable types should continue to be rejected.
   
   Name Resolution Rules
   For a single unqualified identifier in GROUP BY:
   
   Try to resolve it as an input column first.
   If it does not resolve to an input column, try to resolve it as a SELECT 
alias.
   If multiple SELECT aliases match, report an ambiguity error.
   For a single unqualified identifier in ORDER BY:
   
   Try to resolve it as a SELECT output alias first.
   If no SELECT alias matches, fall back to the existing ORDER BY name 
resolution behavior.
   If multiple SELECT aliases match, report an ambiguity error.
   This follows the convention discussed in this issue:
   
   GROUP BY prefers the input column.
   ORDER BY prefers the SELECT output alias.
   Out of Scope
   This PR will not introduce SELECT aliases into:
   
   WHERE
   HAVING
   later SELECT items, namely Lateral Column Alias references
   Lateral Column Alias support will be handled in a separate follow-up PR.
   
   Examples
   Basic GROUP BY / ORDER BY alias
   SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
   FROM table1
   GROUP BY hour_time
   ORDER BY hour_time;
   This should be analyzed as if the user wrote:
   
   SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
   FROM table1
   GROUP BY date_bin(1h, time)
   ORDER BY date_bin(1h, time);
   Existing behavior should remain unchanged
   GROUP BY 1
   ORDER BY 1
   GROUP BY date_bin(1h, time)
   ORDER BY AVG(temperature)
   GROUP BY name collision prefers input column
   If table1 has an input column named hour_time:
   
   SELECT date_bin(1h, time) AS hour_time, AVG(temperature)
   FROM table1
   GROUP BY hour_time;
   GROUP BY hour_time should resolve to the input column hour_time, not the 
SELECT alias.
   
   ORDER BY name collision prefers SELECT alias
   If table1 has an input column named hour_time:
   
   SELECT date_bin(1h, time) AS hour_time
   FROM table1
   ORDER BY hour_time;
   ORDER BY hour_time should resolve to the SELECT alias.
   
   Duplicate alias ambiguity
   SELECT s1 AS x, s2 AS x
   FROM table1
   ORDER BY x;
   This should fail with a clear ambiguity error.
   
   Aggregate alias in GROUP BY should fail
   SELECT AVG(temperature) AS avg_temp
   FROM table1
   GROUP BY avg_temp;
   This should fail because GROUP BY cannot use an aggregate expression as a 
grouping key.
   
   Test Plan
   I plan to add tests for:
   
   GROUP BY alias
   ORDER BY alias
   alias/input-column collision behavior
   duplicate alias ambiguity
   aggregate alias used in GROUP BY should fail
   existing ordinal references should keep working
   existing full-expression references should keep working
   SELECT DISTINCT with ORDER BY alias, if applicable
   Could you please assign Part 1 of this issue to me?
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to