JackieTien97 opened a new issue, #17797:
URL: https://github.com/apache/iotdb/issues/17797

   ### Motivation
   
   In the table model, IoTDB already supports column aliases via `AS`, but 
those aliases **cannot** be reused in `GROUP BY` / `ORDER BY`, nor by later 
items in the same `SELECT` list. Users have to repeat the full (often long) 
expression or fall back to positional references like `GROUP BY 1`. This hurts 
readability and is error-prone for downsampling/aggregation queries, which are 
extremely common in time-series workloads.
   
   **Today users must write:**
   
   ```sql
   SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
   FROM table1
   GROUP BY date_bin(1h, time)     -- must repeat the whole expression
   ORDER BY date_bin(1h, time);    -- ...and repeat it again
   ```
   
   **What we'd like to support:**
   
   ```sql
   SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
   FROM table1
   GROUP BY hour_time
   ORDER BY hour_time;
   ```
   
   ### Proposed Feature
   
   **Part 1 — Use SELECT aliases in `GROUP BY` and `ORDER BY`.**
   Allow an alias defined in the `SELECT` list to be referenced as a 
grouping/ordering key. Existing positional references (`GROUP BY 1`) and 
full-expression references should keep working.
   
   **Part 2 — Lateral Column Alias (LCA) in the `SELECT` list.**
   Allow a later select item to reference an alias defined by an earlier select 
item in the same `SELECT`:
   
   ```sql
   SELECT
     date_bin(1h, time)  AS hour_time,
     AVG(temperature)    AS avg_temp,
     avg_temp - 273.15   AS avg_temp_celsius   -- references the earlier alias
   FROM table1
   GROUP BY hour_time;
   ```
   
   ### Prior Art (how other databases behave)
   
   | Database | Alias in `GROUP BY` | Alias in `ORDER BY` | Lateral alias in 
`SELECT` |
   |---|:---:|:---:|:---:|
   | SQL standard | ❌ | ✅ | ❌ |
   | PostgreSQL | ✅ (extension) | ✅ | ❌ |
   | MySQL | ✅ | ✅ | ❌ |
   | SQL Server | ❌ | ✅ | ❌ |
   | Oracle | ✅ 23c+ / ❌ before | ✅ | ❌ |
   | BigQuery | ✅ | ✅ | ❌ |
   | ClickHouse | ✅ | ✅ | ✅ |
   | DuckDB | ✅ | ✅ | ✅ |
   | Spark SQL / Databricks | ✅ | ✅ | ✅ (Spark 3.4+) |
   | Amazon Redshift | ✅ | ✅ | ✅ |
   | Snowflake | ✅ | ✅ | ✅ |
   
   Takeaway: Part 1 (alias in GROUP BY/ORDER BY) is essentially expected 
behavior across mainstream engines. Part 2 (LCA) is supported by the 
analytics-oriented engines (ClickHouse, DuckDB, Redshift, Snowflake, 
Databricks/Spark) and is a popular convenience that removes the need for nested 
subqueries/CTEs.
   
   ### Semantics to decide (important — please discuss before implementing)
   
   1. **Name collision (alias vs. underlying column) in GROUP BY/ORDER BY.** 
When a name matches both a SELECT-output alias and an input column, which wins? 
The common convention (PostgreSQL): **ORDER BY** prefers the **output alias**, 
**GROUP BY** prefers the **input column** (for standard compatibility). We 
should pick a documented, deterministic rule.
   2. **Name collision for LCA.** Following Redshift/Databricks: if a table 
column has the same name as a preceding alias, the **table column takes 
priority**; if more than one preceding alias matches, raise an **ambiguity 
error**.
   3. **Only backward references for LCA** (an item may reference aliases 
defined *before* it, not after); no cyclic references.
   4. **Scope of clauses.** This proposal targets `GROUP BY` / `ORDER BY` / 
lateral `SELECT`. Whether aliases should also be usable in `WHERE` / `HAVING` 
is intentionally left out of scope (standard SQL disallows alias in `WHERE`); 
can be a follow-up.
   5. **Aggregation interaction.** Ensure an alias whose expression is an 
aggregate is rejected/handled correctly when used as a `GROUP BY` key, and that 
the `non-aggregate must appear in GROUP BY` check still applies after alias 
resolution.
   
   ### Scope / Acceptance Criteria
   
   - [ ] `GROUP BY <alias>` resolves to the corresponding SELECT expression.
   - [ ] `ORDER BY <alias>` resolves to the corresponding SELECT expression.
   - [ ] A later SELECT item can reference an earlier SELECT item's alias 
(LCA), with backward-only resolution.
   - [ ] Deterministic, documented name-resolution rules for the collision 
cases above; ambiguous LCA raises a clear error.
   - [ ] Existing positional (`GROUP BY 1`) and full-expression references 
remain unchanged.
   - [ ] Tests covering basic usage, name collisions, ambiguity errors, 
aggregate interaction; documentation updated.
   
   > Note: this proposal can be split into two independently claimable tasks — 
**Part 1** (alias in GROUP BY/ORDER BY) and **Part 2** (lateral column alias) — 
if preferred.
   
   ### References
   
   - PostgreSQL — aliases in GROUP BY/ORDER BY (SQL-92 extension): 
https://www.postgresql.org/docs/current/sql-select.html
   - MySQL — column alias usage in GROUP BY/HAVING/ORDER BY: 
https://dev.mysql.com/doc/en/problems-with-alias.html
   - Databricks — Introducing Lateral Column Alias: 
https://www.databricks.com/blog/introducing-support-lateral-column-alias
   - Apache Spark — SPARK-27561 "lateral column alias references": 
https://issues.apache.org/jira/browse/SPARK-27561
   - Databricks — name resolution rules (alias vs column precedence): 
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-name-resolution
   - ClickHouse — SELECT (alias reuse in GROUP BY/ORDER BY): 
https://clickhouse.com/docs/sql-reference/statements/select
   - BigQuery — query syntax (alias in GROUP BY/ORDER BY; no LCA): 
https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
   
   ---
   
   _Suggested labels: `enhancement`, table model / SQL._


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to