JackieTien97 opened a new issue, #17797:
URL: https://github.com/apache/iotdb/issues/17797
### Motivation
In the table model, IoTDB already supports column aliases via `AS`, but
those aliases **cannot** be reused in `GROUP BY` / `ORDER BY`, nor by later
items in the same `SELECT` list. Users have to repeat the full (often long)
expression or fall back to positional references like `GROUP BY 1`. This hurts
readability and is error-prone for downsampling/aggregation queries, which are
extremely common in time-series workloads.
**Today users must write:**
```sql
SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
FROM table1
GROUP BY date_bin(1h, time) -- must repeat the whole expression
ORDER BY date_bin(1h, time); -- ...and repeat it again
```
**What we'd like to support:**
```sql
SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temp
FROM table1
GROUP BY hour_time
ORDER BY hour_time;
```
### Proposed Feature
**Part 1 — Use SELECT aliases in `GROUP BY` and `ORDER BY`.**
Allow an alias defined in the `SELECT` list to be referenced as a
grouping/ordering key. Existing positional references (`GROUP BY 1`) and
full-expression references should keep working.
**Part 2 — Lateral Column Alias (LCA) in the `SELECT` list.**
Allow a later select item to reference an alias defined by an earlier select
item in the same `SELECT`:
```sql
SELECT
date_bin(1h, time) AS hour_time,
AVG(temperature) AS avg_temp,
avg_temp - 273.15 AS avg_temp_celsius -- references the earlier alias
FROM table1
GROUP BY hour_time;
```
### Prior Art (how other databases behave)
| Database | Alias in `GROUP BY` | Alias in `ORDER BY` | Lateral alias in
`SELECT` |
|---|:---:|:---:|:---:|
| SQL standard | ❌ | ✅ | ❌ |
| PostgreSQL | ✅ (extension) | ✅ | ❌ |
| MySQL | ✅ | ✅ | ❌ |
| SQL Server | ❌ | ✅ | ❌ |
| Oracle | ✅ 23c+ / ❌ before | ✅ | ❌ |
| BigQuery | ✅ | ✅ | ❌ |
| ClickHouse | ✅ | ✅ | ✅ |
| DuckDB | ✅ | ✅ | ✅ |
| Spark SQL / Databricks | ✅ | ✅ | ✅ (Spark 3.4+) |
| Amazon Redshift | ✅ | ✅ | ✅ |
| Snowflake | ✅ | ✅ | ✅ |
Takeaway: Part 1 (alias in GROUP BY/ORDER BY) is essentially expected
behavior across mainstream engines. Part 2 (LCA) is supported by the
analytics-oriented engines (ClickHouse, DuckDB, Redshift, Snowflake,
Databricks/Spark) and is a popular convenience that removes the need for nested
subqueries/CTEs.
### Semantics to decide (important — please discuss before implementing)
1. **Name collision (alias vs. underlying column) in GROUP BY/ORDER BY.**
When a name matches both a SELECT-output alias and an input column, which wins?
The common convention (PostgreSQL): **ORDER BY** prefers the **output alias**,
**GROUP BY** prefers the **input column** (for standard compatibility). We
should pick a documented, deterministic rule.
2. **Name collision for LCA.** Following Redshift/Databricks: if a table
column has the same name as a preceding alias, the **table column takes
priority**; if more than one preceding alias matches, raise an **ambiguity
error**.
3. **Only backward references for LCA** (an item may reference aliases
defined *before* it, not after); no cyclic references.
4. **Scope of clauses.** This proposal targets `GROUP BY` / `ORDER BY` /
lateral `SELECT`. Whether aliases should also be usable in `WHERE` / `HAVING`
is intentionally left out of scope (standard SQL disallows alias in `WHERE`);
can be a follow-up.
5. **Aggregation interaction.** Ensure an alias whose expression is an
aggregate is rejected/handled correctly when used as a `GROUP BY` key, and that
the `non-aggregate must appear in GROUP BY` check still applies after alias
resolution.
### Scope / Acceptance Criteria
- [ ] `GROUP BY <alias>` resolves to the corresponding SELECT expression.
- [ ] `ORDER BY <alias>` resolves to the corresponding SELECT expression.
- [ ] A later SELECT item can reference an earlier SELECT item's alias
(LCA), with backward-only resolution.
- [ ] Deterministic, documented name-resolution rules for the collision
cases above; ambiguous LCA raises a clear error.
- [ ] Existing positional (`GROUP BY 1`) and full-expression references
remain unchanged.
- [ ] Tests covering basic usage, name collisions, ambiguity errors,
aggregate interaction; documentation updated.
> Note: this proposal can be split into two independently claimable tasks —
**Part 1** (alias in GROUP BY/ORDER BY) and **Part 2** (lateral column alias) —
if preferred.
### References
- PostgreSQL — aliases in GROUP BY/ORDER BY (SQL-92 extension):
https://www.postgresql.org/docs/current/sql-select.html
- MySQL — column alias usage in GROUP BY/HAVING/ORDER BY:
https://dev.mysql.com/doc/en/problems-with-alias.html
- Databricks — Introducing Lateral Column Alias:
https://www.databricks.com/blog/introducing-support-lateral-column-alias
- Apache Spark — SPARK-27561 "lateral column alias references":
https://issues.apache.org/jira/browse/SPARK-27561
- Databricks — name resolution rules (alias vs column precedence):
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-name-resolution
- ClickHouse — SELECT (alias reuse in GROUP BY/ORDER BY):
https://clickhouse.com/docs/sql-reference/statements/select
- BigQuery — query syntax (alias in GROUP BY/ORDER BY; no LCA):
https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
---
_Suggested labels: `enhancement`, table model / SQL._
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]