chenhao-db opened a new pull request, #45730:
URL: https://github.com/apache/spark/pull/45730
### What changes were proposed in this pull request?
In the `Window` node, both `partitionSpec` and `orderSpec` must be
orderable, but the current type check only verifies that `orderSpec` is.
This can surface as an internal error in later optimizer phases.
Given a query:
```sql
with t as (select id, map(id, id) as m from range(0, 10))
select rank() over (partition by m order by id) from t
```
Before the PR, it fails with an `INTERNAL_ERROR`:
```
org.apache.spark.SparkException: [INTERNAL_ERROR] grouping/join/window
partition keys cannot be map type. SQLSTATE: XX000
at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
at
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.needNormalize(NormalizeFloatingNumbers.scala:103)
at
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.org$apache$spark$sql$catalyst$optimizer$NormalizeFloatingNumbers$$needNormalize(NormalizeFloatingNumbers.scala:94)
...
```
After the PR, it fails with a `DATATYPE_MISMATCH.INVALID_ORDERING_TYPE`
analysis error, which is the expected behavior:
```
org.apache.spark.sql.catalyst.ExtendedAnalysisException:
[DATATYPE_MISMATCH.INVALID_ORDERING_TYPE] Cannot resolve "m" due to data type
mismatch: The `attributereference` does not support ordering on type
"MAP<BIGINT, BIGINT>". SQLSTATE: 42K09; line 2 pos 53;
Project [RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4]
+- Project [id#1L, m#0, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS
FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4, RANK() OVER
(PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW)#4]
+- Window [rank(id#1L) windowspecdefinition(m#0, id#1L ASC NULLS FIRST,
specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK()
OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW)#4], [m#0], [id#1L ASC NULLS FIRST]
+- Project [id#1L, m#0]
+- SubqueryAlias t
+- SubqueryAlias t
+- Project [id#1L, map(id#1L, id#1L) AS m#0]
+- Range (0, 10, step=1, splits=None)
at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
...
```
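The idea behind the fix can be sketched in isolation. The snippet below is a hedged, self-contained model, not Spark's actual code: the `DataType` ADT, `isOrderable`, and `checkWindowSpec` are hypothetical stand-ins for Spark's `RowOrdering.isOrderable` and the `Window` type check, illustrating that the orderability check must cover the partition keys as well as the ORDER BY keys.

```scala
// Hypothetical sketch (not Spark internals): a minimal DataType ADT and an
// orderability check that covers both partitionSpec and orderSpec.
sealed trait DataType
case object LongType extends DataType
case class ArrayType(elem: DataType) extends DataType
case class MapType(key: DataType, value: DataType) extends DataType

object OrderCheck {
  // Maps have no defined ordering; arrays are orderable iff their elements are.
  def isOrderable(dt: DataType): Boolean = dt match {
    case MapType(_, _) => false
    case ArrayType(e)  => isOrderable(e)
    case _             => true
  }

  // The fix in spirit: validate partition keys too, not just ORDER BY keys,
  // so the failure is a clear analysis-time error instead of an INTERNAL_ERROR
  // thrown later by the optimizer.
  def checkWindowSpec(partitionSpec: Seq[DataType],
                      orderSpec: Seq[DataType]): Either[String, Unit] =
    (partitionSpec ++ orderSpec).find(dt => !isOrderable(dt)) match {
      case Some(dt) =>
        Left(s"DATATYPE_MISMATCH.INVALID_ORDERING_TYPE: $dt does not support ordering")
      case None =>
        Right(())
    }
}
```

Under this model, `PARTITION BY m` with `m: MAP<BIGINT, BIGINT>` is rejected up front, matching the analysis error shown above.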
### How was this patch tested?
Unit test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]