cloud-fan commented on code in PR #55984:
URL: https://github.com/apache/spark/pull/55984#discussion_r3287828550
##########
sql/core/src/test/resources/sql-tests/inputs/listagg.sql:
##########
@@ -61,3 +61,18 @@ SELECT listagg(DISTINCT col1) WITHIN GROUP (ORDER BY col1,
col2) FROM df;
SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES
(cast(1.1 as double)), (cast(2.2 as double)), (cast(2.2 as double)), (cast(3.3
as double)) AS t(col);
SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES
(cast(1.0 as float)), (cast(2.0 as float)), (cast(2.0 as float)) AS t(col);
SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES
(TIMESTAMP'2024-01-01 10:00:00'), (TIMESTAMP'2024-01-02 12:00:00'),
(TIMESTAMP'2024-01-01 10:00:00') AS t(col);
+
+-- LISTAGG with semi-structured extract (parser wraps v:a in Alias with fresh
ExprId)
+-- Tests that isOrderCompatible strips Alias wrappers before comparing via
semanticEquals
+SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string)
FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a":
"y"}') UNION ALL SELECT parse_json('{"a": "x"}'));
+-- Semi-structured extract without DISTINCT
+SELECT listagg(v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) FROM
(SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}')
UNION ALL SELECT parse_json('{"a": "x"}'));
+-- Semi-structured extract with DESC ordering
+SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string
DESC) FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT
parse_json('{"a": "y"}') UNION ALL SELECT parse_json('{"a": "x"}'));
+-- Semi-structured extract with nested path
+SELECT listagg(DISTINCT v:a.b::string, ',') WITHIN GROUP (ORDER BY
v:a.b::string) FROM (SELECT parse_json('{"a": {"b": "x"}}') v UNION ALL SELECT
parse_json('{"a": {"b": "y"}}') UNION ALL SELECT parse_json('{"a": {"b":
"x"}}'));
+-- Semi-structured extract with GROUP BY
+SELECT grp, listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY
v:a::string) FROM (SELECT 1 grp, parse_json('{"a": "x"}') v UNION ALL SELECT 1,
parse_json('{"a": "y"}') UNION ALL SELECT 2, parse_json('{"a": "x"}') UNION ALL
SELECT 2, parse_json('{"a": "x"}') UNION ALL SELECT 1, parse_json('{"a":
"x"}')) GROUP BY grp;
+-- Semi-structured extract: DISTINCT cast with non-equality-preserving order
(variant)
+-- Tests that checkOrderValueDeterminism strips Alias wrappers before
comparing via semanticEquals
+SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a) FROM
(SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}')
UNION ALL SELECT parse_json('{"a": "x"}'));
Review Comment:
The comment above says this exercises `checkOrderValueDeterminism`'s alias
stripping, but `ORDER BY v:a` is VARIANT and fails with
`DATATYPE_MISMATCH.INVALID_ORDERING_TYPE` from `SortOrder.checkInputDataTypes`
before listagg's checks run — see this query's `analyzer-results` golden
output: only that error is raised, no
`functionAndOrderExpressionUnsafeCastError`. The test would pass identically
with or without the new `trimAliases` you added at line 684.
To actually hit the `Cast(castChild, ...)` arm in
`checkOrderValueDeterminism`, the order column needs to be orderable but
non-equality-preserving when cast to String, e.g. `LISTAGG(DISTINCT
(v:a)::double::string, ',') WITHIN GROUP (ORDER BY (v:a)::double)`. Without the
trim, the single-pass aliases inside `castChild` and
`orderExpressions.head.child` mismatch and you'd get `NonDeterministicMismatch`
→ `functionAndOrderExpressionMismatchError`; with the trim, they match and you
get `NonDeterministicCast(Double, String)` →
`functionAndOrderExpressionUnsafeCastError`. That divergence is what pins the
fix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]