Angryrou commented on code in PR #48413:
URL: https://github.com/apache/spark/pull/48413#discussion_r1817004266
##########
sql/core/src/test/resources/sql-tests/inputs/pipe-operators.sql:
##########
@@ -571,6 +583,97 @@ table t
 table t
 |> union all table st;
 
+-- Sorting and repartitioning operators: positive tests.
+--------------------------------------------------------
+
+-- Order by.
+table t
+|> order by x;
+
+-- Order by with a table subquery.
+(select * from t)
+|> order by x;
+
+-- Order by with a VALUES list.
+values (0, 'abc') tab(x, y)
+|> order by x;
+
+-- Limit.
+table t
+|> order by x
+|> limit 1;
+
+-- Limit with offset.
+table t
+|> where x = 1
+|> select y
+|> limit 2 offset 1;
+
+-- Offset is allowed without limit.
+table t
+|> where x = 1
+|> select y
+|> offset 1;
+
+-- LIMIT ALL and OFFSET 0 are equivalent to no LIMIT or OFFSET clause, respectively.
+table t
+|> limit all offset 0;
+
+-- Distribute by.
+table t
+|> distribute by x;
+
+-- Cluster by.
+table t
+|> cluster by x;
+
+-- Sort and distribute by.
+table t
+|> sort by x distribute by x;
+
+-- It is possible to apply a final ORDER BY clause on the result of a query containing pipe
+-- operators.
+table t
+|> order by x desc
+order by y;
+
+-- Sorting and repartitioning operators: negative tests.
+--------------------------------------------------------
+
+-- Multiple order by clauses are not supported in the same pipe operator.
+-- We add an extra "ORDER BY y" clause at the end in this test to show that the "ORDER BY x + y"
+-- clause was consumed end the of the final query, not as part of the pipe operator.
+table t
+|> order by x desc order by x + y
+order by y;
+
+-- The ORDER BY clause may only refer to column names from the previous input relation.
+table t
+|> select 1 + 2 as result
+|> order by x;
+
+-- The DISTRIBUTE BY clause may only refer to column names from the previous input relation.
+table t
+|> select 1 + 2 as result
+|> distribute by x;
+
+-- Combinations of multiple ordering and limit clauses are not supported.
+table t
+|> order by x limit 1;
+
+-- ORDER BY and SORT BY are not supported at the same time.
+table t
+|> order by x sort by x;
+
+-- The WINDOW clause is not supported yet.
+table windowTestData
+|> window w as (partition by cte order by val)

Review Comment:
Hi Daniel @dtenedor, I noticed that this window clause in Spark differs from what is described in the original paper and documentation. Could you share your thoughts on this?

The [documentation](https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md#window-pipe-operator) specifies that a window operator should always include a window function with an OVER clause. In Spark's syntax, however, the window operator only returns a window definition and does not require an OVER clause.

I think it makes sense to keep the existing window syntax (as shown in this example), since the Extend clause will cover the window operator's functionality as described in the paper. However, I'd like to confirm the expected behavior of the window clause in Spark SQL before proceeding with a PR.

Thanks in advance!
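
To make the difference concrete, here is a rough sketch of the three forms being compared. The table and column names (`windowTestData`, `cte`, `val`) come from the test above; the second statement follows the linked ZetaSQL documentation as I read it, the alias `running_sum` is made up for illustration, and none of these statements were run against this PR.

```sql
-- (1) Form in this PR's negative test: the pipe operator carries only a named
--     window definition, with no window function and no OVER clause.
table windowTestData
|> window w as (partition by cte order by val);

-- (2) Form described in the ZetaSQL documentation: the WINDOW pipe operator
--     takes a window function call with an OVER clause and adds it as a column.
from windowTestData
|> window sum(val) over (partition by cte order by val) as running_sum;

-- (3) Existing non-pipe SQL: a WINDOW clause names the definition, and a
--     window function in the SELECT list refers to it by name.
select cte, val, sum(val) over w as running_sum
from windowTestData
window w as (partition by cte order by val);
```

Form (1) on its own has no window function to attach the definition to, which is the gap the question above is about.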
