beliefer opened a new pull request #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
URL: https://github.com/apache/spark/pull/25416

## What changes were proposed in this pull request?

This PR adds the result offset clause, an ANSI SQL feature with feature ID `F861`.
```
<query expression> ::=
  [ <with clause> ] <query expression body>
  [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

<result offset clause> ::=
  OFFSET <offset row count> { ROW | ROWS }
```
For example:
```
SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name;
    customer_name     | customer_gender
----------------------+-----------------
 Amy X. Lang          | Female
 Anna H. Li           | Female
 Brian O. Weaver      | Male
 Craig O. Pavlov      | Male
 Doug Z. Goldberg     | Male
 Harold S. Jones      | Male
 Jack E. Perkins      | Male
 Joseph W. Overstreet | Male
 Kevin . Campbell     | Male
 Raja Y. Wilson       | Male
 Samantha O. Brown    | Female
 Steve H. Gauthier    | Male
 William . Nielson    | Male
 William Z. Roy       | Male
(14 rows)

SELECT customer_name, customer_gender FROM customer_dimension
WHERE occupation='Dancer' AND customer_city = 'San Francisco' ORDER BY customer_name OFFSET 8;
   customer_name   | customer_gender
-------------------+-----------------
 Kevin . Campbell  | Male
 Raja Y. Wilson    | Male
 Samantha O. Brown | Female
 Steve H. Gauthier | Male
 William . Nielson | Male
 William Z. Roy    | Male
(6 rows)
```
Several mainstream databases support this syntax.

**PostgreSQL:** https://www.postgresql.org/docs/11/queries-limit.html

**Vertica:** https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/SELECT/OFFSETClause.htm?zoom_highlight=offset

**MySQL:** https://dev.mysql.com/doc/refman/5.6/en/select.html

## How was this patch tested?

New unit tests. The following output shows the PR running in my production environment.
```
spark-sql> select * from gja_test_partition;
a  A  ao  1
b  B  bo  1
c  C  co  1
d  D  do  1
e  E  eo  2
g  G  go  2
h  H  ho  2
j  J  jo  2
f  F  fo  3
k  K  ko  3
l  L  lo  4
i  I  io  4
Time taken: 6.618 s
spark-sql> select * from gja_test_partition offset 3;
d  D  do  1
e  E  eo  2
g  G  go  2
h  H  ho  2
j  J  jo  2
f  F  fo  3
k  K  ko  3
l  L  lo  4
i  I  io  4
Time taken: 6.368 s
spark-sql> select * from gja_test_partition limit 5 offset 3;
d  D  do  1
e  E  eo  2
g  G  go  2
h  H  ho  2
j  J  jo  2
Time taken: 25.141 s
spark-sql> select * from gja_test_partition order by key;
a  A  ao  1
b  B  bo  1
c  C  co  1
d  D  do  1
e  E  eo  2
f  F  fo  3
g  G  go  2
h  H  ho  2
i  I  io  4
j  J  jo  2
k  K  ko  3
l  L  lo  4
Time taken: 16.894 s
spark-sql> select * from gja_test_partition order by key offset 3;
d  D  do  1
e  E  eo  2
f  F  fo  3
g  G  go  2
h  H  ho  2
i  I  io  4
j  J  jo  2
k  K  ko  3
l  L  lo  4
Time taken: 19.191 s
spark-sql> select * from gja_test_partition order by key limit 5 offset 3;
d  D  do  1
e  E  eo  2
f  F  fo  3
g  G  go  2
h  H  ho  2
Time taken: 12.556 s
```
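
The row counts in the `order by key` queries above can be checked with a small model of the semantics: skip the first `offset` rows of the ordered result, then keep at most `limit` rows. The sketch below is plain Python, not Spark code and not the implementation in this PR; the helper name `apply_offset_limit` is hypothetical, and the key list is copied from the first column of the ordered output.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def apply_offset_limit(
    rows: Iterable[T], offset: int = 0, limit: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical helper: model OFFSET/LIMIT over an already-ordered row stream."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    # Skip `offset` rows first, then stop after `limit` further rows (if given).
    stop = None if limit is None else offset + limit
    return islice(rows, offset, stop)


# First column of `select * from gja_test_partition order by key`.
ordered_keys = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"]

# Models `... order by key offset 3`.
offset_only = list(apply_offset_limit(ordered_keys, offset=3))

# Models `... order by key limit 5 offset 3`.
limit_and_offset = list(apply_offset_limit(ordered_keys, offset=3, limit=5))

print(offset_only)
print(limit_and_offset)
```

Without an `order by`, the rows that get skipped depend on whatever order the engine happens to produce, which is why the unordered and ordered runs above return different rows for the same `offset`.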
