peter-toth opened a new pull request, #40744: URL: https://github.com/apache/spark/pull/40744
### What changes were proposed in this pull request?

This PR adds a recursive query feature to Spark SQL. A recursive query is defined with the `WITH RECURSIVE` keywords and refers to the name of the common table expression within the query itself. The implementation complies with the SQL standard and follows rules similar to other relational databases:

- A query is made of an anchor term followed by a recursive term.
- The anchor term doesn't contain a self-reference and is used to initialize the query.
- The recursive term contains a self-reference and is used to expand the current set of rows with new ones.
- The anchor and recursive terms must be joined with each other by `UNION` or `UNION ALL` operators.
- New rows can only be derived from the rows newly added in the previous iteration (or from the initial set of rows of the anchor term). This limitation implies that recursive references can't be used with some joins, aggregations, or subqueries. Please see `cte-recursive.sql` for some examples.

### Why are the changes needed?

A recursive query is an ANSI SQL feature that is useful for processing hierarchical data.

### Does this PR introduce _any_ user-facing change?

Yes, it adds the recursive query feature.

### How was this patch tested?

Added new UTs and tests in `cte-recursion.sql`.
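The anchor/recursive-term structure described above can be sketched with a small example. This is not run against Spark (the example table, column names, and query are illustrative only); it uses Python's built-in `sqlite3`, which implements the same ANSI `WITH RECURSIVE` semantics: an anchor term seeds the result set, and each iteration of the recursive term derives new rows only from the rows added in the previous iteration.

```python
# Hypothetical example: walk a management hierarchy with WITH RECURSIVE.
# SQLite stands in for Spark SQL here; the anchor/recursive-term shape
# is the same ANSI SQL feature this PR adds.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, manager_id INTEGER, name TEXT);
INSERT INTO employee VALUES
  (1, NULL, 'ceo'),
  (2, 1,    'vp'),
  (3, 2,    'engineer');
""")

rows = conn.execute("""
WITH RECURSIVE chain(id, name, depth) AS (
  -- anchor term: no self-reference, initializes the result set
  SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
  UNION ALL
  -- recursive term: self-reference to `chain`; each iteration joins
  -- only against the rows produced by the previous iteration
  SELECT e.id, e.name, c.depth + 1
  FROM employee e JOIN chain c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth
""").fetchall()

print(rows)  # [('ceo', 0), ('vp', 1), ('engineer', 2)]
```

Replacing `UNION ALL` with `UNION` additionally deduplicates rows across iterations, which is the other join form the standard (and this PR) allows between the anchor and recursive terms.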
