peter-toth opened a new pull request, #40744:
URL: https://github.com/apache/spark/pull/40744

   ### What changes were proposed in this pull request?
   This PR adds support for recursive queries to Spark SQL.
   
   A recursive query is defined using the `WITH RECURSIVE` keywords and by 
referring to the name of the common table expression within the query itself.
   The implementation complies with the SQL standard and follows rules similar 
to those of other relational databases:
   - A query consists of an anchor term followed by a recursive term.
   - The anchor term doesn't contain a self-reference; it is used to 
initialize the query.
   - The recursive term contains a self-reference and is used to expand the 
current set of rows with new ones.
   - The anchor and recursive terms must be combined with the `UNION` or 
`UNION ALL` operator.
   - New rows can only be derived from the rows added in the previous 
iteration (or from the initial set of rows produced by the anchor term). This 
limitation implies that recursive references can't be used with certain joins, 
aggregations or subqueries.
   
   Please see `cte-recursive.sql` for some examples.
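
   As a rough illustration of the anchor/recursive-term semantics described 
above, the sketch below runs a standard `WITH RECURSIVE` query against SQLite 
(chosen only because it is readily runnable; the table name, data, and query 
are hypothetical, not taken from this PR's test suite). The same query shape 
applies to the Spark SQL support proposed here.

   ```python
   import sqlite3

   # A small self-referencing hierarchy: each employee points at a manager.
   conn = sqlite3.connect(":memory:")
   conn.executescript("""
       CREATE TABLE employee (id INTEGER, name TEXT, manager_id INTEGER);
       INSERT INTO employee VALUES
           (1, 'Alice', NULL),   -- root of the hierarchy
           (2, 'Bob',   1),
           (3, 'Carol', 2);
   """)

   rows = conn.execute("""
       WITH RECURSIVE chain(id, name, level) AS (
           -- anchor term: no self-reference, initializes the result set
           SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
           UNION ALL
           -- recursive term: the self-reference to `chain` expands the set,
           -- deriving new rows only from the previous iteration's rows
           SELECT e.id, e.name, c.level + 1
           FROM employee e JOIN chain c ON e.manager_id = c.id
       )
       SELECT name, level FROM chain ORDER BY level
   """).fetchall()

   print(rows)  # [('Alice', 0), ('Bob', 1), ('Carol', 2)]
   ```

   Iteration stops once the recursive term produces no new rows, which is why 
the limitation above (new rows derive only from the previous iteration) is 
what guarantees termination for well-formed hierarchies.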
   
   ### Why are the changes needed?
   Recursive queries are an ANSI SQL feature that is useful for processing 
hierarchical data.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it adds the recursive query feature.
   
   ### How was this patch tested?
   Added new UTs and tests in `cte-recursion.sql`. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

