peter-toth opened a new pull request, #29210:
URL: https://github.com/apache/spark/pull/29210

   ### What changes were proposed in this pull request?
   This PR adds the recursive query feature to Spark SQL.
   
   A recursive query is defined using the `WITH RECURSIVE` keywords and by 
referring to the name of the common table expression within the query.
   The implementation complies with the SQL standard and follows rules similar 
to those of other relational databases:
   - A query is made of an anchor followed by a recursive term.
   - The anchor term doesn't contain a self-reference and is used to 
initialize the query.
   - The recursive term contains a self-reference and is used to expand the 
current set of rows with new ones.
   - The anchor and recursive terms must be joined with each other by `UNION` 
or `UNION ALL` operators.
   - New rows can only be derived from the rows newly added in the previous 
iteration (or from the initial set of rows produced by the anchor term). This 
limitation implies that recursive references can't be used with some joins, 
aggregations, or subqueries.
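   The rules above can be illustrated with a minimal sketch. It is not taken 
from this PR's test files: SQLite (via Python's `sqlite3` module) stands in for 
Spark SQL only because it accepts standard `WITH RECURSIVE` syntax, and the CTE 
name `t` and column `n` are made up for the example.

```python
import sqlite3

# Illustrative only: SQLite is used as a stand-in engine, not Spark SQL.
# The CTE name `t` and the column `n` are hypothetical.
QUERY = """
WITH RECURSIVE t(n) AS (
    SELECT 1                         -- anchor term: no self-reference
    UNION ALL                        -- anchor and recursive term joined by UNION ALL
    SELECT n + 1 FROM t WHERE n < 5  -- recursive term: refers to t itself
)
SELECT n FROM t
"""


def run_query():
    """Run the recursive query on an in-memory database and return column n."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


rows = run_query()
print(rows)
```

   The recursion stops through the `WHERE n < 5` predicate rather than through 
`LIMIT`, so the sketch does not rely on a stopping mechanism this PR leaves out.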
   
   Please see `cte-recursion.sql` and `with.sql` for some examples.
   
   Please note that this PR focuses on a minimal working implementation, which 
means:
   - SQL recursion is actually a loop: each iteration is computed from the 
previous iteration's result, and the loop ends when an iteration returns no 
rows. The final result is the union of all iteration results. This means that 
caching intermediate results could speed up the process, but caching was 
removed from this PR to reduce complexity and can be added back in a 
follow-up PR.
   - A common way to stop SQL recursion is using the LIMIT operator to stop 
computing more than the required number of rows. LIMIT support was removed from 
this PR to reduce complexity and can be added back in a follow-up PR.
   - Some relational databases are more relaxed about how many anchor and 
recursive terms a recursion can contain. This PR supports only the simplest 
case: exactly one anchor term and one recursive term. A follow-up PR can relax 
this limitation.
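   The loop described in the first item above can be sketched in plain Python. 
This is a simplified model under stated assumptions, not the code added by this 
PR: the names `evaluate_recursive` and `step` are hypothetical, rows are plain 
integers, and only `UNION ALL` semantics (no deduplication) are modeled.

```python
def evaluate_recursive(anchor_rows, recursive_step):
    """Model a recursive query as a loop (UNION ALL semantics, hypothetical helper).

    anchor_rows: rows produced by the anchor term.
    recursive_step: function mapping the previous iteration's rows to new rows.
    """
    result = list(anchor_rows)    # final result is the union of all iterations
    previous = list(anchor_rows)  # only the newest rows feed the next iteration
    while previous:               # an iteration that returns no rows ends the loop
        previous = recursive_step(previous)
        result.extend(previous)
    return result


def step(previous_rows):
    """Stand-in for the recursive term: SELECT n + 1 FROM t WHERE n < 5."""
    return [n + 1 for n in previous_rows if n < 5]


rows = evaluate_recursive([1], step)
print(rows)
```

   Each call to `recursive_step` sees only the rows from the previous 
iteration, which is also where caching intermediate results would help in a 
real engine.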
   
   ### Why are the changes needed?
   Recursive queries are an ANSI SQL feature that is useful for processing 
hierarchical data.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it adds the recursive query feature.
   
   ### How was this patch tested?
   Added new unit tests and tests in `cte-recursion.sql` and `with.sql`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

