This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push: new 4dbcda4 [SPARK-31362][SQL][DOCS] Document Set Operators in SQL Reference 4dbcda4 is described below commit 4dbcda4672682333e1041a24b9b1ab3d1e10e99d Author: Huaxin Gao <huax...@us.ibm.com> AuthorDate: Wed Apr 8 10:51:04 2020 -0500 [SPARK-31362][SQL][DOCS] Document Set Operators in SQL Reference ### What changes were proposed in this pull request? Document Set Operators in SQL Reference ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes <img width="1050" alt="Screen Shot 2020-04-07 at 9 20 05 AM" src="https://user-images.githubusercontent.com/13592258/78694605-c6ea2680-78b1-11ea-8590-afb43dbe5933.png"> <img width="1050" alt="Screen Shot 2020-04-07 at 9 20 41 AM" src="https://user-images.githubusercontent.com/13592258/78694613-c8b3ea00-78b1-11ea-89b9-d6cd71ee86a0.png"> <img width="1050" alt="Screen Shot 2020-04-07 at 9 21 29 AM" src="https://user-images.githubusercontent.com/13592258/78694622-ca7dad80-78b1-11ea-9acf-7611ee57d4f2.png"> <img width="1050" alt="Screen Shot 2020-04-07 at 9 21 54 AM" src="https://user-images.githubusercontent.com/13592258/78694626-cc477100-78b1-11ea-82f8-4deaf0048de7.png"> ### How was this patch tested? Manually build and check Closes #28139 from huaxingao/set-operators. Authored-by: Huaxin Gao <huax...@us.ibm.com> Signed-off-by: Sean Owen <sro...@gmail.com> (cherry picked from commit 5dc9b9c7c18b853cb3bdcb50041ef46888922308) Signed-off-by: Sean Owen <sro...@gmail.com> --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-qry-select-setops.md | 163 ++++++++++++++++++++++++++++++- docs/sql-ref-syntax-qry-select.md | 1 + 3 files changed, 163 insertions(+), 3 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index ffc3b3c..6f300e2 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -156,6 +156,8 @@ url: sql-ref-syntax-qry-select-limit.html - text: Join Hints url: sql-ref-syntax-qry-select-hints.html + - text: Set Operators + url: sql-ref-syntax-qry-select-setops.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements diff --git a/docs/sql-ref-syntax-qry-select-setops.md b/docs/sql-ref-syntax-qry-select-setops.md index 526f857..8ed6e48 100644 --- a/docs/sql-ref-syntax-qry-select-setops.md +++ b/docs/sql-ref-syntax-qry-select-setops.md @@ -1,7 +1,7 @@ --- layout: global -title: Set Operations -displayTitle: Set Operations +title: Set Operators +displayTitle: Set Operators license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -19,4 +19,161 @@ license: | limitations under the License. --- -**This page is under construction** +Set operators are used to combine two input relations into a single one. Spark SQL supports three types of set operators: +- `EXCEPT` or `MINUS` +- `INTERSECT` +- `UNION` + +Note that input relations must have the same number of columns and compatible data types for the respective columns. + +### EXCEPT +`EXCEPT` and `EXCEPT ALL` return the rows that are found in one relation but not the other. `EXCEPT` (alternatively, `EXCEPT DISTINCT`) takes only distinct rows while `EXCEPT ALL` does not remove duplicates from the result rows. Note that `MINUS` is an alias for `EXCEPT`. + +#### Syntax +{% highlight sql %} +[ ( ] relation [ ) ] EXCEPT | MINUS [ ALL | DISTINCT ] [ ( ] relation [ ) ] +{% endhighlight %} + +### INTERSECT +`INTERSECT` and `INTERSECT ALL` return the rows that are found in both relations. `INTERSECT` (alternatively, `INTERSECT DISTINCT`) takes only distinct rows while `INTERSECT ALL` does not remove duplicates from the result rows. + +#### Syntax +{% highlight sql %} +[ ( ] relation [ ) ] INTERSECT [ ALL | DISTINCT ] [ ( ] relation [ ) ] +{% endhighlight %} + +### UNION +`UNION` and `UNION ALL` return the rows that are found in either relation. `UNION` (alternatively, `UNION DISTINCT`) takes only distinct rows while `UNION ALL` does not remove duplicates from the result rows. + +#### Syntax +{% highlight sql %} +[ ( ] relation [ ) ] UNION [ ALL | DISTINCT ] [ ( ] relation [ ) ] +{% endhighlight %} + +### Examples +{% highlight sql %} +-- Use number1 and number2 tables to demonstrate set operators. +SELECT * FROM number1; ++---+ +| c| ++---+ +| 3| +| 1| +| 2| +| 2| +| 3| +| 4| ++---+ + +SELECT * FROM number2; ++---+ +| c| ++---+ +| 5| +| 1| +| 2| +| 2| ++---+ + +SELECT c FROM number1 EXCEPT SELECT c FROM number2; ++---+ +| c| ++---+ +| 3| +| 4| ++---+ + +SELECT c FROM number1 MINUS SELECT c FROM number2; ++---+ +| c| ++---+ +| 3| +| 4| ++---+ + +SELECT c FROM number1 EXCEPT ALL (SELECT c FROM number2); ++---+ +| c| ++---+ +| 3| +| 3| +| 4| ++---+ + +SELECT c FROM number1 MINUS ALL (SELECT c FROM number2); ++---+ +| c| ++---+ +| 3| +| 3| +| 4| ++---+ + +(SELECT c FROM number1) INTERSECT (SELECT c FROM number2); ++---+ +| c| ++---+ +| 1| +| 2| ++---+ + +(SELECT c FROM number1) INTERSECT DISTINCT (SELECT c FROM number2); ++---+ +| c| ++---+ +| 1| +| 2| ++---+ + +(SELECT c FROM number1) INTERSECT ALL (SELECT c FROM number2); ++---+ +| c| ++---+ +| 1| +| 2| +| 2| ++---+ + +(SELECT c FROM number1) UNION (SELECT c FROM number2); ++---+ +| c| ++---+ +| 1| +| 3| +| 5| +| 4| +| 2| ++---+ + +(SELECT c FROM number1) UNION DISTINCT (SELECT c FROM number2); ++---+ +| c| ++---+ +| 1| +| 3| +| 5| +| 4| +| 2| ++---+ + +SELECT c FROM number1 UNION ALL (SELECT c FROM number2); ++---+ +| c| ++---+ +| 3| +| 1| +| 2| +| 2| +| 3| +| 4| +| 5| +| 1| +| 2| +| 2| ++---+ + +{% endhighlight %} + +### Related Statement +- [SELECT Statement](sql-ref-syntax-qry-select.html) + diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 7ad1dd1..420cf1f 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -150,3 +150,4 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } - [CLUSTER BY Clause](sql-ref-syntax-qry-select-clusterby.html) - [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) - [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) +- [SET Operators](sql-ref-syntax-qry-select-setops.html) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org