[spark] branch branch-3.0 updated: [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new da8c7b8 [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference da8c7b8 is described below commit da8c7b8ceffa1566ae35280a2d1c3abcbff47542 Author: Huaxin Gao AuthorDate: Wed Apr 29 09:17:23 2020 +0900 [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference ### What changes were proposed in this pull request? Document LIKE clause in SQL Reference ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes https://user-images.githubusercontent.com/13592258/80294346-5babab80-871d-11ea-8ac9-51bbab0aca88.png https://user-images.githubusercontent.com/13592258/80294347-5ea69c00-871d-11ea-8c51-7a90ee20f7da.png https://user-images.githubusercontent.com/13592258/80294351-61a18c80-871d-11ea-9e75-e3345d2f52f5.png ### How was this patch tested? Manually build and check Closes #28332 from huaxingao/where_clause. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit d34cb59fb311c3d700e4f4f877b61b17cea313ee) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-aux-show-databases.md | 13 +++- docs/sql-ref-syntax-aux-show-functions.md | 8 +- docs/sql-ref-syntax-aux-show-table.md | 14 ++-- docs/sql-ref-syntax-aux-show-tables.md| 10 +-- docs/sql-ref-syntax-aux-show-views.md | 12 +-- docs/sql-ref-syntax-qry-explain.md| 2 +- docs/sql-ref-syntax-qry-select-like.md| 120 ++ 8 files changed, 154 insertions(+), 27 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 1097079..dfe4cfa 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -170,6 +170,8 @@ url: sql-ref-syntax-qry-select-inline-table.html - text: Common Table Expression url: sql-ref-syntax-qry-select-cte.html +- text: LIKE Predicate + url: sql-ref-syntax-qry-select-like.html - text: Window Function url: sql-ref-syntax-qry-window.html - text: EXPLAIN diff --git a/docs/sql-ref-syntax-aux-show-databases.md b/docs/sql-ref-syntax-aux-show-databases.md index 0ed3452..3599009 100644 --- a/docs/sql-ref-syntax-aux-show-databases.md +++ b/docs/sql-ref-syntax-aux-show-databases.md @@ -29,16 +29,21 @@ and mean the same thing. ### Syntax {% highlight sql %} -SHOW { DATABASES | SCHEMAS } [ LIKE string_pattern ] +SHOW { DATABASES | SCHEMAS } [ LIKE regex_pattern ] {% endhighlight %} ### Parameters - LIKE string_pattern + regex_pattern -Specifies a string pattern that is used to match the databases in the system. In -the specified string pattern '*' matches any number of characters. +Specifies a regular expression pattern that is used to filter the results of the +statement. + + Only * and | are allowed as wildcard pattern. + Excluding * and |, the remaining pattern follows the regular expression semantics. + The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive. 
+ diff --git a/docs/sql-ref-syntax-aux-show-functions.md b/docs/sql-ref-syntax-aux-show-functions.md index da33d99..ed22a3a 100644 --- a/docs/sql-ref-syntax-aux-show-functions.md +++ b/docs/sql-ref-syntax-aux-show-functions.md @@ -58,12 +58,12 @@ SHOW [ function_kind ] FUNCTIONS ( [ LIKE ] function_name | regex_pattern ) regex_pattern -Specifies a regular expression pattern that is used to limit the results of the +Specifies a regular expression pattern that is used to filter the results of the statement. - Only `*` and `|` are allowed as wildcard pattern. - Excluding `*` and `|` the remaining pattern follows the regex semantics. - The leading and trailing blanks are trimmed in the input pattern before processing. + Only * and | are allowed as wildcard pattern. + Excluding * and |, the remaining pattern follows the regular expression semantics. + The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive. diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md index 1aa44d3..c688a99 100644 --- a/docs/sql-ref-syntax-aux-show-table.md +++ b/docs/sql-ref-syntax-aux-show-table.md @@ -33,7 +33,7 @@ cannot be used with a partition specification. ### Syntax {% highlight sql %} -SHOW TABLE EXTENDED [ IN | FROM database_na
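The `regex_pattern` semantics documented in the diff above (only `*` and `|` act as wildcards, the rest follows regular-expression semantics, leading/trailing blanks are trimmed, and matching is case-insensitive) can be approximated with a short Python sketch. This is an illustration only, not Spark's actual implementation; the helper name `filter_by_pattern` is invented here.

```python
import re

def filter_by_pattern(names, pattern):
    """Approximate the SHOW ... LIKE regex_pattern semantics:
    '*' matches any number of characters, '|' separates alternatives,
    leading/trailing blanks are trimmed, matching is case-insensitive."""
    compiled = []
    for sub in pattern.strip().split('|'):
        # Per the docs, the remaining pattern follows regex semantics,
        # so only '*' needs rewriting to the regex equivalent '.*'.
        compiled.append(re.compile('(?i)^' + sub.replace('*', '.*') + '$'))
    return [n for n in names if any(p.match(n) for p in compiled)]

dbs = ['default', 'payments', 'payroll']
print(filter_by_pattern(dbs, ' pay* '))    # ['payments', 'payroll']
print(filter_by_pattern(dbs, 'def*|pay*'))  # all three match
```

For example, `SHOW DATABASES LIKE 'def*|pay*'` would behave like the second call above, returning every database matching either alternative.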
[spark] branch master updated (dcc0902 -> d34cb59)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from dcc0902 [SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started add d34cb59 [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-aux-show-databases.md | 13 +++- docs/sql-ref-syntax-aux-show-functions.md | 8 +- docs/sql-ref-syntax-aux-show-table.md | 14 ++-- docs/sql-ref-syntax-aux-show-tables.md| 10 +-- docs/sql-ref-syntax-aux-show-views.md | 12 +-- docs/sql-ref-syntax-qry-explain.md| 2 +- docs/sql-ref-syntax-qry-select-like.md| 120 ++ 8 files changed, 154 insertions(+), 27 deletions(-) create mode 100644 docs/sql-ref-syntax-qry-select-like.md - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d34cb59 [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference d34cb59 is described below commit d34cb59fb311c3d700e4f4f877b61b17cea313ee Author: Huaxin Gao AuthorDate: Wed Apr 29 09:17:23 2020 +0900 [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference ### What changes were proposed in this pull request? Document LIKE clause in SQL Reference ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes https://user-images.githubusercontent.com/13592258/80294346-5babab80-871d-11ea-8ac9-51bbab0aca88.png https://user-images.githubusercontent.com/13592258/80294347-5ea69c00-871d-11ea-8c51-7a90ee20f7da.png https://user-images.githubusercontent.com/13592258/80294351-61a18c80-871d-11ea-9e75-e3345d2f52f5.png ### How was this patch tested? Manually build and check Closes #28332 from huaxingao/where_clause. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-aux-show-databases.md | 13 +++- docs/sql-ref-syntax-aux-show-functions.md | 8 +- docs/sql-ref-syntax-aux-show-table.md | 14 ++-- docs/sql-ref-syntax-aux-show-tables.md| 10 +-- docs/sql-ref-syntax-aux-show-views.md | 12 +-- docs/sql-ref-syntax-qry-explain.md| 2 +- docs/sql-ref-syntax-qry-select-like.md| 120 ++ 8 files changed, 154 insertions(+), 27 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 1097079..dfe4cfa 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -170,6 +170,8 @@ url: sql-ref-syntax-qry-select-inline-table.html - text: Common Table Expression url: sql-ref-syntax-qry-select-cte.html +- text: LIKE Predicate + url: sql-ref-syntax-qry-select-like.html - text: Window Function url: sql-ref-syntax-qry-window.html - text: EXPLAIN diff --git a/docs/sql-ref-syntax-aux-show-databases.md b/docs/sql-ref-syntax-aux-show-databases.md index 0ed3452..3599009 100644 --- a/docs/sql-ref-syntax-aux-show-databases.md +++ b/docs/sql-ref-syntax-aux-show-databases.md @@ -29,16 +29,21 @@ and mean the same thing. ### Syntax {% highlight sql %} -SHOW { DATABASES | SCHEMAS } [ LIKE string_pattern ] +SHOW { DATABASES | SCHEMAS } [ LIKE regex_pattern ] {% endhighlight %} ### Parameters - LIKE string_pattern + regex_pattern -Specifies a string pattern that is used to match the databases in the system. In -the specified string pattern '*' matches any number of characters. +Specifies a regular expression pattern that is used to filter the results of the +statement. + + Only * and | are allowed as wildcard pattern. + Excluding * and |, the remaining pattern follows the regular expression semantics. + The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive. 
+ diff --git a/docs/sql-ref-syntax-aux-show-functions.md b/docs/sql-ref-syntax-aux-show-functions.md index da33d99..ed22a3a 100644 --- a/docs/sql-ref-syntax-aux-show-functions.md +++ b/docs/sql-ref-syntax-aux-show-functions.md @@ -58,12 +58,12 @@ SHOW [ function_kind ] FUNCTIONS ( [ LIKE ] function_name | regex_pattern ) regex_pattern -Specifies a regular expression pattern that is used to limit the results of the +Specifies a regular expression pattern that is used to filter the results of the statement. - Only `*` and `|` are allowed as wildcard pattern. - Excluding `*` and `|` the remaining pattern follows the regex semantics. - The leading and trailing blanks are trimmed in the input pattern before processing. + Only * and | are allowed as wildcard pattern. + Excluding * and |, the remaining pattern follows the regular expression semantics. + The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive. diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md index 1aa44d3..c688a99 100644 --- a/docs/sql-ref-syntax-aux-show-table.md +++ b/docs/sql-ref-syntax-aux-show-table.md @@ -33,7 +33,7 @@ cannot be used with a partition specification. ### Syntax {% highlight sql %} -SHOW TABLE EXTENDED [ IN | FROM database_name ] LIKE 'identifier_with_wildcards' +SHOW TABLE EXTENDED [ IN | FROM database_name ] LIKE regex_pattern [ parti
[spark] branch branch-3.0 updated: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 37002fe [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values 37002fe is described below commit 37002fe69a58ba071ada798842d7e77c4cd6e47e Author: Huaxin Gao AuthorDate: Sat Apr 25 09:02:16 2020 +0900 [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values ### What changes were proposed in this pull request? Re-arrange Data Types page to document Floating Point Special Values ### Why are the changes needed? To complete SQL Reference ### Does this PR introduce any user-facing change? Yes - add Floating Point Special Values in Data Types page - move NaN Semantics to Data Types page https://user-images.githubusercontent.com/13592258/80233996-3da25600-860c-11ea-8285-538efc16e431.png https://user-images.githubusercontent.com/13592258/80234001-4004b000-860c-11ea-8954-72f63c92d50d.png https://user-images.githubusercontent.com/13592258/80234006-41ce7380-860c-11ea-96bf-15e1aa2102ff.png ### How was this patch tested? Manually build and check Closes #28264 from huaxingao/datatypes. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 054bef94ca7e84ff8e2e27af65e00e183f7be6da) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 - docs/sql-ref-datatypes.md | 119 ++ docs/sql-ref-nan-semantics.md | 29 -- 3 files changed, 119 insertions(+), 31 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 26cca61..1097079 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -84,8 +84,6 @@ url: sql-ref-literals.html - text: Null Semantics url: sql-ref-null-semantics.html -- text: NaN Semantics - url: sql-ref-nan-semantics.html - text: ANSI Compliance url: sql-ref-ansi-compliance.html subitems: diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md index 150e194..0d49f6f 100644 --- a/docs/sql-ref-datatypes.md +++ b/docs/sql-ref-datatypes.md @@ -19,6 +19,8 @@ license: | limitations under the License. --- +### Supported Data Types + Spark SQL and DataFrames support the following data types: * Numeric types @@ -706,3 +708,120 @@ The following table shows the type names as well as aliases used in Spark SQL pa + +### Floating Point Special Values + +Spark SQL supports several special floating point values in a case-insensitive manner: + + * Inf/+Inf/Infinity/+Infinity: positive infinity + * ```FloatType```: equivalent to Scala Float.PositiveInfinity. + * ```DoubleType```: equivalent to Scala Double.PositiveInfinity. + * -Inf/-Infinity: negative infinity + * ```FloatType```: equivalent to Scala Float.NegativeInfinity. + * ```DoubleType```: equivalent to Scala Double.NegativeInfinity. + * NaN: not a number + * ```FloatType```: equivalent to Scala Float.NaN. + * ```DoubleType```: equivalent to Scala Double.NaN. + + Positive/Negative Infinity Semantics + +There is special handling for positive and negative infinity. They have the following semantics: + + * Positive infinity multiplied by any positive value returns positive infinity. 
+ * Negative infinity multiplied by any positive value returns negative infinity. + * Positive infinity multiplied by any negative value returns negative infinity. + * Negative infinity multiplied by any negative value returns positive infinity. + * Positive/negative infinity multiplied by 0 returns NaN. + * Positive/negative infinity is equal to itself. + * In aggregations, all positive infinity values are grouped together. Similarly, all negative infinity values are grouped together. + * Positive infinity and negative infinity are treated as normal values in join keys. + * Positive infinity sorts lower than NaN and higher than any other values. + * Negative infinity sorts lower than any other values. + + NaN Semantics + +There is special handling for not-a-number (NaN) when dealing with `float` or `double` types that +do not exactly match standard floating point semantics. +Specifically: + + * NaN = NaN returns true. + * In aggregations, all NaN values are grouped together. + * NaN is treated as a normal value in join keys. + * NaN values go last when in ascending order, larger than any other numeric value. + + Examples + +{% highlight sql %} +SELECT double('infinity') AS col; +++ +| col| +++ +|Infinity| +++ + +SELECT float('-inf') AS col; ++-+ +| col| ++-+ +|-Infinity| ++-+ + +SELECT float('NaN') AS col; ++-
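The case-insensitive special-value spellings and the infinity arithmetic rules documented in the diff above follow standard IEEE 754 behavior, which can be checked directly with plain Python floats (an illustration of the semantics, not Spark code):

```python
import math

# Python's float() also accepts these spellings case-insensitively,
# mirroring the special values documented above.
pos_inf = float('Infinity')
neg_inf = float('-inf')
nan = float('NaN')

assert pos_inf == math.inf and neg_inf == -math.inf and math.isnan(nan)

# The infinity arithmetic rules from the bullet list:
assert pos_inf * 2.0 == pos_inf     # +inf * positive -> +inf
assert neg_inf * 2.0 == neg_inf     # -inf * positive -> -inf
assert pos_inf * -2.0 == neg_inf    # +inf * negative -> -inf
assert neg_inf * -2.0 == pos_inf    # -inf * negative -> +inf
assert math.isnan(pos_inf * 0.0)    # +/-inf * 0 -> NaN
assert pos_inf == pos_inf           # infinity is equal to itself
```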
[spark] branch master updated: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 054bef9 [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values 054bef9 is described below commit 054bef94ca7e84ff8e2e27af65e00e183f7be6da Author: Huaxin Gao AuthorDate: Sat Apr 25 09:02:16 2020 +0900 [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values ### What changes were proposed in this pull request? Re-arrange Data Types page to document Floating Point Special Values ### Why are the changes needed? To complete SQL Reference ### Does this PR introduce any user-facing change? Yes - add Floating Point Special Values in Data Types page - move NaN Semantics to Data Types page https://user-images.githubusercontent.com/13592258/80233996-3da25600-860c-11ea-8285-538efc16e431.png https://user-images.githubusercontent.com/13592258/80234001-4004b000-860c-11ea-8954-72f63c92d50d.png https://user-images.githubusercontent.com/13592258/80234006-41ce7380-860c-11ea-96bf-15e1aa2102ff.png ### How was this patch tested? Manually build and check Closes #28264 from huaxingao/datatypes. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 - docs/sql-ref-datatypes.md | 119 ++ docs/sql-ref-nan-semantics.md | 29 -- 3 files changed, 119 insertions(+), 31 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 26cca61..1097079 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -84,8 +84,6 @@ url: sql-ref-literals.html - text: Null Semantics url: sql-ref-null-semantics.html -- text: NaN Semantics - url: sql-ref-nan-semantics.html - text: ANSI Compliance url: sql-ref-ansi-compliance.html subitems: diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md index 150e194..0d49f6f 100644 --- a/docs/sql-ref-datatypes.md +++ b/docs/sql-ref-datatypes.md @@ -19,6 +19,8 @@ license: | limitations under the License. --- +### Supported Data Types + Spark SQL and DataFrames support the following data types: * Numeric types @@ -706,3 +708,120 @@ The following table shows the type names as well as aliases used in Spark SQL pa + +### Floating Point Special Values + +Spark SQL supports several special floating point values in a case-insensitive manner: + + * Inf/+Inf/Infinity/+Infinity: positive infinity + * ```FloatType```: equivalent to Scala Float.PositiveInfinity. + * ```DoubleType```: equivalent to Scala Double.PositiveInfinity. + * -Inf/-Infinity: negative infinity + * ```FloatType```: equivalent to Scala Float.NegativeInfinity. + * ```DoubleType```: equivalent to Scala Double.NegativeInfinity. + * NaN: not a number + * ```FloatType```: equivalent to Scala Float.NaN. + * ```DoubleType```: equivalent to Scala Double.NaN. + + Positive/Negative Infinity Semantics + +There is special handling for positive and negative infinity. They have the following semantics: + + * Positive infinity multiplied by any positive value returns positive infinity. + * Negative infinity multiplied by any positive value returns negative infinity. 
+ * Positive infinity multiplied by any negative value returns negative infinity. + * Negative infinity multiplied by any negative value returns positive infinity. + * Positive/negative infinity multiplied by 0 returns NaN. + * Positive/negative infinity is equal to itself. + * In aggregations, all positive infinity values are grouped together. Similarly, all negative infinity values are grouped together. + * Positive infinity and negative infinity are treated as normal values in join keys. + * Positive infinity sorts lower than NaN and higher than any other values. + * Negative infinity sorts lower than any other values. + + NaN Semantics + +There is special handling for not-a-number (NaN) when dealing with `float` or `double` types that +do not exactly match standard floating point semantics. +Specifically: + + * NaN = NaN returns true. + * In aggregations, all NaN values are grouped together. + * NaN is treated as a normal value in join keys. + * NaN values go last when in ascending order, larger than any other numeric value. + + Examples + +{% highlight sql %} +SELECT double('infinity') AS col; +++ +| col| +++ +|Infinity| +++ + +SELECT float('-inf') AS col; ++-+ +| col| ++-+ +|-Infinity| ++-+ + +SELECT float('NaN') AS col; ++---+ +|col| ++---+ +|NaN| ++---+ + +SELECT double('infinity') * 0 AS col; ++---+ +|col| ++---+ +|NaN| ++---+ + +SELE
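The NaN and infinity ordering/equality rules quoted in the diff above deliberately differ from plain IEEE 754 (where `NaN = NaN` is false and NaN is unordered). A minimal Python sketch of that total order; `spark_sort_key` and `spark_equal` are hypothetical helpers invented here, not Spark APIs:

```python
import math

def spark_sort_key(x):
    """Total order matching the docs: -inf < normal values < +inf < NaN."""
    # The leading 1 pushes NaN after every numeric value, including +inf.
    return (1, 0.0) if math.isnan(x) else (0, x)

def spark_equal(a, b):
    """Spark-style equality for group/join keys: NaN = NaN returns true."""
    return (math.isnan(a) and math.isnan(b)) or a == b

vals = [float('nan'), 1.0, float('-inf'), float('inf'), -3.5]
print(sorted(vals, key=spark_sort_key))  # -inf, -3.5, 1.0, inf, then nan last

assert spark_equal(float('nan'), float('nan'))  # true here, false in IEEE 754
```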
[spark] branch branch-2.4 updated: [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new a2a0c52 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession a2a0c52 is described below commit a2a0c52d7b21ef1e5f06cee6c8c83ad82f8b1b0b Author: Kent Yao AuthorDate: Sat Apr 25 08:53:00 2020 +0900 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession ### What changes were proposed in this pull request? SparkSessionBuilder should not propagate static sql configurations to the existing active/default SparkSession This seems a long-standing bug. ```scala scala> spark.sql("set spark.sql.warehouse.dir").show +--------------------+--------------------+ | key| value| +--------------------+--------------------+ |spark.sql.warehou...|file:/Users/kenty...| +--------------------+--------------------+ scala> spark.sql("set spark.sql.warehouse.dir=2"); org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir; at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154) at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42) at org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100) at org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602) ... 47 elided scala> import org.apache.spark.sql.SparkSession import org.apache.spark.sql.SparkSession scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get getClass getOrCreate scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate 20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect. res7: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@6403d574 scala> spark.sql("set spark.sql.warehouse.dir").show +--------------------+-----+ | key|value| +--------------------+-----+ |spark.sql.warehou...| xyz| +--------------------+-----+ ``` ### Why are the changes needed? bugfix as shown in the previous section ### Does this PR introduce any user-facing change? Yes, static SQL configurations with SparkSession.builder.config do not propagate to any existing or new SparkSession instances. ### How was this patch tested? new ut. Closes #28316 from yaooqinn/SPARK-31532. 
Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 8424f552293677717da7411ed43e68e73aa7f0d6) Signed-off-by: Takeshi Yamamuro --- .../scala/org/apache/spark/sql/SparkSession.scala | 28 + .../spark/sql/SparkSessionBuilderSuite.scala | 49 +- 2 files changed, 67 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scal
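The bug fixed above is a builder that silently applies static (creation-time-only) configuration to an already-existing session. The intended post-fix behavior can be sketched in plain Python; `Session`, `get_or_create`, and the config key names are invented for illustration and are not Spark APIs:

```python
_active = None
STATIC_KEYS = {'warehouse.dir'}  # hypothetical static-config key set

class Session:
    def __init__(self, static_conf, runtime_conf):
        self.static_conf = dict(static_conf)    # fixed at creation time
        self.runtime_conf = dict(runtime_conf)  # mutable afterwards

def get_or_create(options):
    """After the fix: when a session already exists, only runtime options
    are applied to it; static ones are ignored with a warning instead of
    being silently propagated."""
    global _active
    if _active is not None:
        for k, v in options.items():
            if k in STATIC_KEYS:
                print(f'WARN: ignoring static config {k}; session already exists')
            else:
                _active.runtime_conf[k] = v
        return _active
    _active = Session(
        {k: v for k, v in options.items() if k in STATIC_KEYS},
        {k: v for k, v in options.items() if k not in STATIC_KEYS},
    )
    return _active
```

With this sketch, a second `get_or_create({'warehouse.dir': '/b'})` call returns the existing session with its original `warehouse.dir` untouched, matching the behavior the patch establishes.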
[spark] branch branch-3.0 updated: [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 0dbd69c [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession 0dbd69c is described below commit 0dbd69c61492f537bc0326d6ad86b616577f46df Author: Kent Yao AuthorDate: Sat Apr 25 08:53:00 2020 +0900 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession ### What changes were proposed in this pull request? SparkSessionBuilder should not propagate static sql configurations to the existing active/default SparkSession This seems a long-standing bug. ```scala scala> spark.sql("set spark.sql.warehouse.dir").show +--------------------+--------------------+ | key| value| +--------------------+--------------------+ |spark.sql.warehou...|file:/Users/kenty...| +--------------------+--------------------+ scala> spark.sql("set spark.sql.warehouse.dir=2"); org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir; at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154) at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42) at org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100) at org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602) ... 47 elided scala> import org.apache.spark.sql.SparkSession import org.apache.spark.sql.SparkSession scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get getClass getOrCreate scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate 20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect. res7: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@6403d574 scala> spark.sql("set spark.sql.warehouse.dir").show +--------------------+-----+ | key|value| +--------------------+-----+ |spark.sql.warehou...| xyz| +--------------------+-----+ ``` ### Why are the changes needed? bugfix as shown in the previous section ### Does this PR introduce any user-facing change? Yes, static SQL configurations with SparkSession.builder.config do not propagate to any existing or new SparkSession instances. ### How was this patch tested? new ut. Closes #28316 from yaooqinn/SPARK-31532. 
Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 8424f552293677717da7411ed43e68e73aa7f0d6) Signed-off-by: Takeshi Yamamuro --- .../scala/org/apache/spark/sql/SparkSession.scala | 28 + .../spark/sql/SparkSessionBuilderSuite.scala | 49 +- 2 files changed, 67 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scal
[spark] branch branch-2.4 updated: [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new a2a0c52  [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
a2a0c52 is described below

commit a2a0c52d7b21ef1e5f06cee6c8c83ad82f8b1b0b
Author: Kent Yao
AuthorDate: Sat Apr 25 08:53:00 2020 +0900

    [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession

    ### What changes were proposed in this pull request?
    SparkSession.Builder should not propagate static SQL configurations to the existing active/default SparkSession. This seems to be a long-standing bug:

    ```scala
    scala> spark.sql("set spark.sql.warehouse.dir").show
    +--------------------+--------------------+
    |                 key|               value|
    +--------------------+--------------------+
    |spark.sql.warehou...|file:/Users/kenty...|
    +--------------------+--------------------+

    scala> spark.sql("set spark.sql.warehouse.dir=2");
    org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir;
      at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154)
      at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
      at org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100)
      at org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
      at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
      at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
      at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
      at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
      at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
      at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
      at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
      at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
      at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
      at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
      at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
      at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
      ... 47 elided

    scala> import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.SparkSession

    scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get
    get   getClass   getOrCreate

    scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate
    20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
    res7: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@6403d574

    scala> spark.sql("set spark.sql.warehouse.dir").show
    +--------------------+-----+
    |                 key|value|
    +--------------------+-----+
    |spark.sql.warehou...|  xyz|
    +--------------------+-----+
    ```

    ### Why are the changes needed?
    Bugfix, as shown in the previous section.

    ### Does this PR introduce any user-facing change?
    Yes: static SQL configurations set with SparkSession.builder.config no longer propagate to any existing or new SparkSession instances.

    ### How was this patch tested?
    New UT.

    Closes #28316 from yaooqinn/SPARK-31532.

Authored-by: Kent Yao
Signed-off-by: Takeshi Yamamuro
(cherry picked from commit 8424f552293677717da7411ed43e68e73aa7f0d6)
Signed-off-by: Takeshi Yamamuro
---
 .../scala/org/apache/spark/sql/SparkSession.scala | 28 +
 .../spark/sql/SparkSessionBuilderSuite.scala      | 49 +-
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scal
[spark] branch master updated (6a57616 -> 8424f55)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6a57616  [SPARK-31364][SQL][TESTS] Benchmark Parquet Nested Field Predicate Pushdown
     add 8424f55  [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SparkSession.scala | 28 +
 .../spark/sql/SparkSessionBuilderSuite.scala      | 49 +-
 2 files changed, 67 insertions(+), 10 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
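The intent of the fix above can be sketched with a hypothetical, heavily simplified model of a session builder (this is an illustration of the semantics, not Spark's actual implementation; `STATIC_CONFS`, `Session`, and `Builder` here are invented names): builder options may configure a brand-new session, but when an existing session is reused, static configurations on it must be left untouched.

```python
# Hypothetical miniature of the SparkSession.Builder fix: static configs
# set on the builder must NOT be copied into an already-existing session.

STATIC_CONFS = {"spark.sql.warehouse.dir"}  # illustrative static-config set

class Session:
    def __init__(self, conf):
        self.conf = dict(conf)

_active = None  # stands in for the active/default SparkSession

class Builder:
    def __init__(self):
        self.options = {}

    def config(self, key, value):
        self.options[key] = value
        return self

    def get_or_create(self):
        global _active
        if _active is not None:
            # Pre-fix behaviour copied *all* options, static ones included,
            # into the existing session. Post-fix: apply only non-static ones.
            for k, v in self.options.items():
                if k not in STATIC_CONFS:
                    _active.conf[k] = v
            return _active
        _active = Session(self.options)
        return _active
```

With this model, a second `get_or_create()` carrying a different `spark.sql.warehouse.dir` returns the existing session with its original warehouse dir intact, while non-static options still take effect.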
[spark] branch master updated (463c544 -> b10263b)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 463c544  [SPARK-31010][SQL][DOC][FOLLOW-UP] Improve deprecated warning message for untyped scala udf
     add b10263b  [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  33 +++--
 .../catalyst/parser/ExpressionParserSuite.scala    |  17 ++-
 .../test/resources/sql-tests/inputs/like-all.sql   |  39 ++
 .../test/resources/sql-tests/inputs/like-any.sql   |  39 ++
 .../resources/sql-tests/results/like-all.sql.out   | 140
 .../resources/sql-tests/results/like-any.sql.out   | 146 +
 7 files changed, 405 insertions(+), 10 deletions(-)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/like-all.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/like-any.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/results/like-all.sql.out
 create mode 100644 sql/core/src/test/resources/sql-tests/results/like-any.sql.out
[spark] branch master updated: [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b10263b  [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators
b10263b is described below

commit b10263b8e5106409467e0115968bbaf0b9141cd1
Author: Yuming Wang
AuthorDate: Fri Apr 24 22:20:32 2020 +0900

    [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators

    ### What changes were proposed in this pull request?
    The `LIKE ANY/SOME` and `LIKE ALL` operators are mostly used when matching a text field against a number of patterns. For example:

    Teradata / Hive 3.0 / Snowflake:
    ```sql
    -- like any
    select 'foo' LIKE ANY ('%foo%','%bar%');
    -- like all
    select 'foo' LIKE ALL ('%foo%','%bar%');
    ```

    PostgreSQL:
    ```sql
    -- like any
    select 'foo' LIKE ANY (array['%foo%','%bar%']);
    -- like all
    select 'foo' LIKE ALL (array['%foo%','%bar%']);
    ```

    This PR adds support for these two operators.

    More details:
    https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/4~AyrPNmDN0Xk4SALLo6aQ
    https://issues.apache.org/jira/browse/HIVE-15229
    https://docs.snowflake.net/manuals/sql-reference/functions/like_any.html

    ### Why are the changes needed?
    To smoothly migrate SQLs to Spark SQL.

    ### Does this PR introduce any user-facing change?
    No

    ### How was this patch tested?
    Unit test.

    Closes #27477 from wangyum/SPARK-30724.
Authored-by: Yuming Wang Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 1 + .../spark/sql/catalyst/parser/AstBuilder.scala | 33 +++-- .../catalyst/parser/ExpressionParserSuite.scala| 17 ++- .../test/resources/sql-tests/inputs/like-all.sql | 39 ++ .../test/resources/sql-tests/inputs/like-any.sql | 39 ++ .../resources/sql-tests/results/like-all.sql.out | 140 .../resources/sql-tests/results/like-any.sql.out | 146 + 7 files changed, 405 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index d78f584..e49bc07 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -766,6 +766,7 @@ predicate | NOT? kind=IN '(' expression (',' expression)* ')' | NOT? kind=IN '(' query ')' | NOT? kind=RLIKE pattern=valueExpression +| NOT? kind=LIKE quantifier=(ANY | SOME | ALL) ('('')' | '(' expression (',' expression)* ')') | NOT? kind=LIKE pattern=valueExpression (ESCAPE escapeChar=STRING)? | IS NOT? kind=NULL | IS NOT? kind=(TRUE | FALSE | UNKNOWN) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index ff362e7..e51b8f3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1373,7 +1373,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging * Add a predicate to the given expression. Supported expressions are: * - (NOT) BETWEEN * - (NOT) IN - * - (NOT) LIKE + * - (NOT) LIKE (ANY | SOME | ALL) * - (NOT) RLIKE * - IS (NOT) NULL. 
* - IS (NOT) (TRUE | FALSE | UNKNOWN) @@ -1391,6 +1391,14 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case other => Seq(other) } +def getLikeQuantifierExprs(expressions: java.util.List[ExpressionContext]): Seq[Expression] = { + if (expressions.isEmpty) { +throw new ParseException("Expected something between '(' and ')'.", ctx) + } else { +expressions.asScala.map(expression).map(p => invertIfNotDefined(new Like(e, p))) + } +} + // Create the predicate. ctx.kind.getType match { case SqlBaseParser.BETWEEN => @@ -1403,14 +1411,21 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case SqlBaseParser.IN => invertIfNotDefined(In(e, ctx.expression.asScala.map(expression))) case SqlBaseParser.LIKE => -val escapeChar = Option(ctx.escapeChar).map(string).map { str => - if (str.length != 1) { -throw new ParseException("Invalid escape string." + - "Escape string must contains only one character.", ctx) - } - str.charAt(0) -}.getOrElse('\\') -invertIfNotDef
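The semantics of the two new operators can be sketched outside Spark (a hedged illustration of the behaviour, not Spark's implementation): a SQL `LIKE` pattern translates to an anchored regular expression, with `%` matching any character sequence and `_` exactly one character; `LIKE ANY` is true when at least one pattern matches, `LIKE ALL` when every pattern matches.

```python
import re

def sql_like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regex:
    '%' -> '.*', '_' -> '.', everything else taken literally."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)

def like(text: str, pattern: str) -> bool:
    return re.fullmatch(sql_like_to_regex(pattern), text) is not None

def like_any(text: str, patterns) -> bool:
    # 'foo' LIKE ANY ('%foo%', '%bar%'): at least one pattern matches.
    return any(like(text, p) for p in patterns)

def like_all(text: str, patterns) -> bool:
    # 'foo' LIKE ALL ('%foo%', '%bar%'): every pattern matches.
    return all(like(text, p) for p in patterns)
```

So `like_any('foo', ['%foo%', '%bar%'])` holds while `like_all('foo', ['%foo%', '%bar%'])` does not, matching the examples in the commit message.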
[spark] branch branch-3.0 updated: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 0f02997 [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference 0f02997 is described below commit 0f0299749421bc7328c6b962a9305bf460f51ddf Author: Huaxin Gao AuthorDate: Thu Apr 23 15:03:20 2020 +0900 [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference ### What changes were proposed in this pull request? Need to address a few more comments ### Why are the changes needed? Fix a few problems ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Manually build and check Closes #28306 from huaxingao/literal-folllowup. Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit f543d6a1ee76de8cae417ff480fa9c0e0ce5343d) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-literals.md | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md index 7cf078c..0088f79 100644 --- a/docs/sql-ref-literals.md +++ b/docs/sql-ref-literals.md @@ -437,11 +437,11 @@ SELECT TIMESTAMP '1997-01-31 09:26:56.123' AS col; |1997-01-31 09:26:56.123| +---+ -SELECT TIMESTAMP '1997-01-31 09:26:56.CST' AS col; +SELECT TIMESTAMP '1997-01-31 09:26:56.UTC+08:00' AS col; +--+ | col | +--+ -|1997-01-31 07:26:56.66| +|1997-01-30 17:26:56.66| +--+ SELECT TIMESTAMP '1997-01' AS col; @@ -508,7 +508,7 @@ SELECT INTERVAL -2 HOUR '3' MINUTE AS col; |-1 hours -57 minutes| ++ -SELECT INTERVAL 'INTERVAL 1 YEAR 2 DAYS 3 HOURS'; +SELECT INTERVAL '1 YEAR 2 DAYS 3 HOURS'; +--+ | col| +--+ @@ -523,6 +523,13 @@ SELECT INTERVAL 1 YEARS 2 MONTH 3 WEEK 4 DAYS 5 HOUR 6 MINUTES 7 SECOND 8 |1 years 2 months 25 days 5 hours 6 minutes 7.008009 seconds| +---+ +SELECT INTERVAL '2-3' YEAR TO MONTH AS col; +++ +| col| +++ +|2 years 3 
months| +++ + SELECT INTERVAL '20 15:40:32.9989' DAY TO SECOND AS col; +-+ | col|
[spark] branch master updated (ca90e19 -> f543d6a)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ca90e19  [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
     add f543d6a  [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-literals.md | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)
[spark] branch master updated (03fe9ee -> ca90e19)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 03fe9ee  [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
     add ca90e19  [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/expressions/Canonicalize.scala  | 10 +-
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala      |  4 +++-
 .../spark/sql/catalyst/expressions/CanonicalizeSuite.scala    | 11 ++-
 3 files changed, 22 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2ebef75  [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
2ebef75 is described below

commit 2ebef75ec654bdbb01a4fa6a85225a7503de84b7
Author: Yuanjian Li
AuthorDate: Thu Apr 23 14:32:10 2020 +0900

    [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

    ### What changes were proposed in this pull request?
    Override the canonicalized fields with respect to the result of `needsTimeZone`.

    ### Why are the changes needed?
    The current approach breaks the semantic equality of two cast expressions that are unrelated to datetime types. If the `timeZone` information is not needed when casting the `from` type to the `to` type, then the `timeZoneId` should not influence the canonicalized result.

    ### Does this PR introduce any user-facing change?
    No.

    ### How was this patch tested?
    New UT added.

    Closes #28288 from xuanyuanking/SPARK-31515.
Authored-by: Yuanjian Li Signed-off-by: Takeshi Yamamuro (cherry picked from commit ca90e1932dcdc43748297c627ec857b6ea97dff7) Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/expressions/Canonicalize.scala | 10 +- .../org/apache/spark/sql/catalyst/expressions/Cast.scala | 4 +++- .../spark/sql/catalyst/expressions/CanonicalizeSuite.scala| 11 ++- 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala index 4d218b9..a803108 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala @@ -27,6 +27,7 @@ package org.apache.spark.sql.catalyst.expressions * The following rules are applied: * - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s are stripped. * - Names for [[GetStructField]] are stripped. + * - TimeZoneId for [[Cast]] and [[AnsiCast]] are stripped if `needsTimeZone` is false. * - Commutative and associative operations ([[Add]] and [[Multiply]]) have their children ordered *by `hashCode`. * - [[EqualTo]] and [[EqualNullSafe]] are reordered by `hashCode`. @@ -35,7 +36,7 @@ package org.apache.spark.sql.catalyst.expressions */ object Canonicalize { def execute(e: Expression): Expression = { -expressionReorder(ignoreNamesTypes(e)) +expressionReorder(ignoreTimeZone(ignoreNamesTypes(e))) } /** Remove names and nullability from types, and names from `GetStructField`. */ @@ -46,6 +47,13 @@ object Canonicalize { case _ => e } + /** Remove TimeZoneId for Cast if needsTimeZone return false. 
*/ + private[expressions] def ignoreTimeZone(e: Expression): Expression = e match { +case c: CastBase if c.timeZoneId.nonEmpty && !c.needsTimeZone => + c.withTimeZone(null) +case _ => e + } + /** Collects adjacent commutative operations. */ private def gatherCommutative( e: Expression, diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 8d82956..fa615d7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -279,7 +279,7 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && (!needsTimeZone || timeZoneId.isDefined) - private[this] def needsTimeZone: Boolean = Cast.needsTimeZone(child.dataType, dataType) + def needsTimeZone: Boolean = Cast.needsTimeZone(child.dataType, dataType) // [[func]] assumes the input is no longer null because eval already does the null check. @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T]) @@ -1708,6 +1708,7 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit """) case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None) extends CastBase { + override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = copy(timeZoneId = Option(timeZoneId)) @@ -1724,6 +1725,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String */ case class AnsiCast(child: Expression, dataType: DataType, timeZoneId: Option[Stri
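The idea behind this canonicalization change can be sketched in miniature (an illustration under simplifying assumptions, not Spark's actual `Cast.needsTimeZone` rule): before comparing two expressions for semantic equality, strip fields that cannot affect the result, such as the time zone on a cast that never consults it.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Cast:
    child: str                      # name of the input column (illustrative)
    from_type: str
    to_type: str
    time_zone: Optional[str] = None

    def needs_time_zone(self) -> bool:
        # Simplified stand-in: only casts between datetime and non-datetime
        # types consult the session time zone.
        datetime_types = {"timestamp", "date"}
        return (self.from_type in datetime_types) != (self.to_type in datetime_types)

def canonicalize(c: Cast) -> Cast:
    # Strip the time zone when it cannot influence the result, so two
    # otherwise-identical casts compare equal after canonicalization.
    if c.time_zone is not None and not c.needs_time_zone():
        return replace(c, time_zone=None)
    return c
```

With this, an `int -> long` cast canonicalizes identically under "UTC" and "Asia/Tokyo", while a `string -> timestamp` cast keeps its time zone, mirroring the `ignoreTimeZone` rule in the diff.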
[spark] branch branch-3.0 updated: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ed3e4bd  [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
ed3e4bd is described below

commit ed3e4bd09a6a183c3ba181eea7f2d47bde7fb1db
Author: Huaxin Gao
AuthorDate: Thu Apr 23 14:12:10 2020 +0900

    [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference

    ### What changes were proposed in this pull request?
    Document Literal in SQL Reference

    ### Why are the changes needed?
    Make SQL Reference complete

    ### Does this PR introduce any user-facing change?
    Yes. Screenshots of the rendered pages:

    https://user-images.githubusercontent.com/13592258/80057912-9ecb0c00-84dc-11ea-881e-1415108d674f.png
    https://user-images.githubusercontent.com/13592258/80057917-a12d6600-84dc-11ea-8884-81f2a94644d5.png
    https://user-images.githubusercontent.com/13592258/80057922-a4c0ed00-84dc-11ea-9857-75db50f7b054.png
    https://user-images.githubusercontent.com/13592258/80057927-a7234700-84dc-11ea-9124-45ae1f6143fd.png
    https://user-images.githubusercontent.com/13592258/80057932-ab4f6480-84dc-11ea-8393-cf005af13ce9.png
    https://user-images.githubusercontent.com/13592258/80057936-ad192800-84dc-11ea-8d78-9f071a82f1df.png
    https://user-images.githubusercontent.com/13592258/80057940-b0141880-84dc-11ea-97a7-f787cad0ee03.png
    https://user-images.githubusercontent.com/13592258/80057945-b30f0900-84dc-11ea-985f-c070609e2329.png
    https://user-images.githubusercontent.com/13592258/80057949-b5716300-84dc-11ea-9452-3f51137fe03d.png
    https://user-images.githubusercontent.com/13592258/80057957-b904ea00-84dc-11ea-8b12-a6f00362aa55.png
    https://user-images.githubusercontent.com/13592258/80057962-bacead80-84dc-11ea-94da-916b1d1c1756.png

    ### How was this patch tested?
    Manually build and check

    Closes #28237 from huaxingao/literal.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 03fe9ee428ebb4544f0f47e861bccea43e0cb325) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-literals.md | 532 +++ 2 files changed, 534 insertions(+) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index a16e114..5d8c265 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -78,6 +78,8 @@ subitems: - text: Data Types url: sql-ref-datatypes.html +- text: Literals + url: sql-ref-literals.html - text: Null Semantics url: sql-ref-null-semantics.html - text: NaN Semantics diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md new file mode 100644 index 000..7cf078c --- /dev/null +++ b/docs/sql-ref-literals.md @@ -0,0 +1,532 @@ +--- +layout: global +title: Literals +displayTitle: Literals +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +A literal (also known as a constant) represents a fixed data value. 
Spark SQL supports the following literals: + + * [String Literal](#string-literal) + * [Binary Literal](#binary-literal) + * [Null Literal](#null-literal) + * [Boolean Literal](#boolean-literal) + * [Numeric Literal](#numeric-literal) + * [Datetime Literal](#datetime-literal) + * [Interval Literal](#interval-literal) + +### String Literal + +A string literal is used to specify a character string value. + + Syntax + +{% highlight sql %} +'c [ ... ]' | "c [ ... ]" +{% endhighlight %} + + Parameters + + + c + +One character from the character set. Use \ to escape special characters (e.g., ' or \). + + + + Examples + +{% highlight sql %} +SELECT 'Hello, World!' AS col; ++-+ +| col| ++-+ +|Hello, World!| ++-+ + +SELECT "SPARK SQL" AS col; ++-+ +| col| ++-+ +|Spark SQL| ++-+ + +SELECT 'it\'s $10.' AS col; ++-+ +| col| ++-+ +|It's
[spark] branch master updated: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 03fe9ee  [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
03fe9ee is described below

commit 03fe9ee428ebb4544f0f47e861bccea43e0cb325
Author: Huaxin Gao
AuthorDate: Thu Apr 23 14:12:10 2020 +0900

    [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference

    ### What changes were proposed in this pull request?
    Document Literal in SQL Reference

    ### Why are the changes needed?
    Make SQL Reference complete

    ### Does this PR introduce any user-facing change?
    Yes. Screenshots of the rendered pages:

    https://user-images.githubusercontent.com/13592258/80057912-9ecb0c00-84dc-11ea-881e-1415108d674f.png
    https://user-images.githubusercontent.com/13592258/80057917-a12d6600-84dc-11ea-8884-81f2a94644d5.png
    https://user-images.githubusercontent.com/13592258/80057922-a4c0ed00-84dc-11ea-9857-75db50f7b054.png
    https://user-images.githubusercontent.com/13592258/80057927-a7234700-84dc-11ea-9124-45ae1f6143fd.png
    https://user-images.githubusercontent.com/13592258/80057932-ab4f6480-84dc-11ea-8393-cf005af13ce9.png
    https://user-images.githubusercontent.com/13592258/80057936-ad192800-84dc-11ea-8d78-9f071a82f1df.png
    https://user-images.githubusercontent.com/13592258/80057940-b0141880-84dc-11ea-97a7-f787cad0ee03.png
    https://user-images.githubusercontent.com/13592258/80057945-b30f0900-84dc-11ea-985f-c070609e2329.png
    https://user-images.githubusercontent.com/13592258/80057949-b5716300-84dc-11ea-9452-3f51137fe03d.png
    https://user-images.githubusercontent.com/13592258/80057957-b904ea00-84dc-11ea-8b12-a6f00362aa55.png
    https://user-images.githubusercontent.com/13592258/80057962-bacead80-84dc-11ea-94da-916b1d1c1756.png

    ### How was this patch tested?
    Manually build and check

    Closes #28237 from huaxingao/literal.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-literals.md | 532 +++ 2 files changed, 534 insertions(+) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index a16e114..5d8c265 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -78,6 +78,8 @@ subitems: - text: Data Types url: sql-ref-datatypes.html +- text: Literals + url: sql-ref-literals.html - text: Null Semantics url: sql-ref-null-semantics.html - text: NaN Semantics diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md new file mode 100644 index 000..7cf078c --- /dev/null +++ b/docs/sql-ref-literals.md @@ -0,0 +1,532 @@ +--- +layout: global +title: Literals +displayTitle: Literals +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +A literal (also known as a constant) represents a fixed data value. 
Spark SQL supports the following literals: + + * [String Literal](#string-literal) + * [Binary Literal](#binary-literal) + * [Null Literal](#null-literal) + * [Boolean Literal](#boolean-literal) + * [Numeric Literal](#numeric-literal) + * [Datetime Literal](#datetime-literal) + * [Interval Literal](#interval-literal) + +### String Literal + +A string literal is used to specify a character string value. + + Syntax + +{% highlight sql %} +'c [ ... ]' | "c [ ... ]" +{% endhighlight %} + + Parameters + + + c + +One character from the character set. Use \ to escape special characters (e.g., ' or \). + + + + Examples + +{% highlight sql %} +SELECT 'Hello, World!' AS col; ++-+ +| col| ++-+ +|Hello, World!| ++-+ + +SELECT "SPARK SQL" AS col; ++-+ +| col| ++-+ +|Spark SQL| ++-+ + +SELECT 'it\'s $10.' AS col; ++-+ +| col| ++-+ +|It's $10.| ++-+ +{% endhighlight %} + +### Binary Literal + +A binary literal is used to specify a byte se
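The string-literal escaping rule the new doc page describes (use `\` to escape special characters such as `'`) can be illustrated with a tiny parser sketch. This is an illustration of the documented rule only, not Spark's actual lexer; `parse_sql_string_literal` is an invented helper name.

```python
def parse_sql_string_literal(src: str) -> str:
    """Parse a single-quoted SQL string literal with backslash escapes,
    e.g. the doc's example 'it\\'s $10.' yields the value it's $10."""
    assert src[0] == "'" and src[-1] == "'", "expected a single-quoted literal"
    body = src[1:-1]
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # escaped character taken literally
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

For instance, the raw source `'it\'s $10.'` parses to the nine-character-plus string `it's $10.`, matching the example table in the page.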
[spark] branch master updated: [SPARK-31477][SQL] Dump codegen and compile time in BenchmarkQueryTest
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6bf5f01  [SPARK-31477][SQL] Dump codegen and compile time in BenchmarkQueryTest
6bf5f01 is described below

commit 6bf5f01a4a8b7708ce563e0a0e9a49e8ff89c71e
Author: gatorsmile
AuthorDate: Sat Apr 18 20:59:45 2020 +0900

    [SPARK-31477][SQL] Dump codegen and compile time in BenchmarkQueryTest

    ### What changes were proposed in this pull request?
    This PR dumps the code generation and compilation times for the benchmark query tests.

    ### Why are the changes needed?
    To measure the codegen and compilation time costs in TPC-DS queries.

    ### Does this PR introduce any user-facing change?
    No

    ### How was this patch tested?
    Manual test on my local laptop:
    ```
    23:13:12.845 WARN org.apache.spark.sql.TPCDSQuerySuite:
    === Metrics of Whole-stage Codegen ===
    Total code generation time: 21.275102261 seconds
    Total compilation time: 12.223771828 seconds
    ```

    Closes #28252 from gatorsmile/testMastercode.
Authored-by: gatorsmile Signed-off-by: Takeshi Yamamuro --- .../sql/catalyst/expressions/codegen/CodeGenerator.scala| 2 +- .../apache/spark/sql/execution/WholeStageCodegenExec.scala | 2 +- .../scala/org/apache/spark/sql/BenchmarkQueryTest.scala | 13 + .../test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala | 13 +++-- 4 files changed, 22 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 3042a27..1cc7836 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -1324,7 +1324,7 @@ object CodeGenerator extends Logging { // Reset compile time. // Visible for testing - def resetCompileTime: Unit = _compileTime.reset() + def resetCompileTime(): Unit = _compileTime.reset() /** * Compile the Java source code into a Java class, using Janino. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala index 9f6e4fc..0244542 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala @@ -586,7 +586,7 @@ object WholeStageCodegenExec { // Reset generation time of Java source code. 
// Visible for testing - def resetCodeGenTime: Unit = _codeGenTime.set(0L) + def resetCodeGenTime(): Unit = _codeGenTime.set(0L) } /** diff --git a/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala index 07afd41..2c3b37a 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql import org.apache.spark.internal.config.Tests.IS_TESTING import org.apache.spark.sql.catalyst.expressions.codegen.{ByteCodeStats, CodeFormatter, CodeGenerator} import org.apache.spark.sql.catalyst.rules.RuleExecutor +import org.apache.spark.sql.catalyst.util.DateTimeConstants.NANOS_PER_SECOND import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec} import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.util.Utils @@ -36,7 +37,17 @@ abstract class BenchmarkQueryTest extends QueryTest with SharedSparkSession { protected override def afterAll(): Unit = { try { // For debugging dump some statistics about how much time was spent in various optimizer rules + // code generation, and compilation. logWarning(RuleExecutor.dumpTimeSpent()) + val codeGenTime = WholeStageCodegenExec.codeGenTime.toDouble / NANOS_PER_SECOND + val compileTime = CodeGenerator.compileTime.toDouble / NANOS_PER_SECOND + val codegenInfo = +s""" + |=== Metrics of Whole-stage Codegen === + |Total code generation time: $codeGenTime seconds + |Total compile time: $compileTime seconds + """.stripMargin + logWarning(codegenInfo) spark.sessionState.catalog.reset() } finally { super.afterAll() @@ -46,6 +57,8 @@ abstract class BenchmarkQueryTest extends QueryTest with SharedSparkSession { override def beforeAll(): Unit = { super.beforeAll() RuleExecutor.resetMetrics() +
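The patch above accumulates code-generation and compilation time in nanoseconds and converts to seconds only when the metrics are dumped (via `NANOS_PER_SECOND`). The same accumulate-then-report pattern can be sketched in Python; this is an illustrative stand-in, not Spark's `WholeStageCodegenExec`/`CodeGenerator` accumulators, and all names here are hypothetical:

```python
import time

NANOS_PER_SECOND = 1_000_000_000  # mirrors DateTimeConstants.NANOS_PER_SECOND

class TimeAccumulator:
    """Accumulates elapsed nanoseconds across many timed sections."""

    def __init__(self):
        self._total_nanos = 0

    def timed(self, fn, *args):
        # Time one section and add its elapsed nanos to the running total.
        start = time.perf_counter_ns()
        try:
            return fn(*args)
        finally:
            self._total_nanos += time.perf_counter_ns() - start

    def reset(self):
        # Analogous to resetCodeGenTime()/resetCompileTime() in the diff.
        self._total_nanos = 0

    @property
    def seconds(self):
        # Convert to seconds only at report time, as the test suite does.
        return self._total_nanos / NANOS_PER_SECOND

acc = TimeAccumulator()
acc.timed(sum, range(1000))  # stand-in for a codegen/compile phase
report = f"Total code generation time: {acc.seconds} seconds"
```

The diff also changes `resetCompileTime` to `resetCompileTime()`: declaring side-effecting Scala methods with empty parentheses is the idiomatic signal that a call mutates state, which is why the reset methods gained `()`.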
[spark] branch branch-3.0 updated: [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 1139e9b  [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section
1139e9b is described below

commit 1139e9b50150e3a99a9c8df0ed57d3fd2b391788
Author: Huaxin Gao
AuthorDate: Sat Apr 18 09:31:52 2020 +0900

    [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section

    ### What changes were proposed in this pull request?
    Document Window Function in SQL syntax

    ### Why are the changes needed?
    Make SQL Reference complete

    ### Does this PR introduce any user-facing change?
    Yes

    https://user-images.githubusercontent.com/13592258/79531509-7bf5af00-8027-11ea-8291-a91b2e97a1b5.png
    https://user-images.githubusercontent.com/13592258/79531514-7e580900-8027-11ea-8761-4c5a888c476f.png
    https://user-images.githubusercontent.com/13592258/79531518-82842680-8027-11ea-876f-6375aa5b5ead.png
    https://user-images.githubusercontent.com/13592258/79531521-844dea00-8027-11ea-8948-712f054d42ee.png
    https://user-images.githubusercontent.com/13592258/79531528-8748da80-8027-11ea-9dae-a465286982ac.png

    ### How was this patch tested?
    Manually build and check

    Closes #28220 from huaxingao/sql-win-fun.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 142f43629c42ad750d9b506283191aa830d95c08) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-qry-window.md | 190 +- 2 files changed, 189 insertions(+), 3 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 7827a0f..5042c2588 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -168,6 +168,8 @@ url: sql-ref-syntax-qry-select-inline-table.html - text: Common Table Expression url: sql-ref-syntax-qry-select-cte.html +- text: Window Function + url: sql-ref-syntax-qry-window.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements diff --git a/docs/sql-ref-syntax-qry-window.md b/docs/sql-ref-syntax-qry-window.md index 767f477..4ec1af7 100644 --- a/docs/sql-ref-syntax-qry-window.md +++ b/docs/sql-ref-syntax-qry-window.md @@ -1,7 +1,7 @@ --- layout: global -title: Windowing Analytic Functions -displayTitle: Windowing Analytic Functions +title: Window Functions +displayTitle: Window Functions license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -19,4 +19,188 @@ license: | limitations under the License. --- -**This page is under construction** +### Description + +Window functions operate on a group of rows, referred to as a window, and calculate a return value for each row based on the group of rows. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row. + +### Syntax + +{% highlight sql %} +window_function OVER +( [ { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] ) ] + { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... 
] + [ window_frame ] ) +{% endhighlight %} + +### Parameters + + + window_function + + + Ranking Functions + + Syntax: + + RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER + + + + Analytic Functions + + Syntax: + + CUME_DIST | LAG | LEAD + + + + Aggregate Functions + + Syntax: + + MAX | MIN | COUNT | SUM | AVG | ... + + +Please refer to the Built-in Functions document for a complete list of Spark aggregate functions. + + + + + window_frame + +Specifies which row to start the window on and where to end it. +Syntax: + { RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end } + If frame_end is omitted it defaults to CURRENT ROW. + + frame_start and frame_end have the following syntax + Syntax: + + UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset FOLLOWING | UNBOUNDED FOLLOWING + +offset:specifies the offset from the position of the current row. + + + + +### Examples + +{% highlight sql %} +CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT); + +INSERT INTO employees VALUES ("Lisa", "Sales", 1,
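The window-frame semantics documented in the patch above (a frame such as `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` evaluated per partition, in sort order) can be illustrated with a minimal Python sketch. This is not Spark's implementation; the function and data below are hypothetical, emulating `AVG(value) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN n PRECEDING AND CURRENT ROW)`:

```python
from collections import defaultdict

def moving_avg(rows, partition_key, order_key, value_key, preceding=1):
    """Emulate AVG(value_key) OVER (PARTITION BY partition_key
    ORDER BY order_key ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW)."""
    parts = defaultdict(list)
    for r in rows:
        parts[r[partition_key]].append(r)
    out = {}
    for part in parts.values():
        part.sort(key=lambda r: r[order_key])
        for i, r in enumerate(part):
            # The frame is a sliding row window ending at the current row.
            frame = part[max(0, i - preceding): i + 1]
            vals = [x[value_key] for x in frame]
            out[r["name"]] = sum(vals) / len(vals)
    return out

employees = [
    {"name": "Lisa", "dept": "Sales", "salary": 10000},
    {"name": "Evan", "dept": "Sales", "salary": 32000},
    {"name": "Fred", "dept": "Engineering", "salary": 21000},
]
avgs = moving_avg(employees, "dept", "salary", "salary")
```

Within the Sales partition sorted by salary, Evan's frame covers both Lisa's row (1 preceding) and his own, so his moving average is (10000 + 32000) / 2.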
[spark] branch master updated (db7b865 -> 142f436)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from db7b865 [SPARK-31253][SQL][FOLLOW-UP] simplify the code of calculating size metrics of AQE shuffle add 142f436 [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-qry-window.md | 190 +- 2 files changed, 189 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30564][SQL] Revert Block.length and use comment placeholders in HashAggregateExec
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 020f3a3  [SPARK-30564][SQL] Revert Block.length and use comment placeholders in HashAggregateExec
020f3a3 is described below

commit 020f3a33dd711d05337bb42d5f65708a4aec2daa
Author: Peter Toth
AuthorDate: Thu Apr 16 17:52:22 2020 +0900

    [SPARK-30564][SQL] Revert Block.length and use comment placeholders in HashAggregateExec

    ### What changes were proposed in this pull request?
    SPARK-21870 (cb0cddf#diff-06dc5de6163687b7810aa76e7e152a76R146-R149) caused a significant performance regression in cases where the source code size is fairly large, as `HashAggregateExec` uses `Block.length` to decide on splitting the code. The change in `length` makes sense, as comments and extra new lines shouldn't be taken into account when deciding on splitting, but the regular-expression-based approach is very slow and adds a big relative overhead to cases where the execution is [...]

    This PR:
    - restores `Block.length` to its original form
    - places comments in `HashAggregateExec` with `CodegenContext.registerComment` so that they appear only when comments are enabled (`spark.sql.codegen.comments=true`)

    Before this PR:
    ```
    deeply nested struct field r/w:        Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
    250 deep x 400 rows (read in-mem)               1137          1143          8        0.1      11368.3      0.0X
    ```

    After this PR:
    ```
    deeply nested struct field r/w:        Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
    250 deep x 400 rows (read in-mem)                167           180          7        0.6       1674.3      0.1X
    ```

    ### Why are the changes needed?
    To fix a performance regression.

    ### Does this PR introduce any user-facing change?
    No.

    ### How was this patch tested?
    Existing UTs.

    Closes #28083 from peter-toth/SPARK-30564-use-comment-placeholders.
Authored-by: Peter Toth Signed-off-by: Takeshi Yamamuro (cherry picked from commit 7ad6ba36f28b7a5ca548950dec6afcd61e5d68b9) Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/expressions/codegen/javaCode.scala | 8 .../spark/sql/execution/aggregate/HashAggregateExec.scala | 14 +++--- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala index dff2589..1c59c3c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala @@ -143,10 +143,10 @@ trait Block extends TreeNode[Block] with JavaCode { case _ => code.trim } - def length: Int = { -// Returns a code length without comments -CodeFormatter.stripExtraNewLinesAndComments(toString).length - } + // We could remove comments, extra whitespaces and newlines when calculating length as it is used + // only for codegen method splitting, but SPARK-30564 showed that this is a performance critical + // function so we decided not to do so. 
+ def length: Int = toString.length def isEmpty: Boolean = toString.isEmpty diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala index 7a26fd7..87a4081 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala @@ -367,10 +367,10 @@ case class HashAggregateExec( """.stripMargin } code""" - |// do aggregate for ${aggNames(i)} - |// evaluate aggregate function + |${ctx.registerComment(s"do aggregate for ${aggNames(i)}")} + |${ctx.registerComment("evaluate aggregate function")} |${evaluateVariables(bufferEvalsForOneFunc)} - |// update aggregation buffers + |${ctx.registerComment("update aggregation buffers")} |${updates.mkString("\n").trim} """.stripMargin } @@ -975,9 +975,
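The trade-off in this patch — a regex-based `length` that strips comments and blank lines before measuring, versus a plain string length — can be sketched in Python. This is only an illustration of the two measures, not Spark's `CodeFormatter`; the helper below is a hypothetical, rough analogue of `stripExtraNewLinesAndComments`:

```python
import re

code = """
// do aggregate for agg_0
int agg_value = 0;
// update aggregation buffers

agg_value += 1;
"""

def length_without_comments(src: str) -> int:
    # Strip '//' line comments, then collapse blank lines, before measuring.
    # Running regexes over every generated block is what made the original
    # change expensive on large generated sources.
    no_comments = re.sub(r"//.*", "", src)
    collapsed = re.sub(r"\n\s*\n", "\n", no_comments)
    return len(collapsed)

plain = len(code)                         # restored behavior: cheap O(n) length
stripped = length_without_comments(code)  # slower, comment-free measure
```

The stripped measure is smaller (comments no longer inflate the split decision), but the patch restores the plain length and instead moves the comments behind `registerComment`, so they are simply absent from generated code unless `spark.sql.codegen.comments=true`.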
[spark] branch master updated (c76c31e -> 7ad6ba3)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c76c31e [SPARK-31455][SQL] Fix rebasing of not-existed timestamps add 7ad6ba3 [SPARK-30564][SQL] Revert Block.length and use comment placeholders in HashAggregateExec No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/codegen/javaCode.scala | 8 .../spark/sql/execution/aggregate/HashAggregateExec.scala | 14 +++--- 2 files changed, 11 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 4476c85  [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference
4476c85 is described below

commit 4476c85775d231c8bb26399284c0baf4292bec7c
Author: Huaxin Gao
AuthorDate: Thu Apr 16 08:34:26 2020 +0900

    [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference

    ### What changes were proposed in this pull request?
    Document Common Table Expression in SQL Reference

    ### Why are the changes needed?
    Make SQL Reference complete

    ### Does this PR introduce any user-facing change?
    Yes

    https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png
    https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png
    https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png

    Also link to Select page

    https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png

    ### How was this patch tested?
    Manually build and check

    Closes #28196 from huaxingao/cte.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 92c1b246174948d0c1f4d0833e1ceac265b17be7) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-qry-select-cte.md | 109 +- docs/sql-ref-syntax-qry-select.md | 3 +- 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index badb98d..7827a0f 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -166,6 +166,8 @@ url: sql-ref-syntax-qry-select-tvf.html - text: Inline Table url: sql-ref-syntax-qry-select-inline-table.html +- text: Common Table Expression + url: sql-ref-syntax-qry-select-cte.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements diff --git a/docs/sql-ref-syntax-qry-select-cte.md b/docs/sql-ref-syntax-qry-select-cte.md index 2bd7748..2146f8e 100644 --- a/docs/sql-ref-syntax-qry-select-cte.md +++ b/docs/sql-ref-syntax-qry-select-cte.md @@ -19,4 +19,111 @@ license: | limitations under the License. --- -**This page is under construction** +### Description + +A common table expression (CTE) defines a temporary result set that a user can reference possibly multiple times within the scope of a SQL statement. A CTE is used mainly in a SELECT statement. + +### Syntax + +{% highlight sql %} +WITH common_table_expression [ , ... ] +{% endhighlight %} + +While `common_table_expression` is defined as +{% highlight sql %} +expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( [ common_table_expression ] query ) +{% endhighlight %} + +### Parameters + + + expression_name + +Specifies a name for the common table expression. + + + + query + +A SELECT statement. 
+ + + +### Examples + +{% highlight sql %} +-- CTE with multiple column aliases +WITH t(x, y) AS (SELECT 1, 2) +SELECT * FROM t WHERE x = 1 AND y = 2; + +---+---+ + | x| y| + +---+---+ + | 1| 2| + +---+---+ + +-- CTE in CTE definition +WITH t as ( +WITH t2 AS (SELECT 1) +SELECT * FROM t2 +) +SELECT * FROM t; + +---+ + | 1| + +---+ + | 1| + +---+ + +-- CTE in subquery +SELECT max(c) FROM ( +WITH t(c) AS (SELECT 1) +SELECT * FROM t +); + +--+ + |max(c)| + +--+ + | 1| + +--+ + +-- CTE in subquery expression +SELECT ( +WITH t AS (SELECT 1) +SELECT * FROM t +); + ++ + |scalarsubquery()| + ++ + | 1| + ++ + +-- CTE in CREATE VIEW statement +CREATE VIEW v AS +WITH t(a, b, c, d) AS (SELECT 1, 2, 3, 4) +SELECT * FROM t; +SELECT * FROM v; + +---+---+---+---+ + | a| b| c| d| + +---+---+---+---+ + | 1| 2| 3| 4| + +---+---+---+---+ + +-- If name conflict is detected in nested CTE, then AnalysisException is thrown by default. +-- SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED (which is recommended), +-- inner CTE definitions take precedence over outer definitions. +SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED; +WITH +t AS (SELECT 1), +t2 AS ( +WITH t AS (SELECT 2) +SELECT * FROM t +) +SELECT * FROM t2; + +---+ + | 2| + +---+ + | 2| + +---+ +{% endhighlight %} + +### Related Statements + + * [SELECT](sql-ref-syntax-qry-select.html) diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 94f6
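The CTE examples in the patch above are standard SQL and can be run outside Spark. As an illustrative sketch (not Spark itself), the same `WITH` semantics — column aliases and a CTE nested inside another CTE's definition — can be exercised with Python's built-in `sqlite3`, which also supports common table expressions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CTE with multiple column aliases, mirroring the doc's first example.
rows = conn.execute(
    "WITH t(x, y) AS (SELECT 1, 2) "
    "SELECT * FROM t WHERE x = 1 AND y = 2"
).fetchall()

# CTE nested inside another CTE's definition, mirroring the second example.
nested = conn.execute(
    "WITH t AS (WITH t2 AS (SELECT 1 AS c) SELECT * FROM t2) "
    "SELECT * FROM t"
).fetchall()
```

Note that the name-shadowing behavior for conflicting nested CTE names differs between engines; in Spark the resolution is governed by `spark.sql.legacy.ctePrecedencePolicy` as described in the doc, and SQLite's behavior should not be taken as a reference for Spark's.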
[spark] branch master updated: [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 92c1b24  [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference
92c1b24 is described below

commit 92c1b246174948d0c1f4d0833e1ceac265b17be7
Author: Huaxin Gao
AuthorDate: Thu Apr 16 08:34:26 2020 +0900

    [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference

    ### What changes were proposed in this pull request?
    Document Common Table Expression in SQL Reference

    ### Why are the changes needed?
    Make SQL Reference complete

    ### Does this PR introduce any user-facing change?
    Yes

    https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png
    https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png
    https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png

    Also link to Select page

    https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png

    ### How was this patch tested?
    Manually build and check

    Closes #28196 from huaxingao/cte.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 2 + docs/sql-ref-syntax-qry-select-cte.md | 109 +- docs/sql-ref-syntax-qry-select.md | 3 +- 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index badb98d..7827a0f 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -166,6 +166,8 @@ url: sql-ref-syntax-qry-select-tvf.html - text: Inline Table url: sql-ref-syntax-qry-select-inline-table.html +- text: Common Table Expression + url: sql-ref-syntax-qry-select-cte.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements diff --git a/docs/sql-ref-syntax-qry-select-cte.md b/docs/sql-ref-syntax-qry-select-cte.md index 2bd7748..2146f8e 100644 --- a/docs/sql-ref-syntax-qry-select-cte.md +++ b/docs/sql-ref-syntax-qry-select-cte.md @@ -19,4 +19,111 @@ license: | limitations under the License. --- -**This page is under construction** +### Description + +A common table expression (CTE) defines a temporary result set that a user can reference possibly multiple times within the scope of a SQL statement. A CTE is used mainly in a SELECT statement. + +### Syntax + +{% highlight sql %} +WITH common_table_expression [ , ... ] +{% endhighlight %} + +While `common_table_expression` is defined as +{% highlight sql %} +expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( [ common_table_expression ] query ) +{% endhighlight %} + +### Parameters + + + expression_name + +Specifies a name for the common table expression. + + + + query + +A SELECT statement. 
+ + + +### Examples + +{% highlight sql %} +-- CTE with multiple column aliases +WITH t(x, y) AS (SELECT 1, 2) +SELECT * FROM t WHERE x = 1 AND y = 2; + +---+---+ + | x| y| + +---+---+ + | 1| 2| + +---+---+ + +-- CTE in CTE definition +WITH t as ( +WITH t2 AS (SELECT 1) +SELECT * FROM t2 +) +SELECT * FROM t; + +---+ + | 1| + +---+ + | 1| + +---+ + +-- CTE in subquery +SELECT max(c) FROM ( +WITH t(c) AS (SELECT 1) +SELECT * FROM t +); + +--+ + |max(c)| + +--+ + | 1| + +--+ + +-- CTE in subquery expression +SELECT ( +WITH t AS (SELECT 1) +SELECT * FROM t +); + ++ + |scalarsubquery()| + ++ + | 1| + ++ + +-- CTE in CREATE VIEW statement +CREATE VIEW v AS +WITH t(a, b, c, d) AS (SELECT 1, 2, 3, 4) +SELECT * FROM t; +SELECT * FROM v; + +---+---+---+---+ + | a| b| c| d| + +---+---+---+---+ + | 1| 2| 3| 4| + +---+---+---+---+ + +-- If name conflict is detected in nested CTE, then AnalysisException is thrown by default. +-- SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED (which is recommended), +-- inner CTE definitions take precedence over outer definitions. +SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED; +WITH +t AS (SELECT 1), +t2 AS ( +WITH t AS (SELECT 2) +SELECT * FROM t +) +SELECT * FROM t2; + +---+ + | 2| + +---+ + | 2| + +---+ +{% endhighlight %} + +### Related Statements + + * [SELECT](sql-ref-syntax-qry-select.html) diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 94f69d4..bc2cc02 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -53,7 +53
[spark] branch master updated: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f0e2fc3  [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions
f0e2fc3 is described below

commit f0e2fc37d1dc2a85fd08c87add5106bb51305182
Author: Dilip Biswal
AuthorDate: Sat Apr 11 08:28:11 2020 +0900

    [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions

    ### What changes were proposed in this pull request?
    Currently NOT IN subqueries (predicated null aware subquery) are not allowed inside OR expressions. We currently catch this condition in checkAnalysis and throw an error. This PR enhances the subquery rewrite to support this type of queries.

    Query
    ```SQL
    SELECT * FROM s1 WHERE a > 5 or b NOT IN (SELECT c FROM s2);
    ```

    Optimized Plan
    ```SQL
    == Optimized Logical Plan ==
    Project [a#3, b#4]
    +- Filter ((a#3 > 5) || NOT exists#7)
       +- Join ExistenceJoin(exists#7), ((b#4 = c#5) || isnull((b#4 = c#5)))
          :- HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#3, b#4]
          +- Project [c#5]
             +- HiveTableRelation `default`.`s2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c#5, d#6]
    ```

    This is rework from #22141. The original author of this PR is dilipbiswal.

    Closes #22141

    ### Why are the changes needed?
    For better usability.

    ### Does this PR introduce any user-facing change?
    No.

    ### How was this patch tested?
    Added new tests in SQLQueryTestSuite, RewriteSubquerySuite and SubquerySuite. Output from DB2 as a reference: [nested-not-db2.txt](https://github.com/apache/spark/files/2299945/nested-not-db2.txt)

    Closes #28158 from maropu/pr22141.
Lead-authored-by: Dilip Biswal Co-authored-by: Takeshi Yamamuro Co-authored-by: Dilip Biswal Signed-off-by: Takeshi Yamamuro --- .../sql/catalyst/analysis/CheckAnalysis.scala | 4 - .../spark/sql/catalyst/expressions/subquery.scala | 18 -- .../spark/sql/catalyst/optimizer/subquery.scala| 23 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 15 - .../catalyst/optimizer/RewriteSubquerySuite.scala | 19 +- .../apache/spark/sql/catalyst/plans/PlanTest.scala | 9 +- .../inputs/subquery/in-subquery/nested-not-in.sql | 198 .../subquery/in-subquery/nested-not-in.sql.out | 332 + .../scala/org/apache/spark/sql/SubquerySuite.scala | 6 +- 9 files changed, 580 insertions(+), 44 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 066dc6d..9e325d0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -200,10 +200,6 @@ trait CheckAnalysis extends PredicateHelper { s"filter expression '${f.condition.sql}' " + s"of type ${f.condition.dataType.catalogString} is not a boolean.") - case Filter(condition, _) if hasNullAwarePredicateWithinNot(condition) => -failAnalysis("Null-aware predicate sub-queries cannot be used in nested " + - s"conditions: $condition") - case j @ Join(_, _, _, Some(condition), _) if condition.dataType != BooleanType => failAnalysis( s"join condition '${condition.sql}' " + diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala index e33cff2..f46a1c6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala @@ 
-106,24 +106,6 @@ object SubExprUtils extends PredicateHelper { } /** - * Returns whether there are any null-aware predicate subqueries inside Not. If not, we could - * turn the null-aware predicate into not-null-aware predicate. - */ - def hasNullAwarePredicateWithinNot(condition: Expression): Boolean = { -splitConjunctivePredicates(condition).exists { - case _: Exists | Not(_: Exists) => false - case _: InSubquery | Not(_: InSubquery) => false - case e => e.find { x => -x.isInstanceOf[Not] && e.find { - case _: InSubquery => true - case _ => false -}.isDefined - }.isDefined -} - - } - - /** * Returns an
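The subtlety this commit handles is that `NOT IN` is null-aware: if the subquery produces a NULL, the predicate evaluates to unknown rather than false, which is why the rewrite uses an ExistenceJoin with an `isnull` disjunct in its condition. A sketch of the semantics the rewrite must preserve, using Python's stdlib `sqlite3` with toy tables named after the `s1`/`s2` in the commit message (contents made up for illustration):

```python
import sqlite3

# Demonstrate the three-valued-logic behavior of NOT IN inside an OR,
# which Spark's optimizer rewrite must preserve.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s1 (a INT, b INT);
    CREATE TABLE s2 (c INT, d INT);
    INSERT INTO s1 VALUES (6, 1), (1, 9);
    INSERT INTO s2 VALUES (1, 10);
""")

query = "SELECT a, b FROM s1 WHERE a > 5 OR b NOT IN (SELECT c FROM s2)"

# No NULLs in s2: (6, 1) passes via a > 5, (1, 9) passes via NOT IN.
print(sorted(conn.execute(query).fetchall()))  # [(1, 9), (6, 1)]

# Insert a NULL into s2: b NOT IN (...) is now unknown for every b,
# so only rows satisfying a > 5 survive.
conn.execute("INSERT INTO s2 VALUES (NULL, 11)")
print(conn.execute(query).fetchall())  # [(6, 1)]

conn.close()
```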
[spark] branch master updated (2d3692e -> f0e2fc3)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2d3692e [SPARK-31406][SQL][TEST] ThriftServerQueryTestSuite: Sharing test data and test tables among multiple test cases add f0e2fc3 [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/CheckAnalysis.scala | 4 - .../spark/sql/catalyst/expressions/subquery.scala | 18 -- .../spark/sql/catalyst/optimizer/subquery.scala| 23 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 15 - .../catalyst/optimizer/RewriteSubquerySuite.scala | 19 +- .../apache/spark/sql/catalyst/plans/PlanTest.scala | 9 +- .../inputs/subquery/in-subquery/nested-not-in.sql | 198 .../subquery/in-subquery/nested-not-in.sql.out | 332 + .../scala/org/apache/spark/sql/SubquerySuite.scala | 6 +- 9 files changed, 580 insertions(+), 44 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/nested-not-in.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 51c80a4 [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references 51c80a4 is described below commit 51c80a48024242a51940c9c0aafdfd7e3a0c481f Author: Takeshi Yamamuro AuthorDate: Mon Apr 6 21:36:51 2020 +0900 [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references ### What changes were proposed in this pull request? This PR intends to improve the SQL document of `GROUP BY`; it added the description about FILTER clauses of aggregate functions. ### Why are the changes needed? To improve the SQL documents ### Does this PR introduce any user-facing change? Yes. https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png; width="500"> ### How was this patch tested? Manually checked. Closes #28134 from maropu/SPARK-31358. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-qry-select-groupby.md | 44 +++ docs/sql-ref-syntax-qry-select.md | 1 + 2 files changed, 45 insertions(+) diff --git a/docs/sql-ref-syntax-qry-select-groupby.md b/docs/sql-ref-syntax-qry-select-groupby.md index 49a11ca..c461a18 100644 --- a/docs/sql-ref-syntax-qry-select-groupby.md +++ b/docs/sql-ref-syntax-qry-select-groupby.md @@ -21,6 +21,7 @@ license: | The GROUP BY clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses. 
+When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function. ### Syntax {% highlight sql %} @@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ] GROUP BY GROUPING SETS (grouping_set [ , ...]) {% endhighlight %} +While aggregate functions are defined as +{% highlight sql %} +aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ] +{% endhighlight %} + ### Parameters GROUPING SETS @@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...]) ((warehouse, product), (warehouse), (product), ()). The N elements of a CUBE specification results in 2^N GROUPING SETS. + aggregate_name + +Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.). + + DISTINCT + +Removes duplicates in input rows before they are passed to aggregate functions. + + FILTER + +Filters the input rows for which the boolean_expression in the WHERE clause evaluates +to true are passed to the aggregate function; other rows are discarded. + ### Examples @@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id OR |300|13 |8 | +---+---+---+ +-- Count the number of distinct dealer cities per car_model. +SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model; + + ++-+ + | car_model|count| + ++-+ + | Honda Civic|3| + | Honda CRV|2| + |Honda Accord|3| + ++-+ + +-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership. +SELECT id, sum(quantity) FILTER ( +WHERE car_model IN ('Honda Civic', 'Honda CRV') +) AS `sum(quantity)` FROM dealer +GROUP BY id ORDER BY id; + + +---+-+ + | id|sum(quantity)| + +---+-+ + |100| 17| + |200| 23| + |300|5| + +---+-+ + -- Aggregations using multiple sets of grouping columns in a single statement. -- Following performs aggregations based on four sets of grouping columns. -- 1. 
city, car_model diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index e87c4a5..7ad1dd1 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group. +When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
[spark] branch master updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e24f0dc [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references e24f0dc is described below commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717 Author: Takeshi Yamamuro AuthorDate: Mon Apr 6 21:36:51 2020 +0900 [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references ### What changes were proposed in this pull request? This PR intends to improve the SQL document of `GROUP BY`; it added the description about FILTER clauses of aggregate functions. ### Why are the changes needed? To improve the SQL documents ### Does this PR introduce any user-facing change? Yes. https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png; width="500"> ### How was this patch tested? Manually checked. Closes #28134 from maropu/SPARK-31358. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-qry-select-groupby.md | 44 +++ docs/sql-ref-syntax-qry-select.md | 1 + 2 files changed, 45 insertions(+) diff --git a/docs/sql-ref-syntax-qry-select-groupby.md b/docs/sql-ref-syntax-qry-select-groupby.md index 49a11ca..c461a18 100644 --- a/docs/sql-ref-syntax-qry-select-groupby.md +++ b/docs/sql-ref-syntax-qry-select-groupby.md @@ -21,6 +21,7 @@ license: | The GROUP BY clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses. +When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function. 
### Syntax {% highlight sql %} @@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ] GROUP BY GROUPING SETS (grouping_set [ , ...]) {% endhighlight %} +While aggregate functions are defined as +{% highlight sql %} +aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE boolean_expression ) ] +{% endhighlight %} + ### Parameters GROUPING SETS @@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...]) ((warehouse, product), (warehouse), (product), ()). The N elements of a CUBE specification results in 2^N GROUPING SETS. + aggregate_name + +Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.). + + DISTINCT + +Removes duplicates in input rows before they are passed to aggregate functions. + + FILTER + +Filters the input rows for which the boolean_expression in the WHERE clause evaluates +to true are passed to the aggregate function; other rows are discarded. + ### Examples @@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id OR |300|13 |8 | +---+---+---+ +-- Count the number of distinct dealer cities per car_model. +SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model; + + ++-+ + | car_model|count| + ++-+ + | Honda Civic|3| + | Honda CRV|2| + |Honda Accord|3| + ++-+ + +-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership. +SELECT id, sum(quantity) FILTER ( +WHERE car_model IN ('Honda Civic', 'Honda CRV') +) AS `sum(quantity)` FROM dealer +GROUP BY id ORDER BY id; + + +---+-+ + | id|sum(quantity)| + +---+-+ + |100| 17| + |200| 23| + |300|5| + +---+-+ + -- Aggregations using multiple sets of grouping columns in a single statement. -- Following performs aggregations based on four sets of grouping columns. -- 1. 
city, car_model diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index e87c4a5..7ad1dd1 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group. +When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function. HAVING - To unsubscribe, e-mail: com
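The `FILTER ( WHERE ... )` clause documented above restricts which input rows reach an aggregate function. On engines without FILTER support, the same result can be expressed with conditional aggregation via `CASE` (a `CASE` with no `ELSE` yields NULL, which `sum` ignores, just as FILTER discards the row). A sketch using Python's stdlib `sqlite3`; the `dealer` rows here are invented for illustration, not the doc's dataset:

```python
import sqlite3

# sum(quantity) FILTER (WHERE car_model IN (...)) is equivalent to
# sum(CASE WHEN car_model IN (...) THEN quantity END).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dealer (id INT, car_model TEXT, quantity INT);
    INSERT INTO dealer VALUES
        (100, 'Honda Civic', 10), (100, 'Honda CRV', 7), (100, 'Honda Accord', 15),
        (200, 'Honda Civic', 20), (200, 'Honda Accord', 3);
""")

rows = conn.execute("""
    SELECT id,
           sum(CASE WHEN car_model IN ('Honda Civic', 'Honda CRV')
                    THEN quantity END) AS filtered_sum
    FROM dealer GROUP BY id ORDER BY id
""").fetchall()
print(rows)  # [(100, 17), (200, 20)]

conn.close()
```

Recent SQLite versions (3.30+) also accept the `FILTER` syntax directly on aggregates; the `CASE` form is shown because it works everywhere.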
[spark] branch branch-3.0 updated: [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 308a8fd [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference 308a8fd is described below commit 308a8fd3c67704b7fce0067a199707f46c6e6f1e Author: Huaxin Gao AuthorDate: Fri Apr 3 14:36:03 2020 +0900 [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference ### What changes were proposed in this pull request? Create Function docs structure for SQL Reference... ### Why are the changes needed? so the Function docs can be added later, also want to get a consensus about what to document for Functions in SQL Reference. ### Does this PR introduce any user-facing change? Yes https://user-images.githubusercontent.com/13592258/78220451-68b6e100-7476-11ea-9a21-733b41652785.png;> https://user-images.githubusercontent.com/13592258/78220460-6ce2fe80-7476-11ea-887c-defefd55c19d.png;> https://user-images.githubusercontent.com/13592258/78220463-6f455880-7476-11ea-81fc-fd4137db7c3f.png;> ### How was this patch tested? Manually build and check Closes #28099 from huaxingao/function. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 4e45c07f5dbc4b178c41449320b7405b20aa05e9) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml| 21 + docs/sql-ref-functions-builtin-aggregate.md | 10 +- ...regate.md => sql-ref-functions-builtin-array.md} | 6 +++--- ...te.md => sql-ref-functions-builtin-date-time.md} | 6 +++--- docs/sql-ref-functions-builtin.md | 17 + ...n-aggregate.md => sql-ref-functions-udf-hive.md} | 10 +- docs/sql-ref-functions-udf.md | 17 + docs/sql-ref-functions.md | 13 + 8 files changed, 60 insertions(+), 40 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 6534c50..500895a 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -225,5 +225,26 @@ url: sql-ref-syntax-aux-resource-mgmt-list-file.html - text: LIST JAR url: sql-ref-syntax-aux-resource-mgmt-list-jar.html +- text: Functions + url: sql-ref-functions.html + subitems: + - text: Build-in Functions +url: sql-ref-functions-builtin.html +subitems: +- text: Build-in Aggregate Functions + url: sql-ref-functions-builtin-aggregate.html +- text: Build-in Array Functions + url: sql-ref-functions-builtin-array.html +- text: Build-in Date Time Functions + url: sql-ref-functions-builtin-date-time.html + - text: UDFs (User-Defined Functions) +url: sql-ref-functions-udf.html +subitems: +- text: Scalar UDFs (User-Defined Functions) + url: sql-ref-functions-udf-scalar.html +- text: UDAFs (User-Defined Aggregate Functions) + url: sql-ref-functions-udf-aggregate.html +- text: Integration with Hive UDFs/UDAFs/UDTFs + url: sql-ref-functions-udf-hive.html - text: Datetime Pattern url: sql-ref-datetime-pattern.html diff --git a/docs/sql-ref-functions-builtin-aggregate.md b/docs/sql-ref-functions-builtin-aggregate.md index 3fcd782..d595436 100644 --- a/docs/sql-ref-functions-builtin-aggregate.md +++ b/docs/sql-ref-functions-builtin-aggregate.md @@ -1,7 +1,7 @@ --- layout: global -title: Builtin Aggregate Functions 
-displayTitle: Builtin Aggregate Functions +title: Built-in Aggregate Functions +displayTitle: Built-in Aggregate Functions license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -19,4 +19,4 @@ license: | limitations under the License. --- -**This page is under construction** +Aggregate functions \ No newline at end of file diff --git a/docs/sql-ref-functions-builtin-aggregate.md b/docs/sql-ref-functions-builtin-array.md similarity index 87% copy from docs/sql-ref-functions-builtin-aggregate.md copy to docs/sql-ref-functions-b
[spark] branch master updated (820bb99 -> 4e45c07)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 820bb99 [SPARK-31328][SQL] Fix rebasing of overlapped local timestamps during daylight saving time add 4e45c07 [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml| 21 + docs/sql-ref-functions-builtin-aggregate.md | 10 +- ...ueries.md => sql-ref-functions-builtin-array.md} | 6 +++--- ...ar.md => sql-ref-functions-builtin-date-time.md} | 6 +++--- docs/sql-ref-functions-builtin.md | 17 + ...aux-analyze.md => sql-ref-functions-udf-hive.md} | 6 +++--- docs/sql-ref-functions-udf.md | 17 + docs/sql-ref-functions.md | 13 + 8 files changed, 58 insertions(+), 38 deletions(-) copy docs/{sql-ref-syntax-qry-select-subqueries.md => sql-ref-functions-builtin-array.md} (90%) copy docs/{sql-ref-functions-builtin-scalar.md => sql-ref-functions-builtin-date-time.md} (88%) copy docs/{sql-ref-syntax-aux-analyze.md => sql-ref-functions-udf-hive.md} (85%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 01b26c4 [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference 01b26c4 is described below commit 01b26c49009d8136f1f962e87ce7e35db43533ab Author: Huaxin Gao AuthorDate: Wed Apr 1 08:42:15 2020 +0900 [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference ### What changes were proposed in this pull request? Add a page to list all commands in SQL Reference... ### Why are the changes needed? so it's easier for user to find a specific command. ### Does this PR introduce any user-facing change? before: ![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png) after: ![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png) ![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png) Also move ```use database``` from query category to ddl category. ### How was this patch tested? Manually build and check Closes #28074 from huaxingao/list-all. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 1a7f9649b67d2108cb14e9e466855dfe52db6d66) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 4 +-- docs/sql-ref-syntax-ddl.md | 1 + docs/sql-ref-syntax.md | 62 +- 3 files changed, 64 insertions(+), 3 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 3bf4952..6534c50 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -123,6 +123,8 @@ url: sql-ref-syntax-ddl-truncate-table.html - text: REPAIR TABLE url: sql-ref-syntax-ddl-repair-table.html +- text: USE DATABASE + url: sql-ref-syntax-qry-select-usedb.html - text: Data Manipulation Statements url: sql-ref-syntax-dml.html subitems: @@ -152,8 +154,6 @@ url: sql-ref-syntax-qry-select-distribute-by.html - text: LIMIT Clause url: sql-ref-syntax-qry-select-limit.html -- text: USE database - url: sql-ref-syntax-qry-select-usedb.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements diff --git a/docs/sql-ref-syntax-ddl.md b/docs/sql-ref-syntax-ddl.md index 954020a..ab4e95a 100644 --- a/docs/sql-ref-syntax-ddl.md +++ b/docs/sql-ref-syntax-ddl.md @@ -36,3 +36,4 @@ Data Definition Statements are used to create or modify the structure of databas - [DROP VIEW](sql-ref-syntax-ddl-drop-view.html) - [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html) - [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html) +- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html) diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md index 2510278..3db97ac 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -19,4 +19,64 @@ license: | limitations under the License. --- -Spark SQL is Apache Spark's module for working with structured data. The SQL Syntax section describes the SQL syntax in detail along with usage examples when applicable. +Spark SQL is Apache Spark's module for working with structured data. 
The SQL Syntax section describes the SQL syntax in detail along with usage examples when applicable. This document provides a list of Data Definition and Data Manipulation Statements, as well as Data Retrieval and Auxiliary Statements. + +### DDL Statements +- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html) +- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html) +- [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html) +- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html) +- [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html) +- [CREATE TABLE](sql-ref-syntax-ddl-create-table.html) +- [CREATE VIEW](sql-ref-syntax-ddl-create-view.html) +- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html) +- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html) +- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html) +- [DROP VIEW](sql-ref-syntax-ddl-drop-view.html) +- [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html) +- [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html) +- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html) + +### DML Statements +- [INSERT INTO](sql-ref-syntax-dml-insert-into.html) +- [INSERT OVERWRITE](sql-ref-syntax-dml-insert-overwrite-table.html) +- [INSERT OVERWRITE DIRECTORY](sql-ref-syntax-dml-insert-overwrite-directory.html) +- [INSERT OVERWRITE DIRECTORY
[spark] branch master updated: [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1a7f964 [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference 1a7f964 is described below commit 1a7f9649b67d2108cb14e9e466855dfe52db6d66 Author: Huaxin Gao AuthorDate: Wed Apr 1 08:42:15 2020 +0900 [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference ### What changes were proposed in this pull request? Add a page to list all commands in SQL Reference... ### Why are the changes needed? so it's easier for user to find a specific command. ### Does this PR introduce any user-facing change? before: ![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png) after: ![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png) ![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png) Also move ```use database``` from query category to ddl category. ### How was this patch tested? Manually build and check Closes #28074 from huaxingao/list-all. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 4 +-- docs/sql-ref-syntax-ddl.md | 1 + docs/sql-ref-syntax.md | 62 +- 3 files changed, 64 insertions(+), 3 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 3bf4952..6534c50 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -123,6 +123,8 @@ url: sql-ref-syntax-ddl-truncate-table.html - text: REPAIR TABLE url: sql-ref-syntax-ddl-repair-table.html +- text: USE DATABASE + url: sql-ref-syntax-qry-select-usedb.html - text: Data Manipulation Statements url: sql-ref-syntax-dml.html subitems: @@ -152,8 +154,6 @@ url: sql-ref-syntax-qry-select-distribute-by.html - text: LIMIT Clause url: sql-ref-syntax-qry-select-limit.html -- text: USE database - url: sql-ref-syntax-qry-select-usedb.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements diff --git a/docs/sql-ref-syntax-ddl.md b/docs/sql-ref-syntax-ddl.md index 954020a..ab4e95a 100644 --- a/docs/sql-ref-syntax-ddl.md +++ b/docs/sql-ref-syntax-ddl.md @@ -36,3 +36,4 @@ Data Definition Statements are used to create or modify the structure of databas - [DROP VIEW](sql-ref-syntax-ddl-drop-view.html) - [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html) - [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html) +- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html) diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md index 2510278..3db97ac 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -19,4 +19,64 @@ license: | limitations under the License. --- -Spark SQL is Apache Spark's module for working with structured data. The SQL Syntax section describes the SQL syntax in detail along with usage examples when applicable. +Spark SQL is Apache Spark's module for working with structured data. The SQL Syntax section describes the SQL syntax in detail along with usage examples when applicable. 
This document provides a list of Data Definition and Data Manipulation Statements, as well as Data Retrieval and Auxiliary Statements. + +### DDL Statements +- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html) +- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html) +- [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html) +- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html) +- [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html) +- [CREATE TABLE](sql-ref-syntax-ddl-create-table.html) +- [CREATE VIEW](sql-ref-syntax-ddl-create-view.html) +- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html) +- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html) +- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html) +- [DROP VIEW](sql-ref-syntax-ddl-drop-view.html) +- [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html) +- [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html) +- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html) + +### DML Statements +- [INSERT INTO](sql-ref-syntax-dml-insert-into.html) +- [INSERT OVERWRITE](sql-ref-syntax-dml-insert-overwrite-table.html) +- [INSERT OVERWRITE DIRECTORY](sql-ref-syntax-dml-insert-overwrite-directory.html) +- [INSERT OVERWRITE DIRECTORY with Hive format](sql-ref-syntax-dml-insert-overwrite-directory-hive.html) +- [LOAD](sql-ref-syntax-dml-load.html
[spark] branch branch-3.0 updated: [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 71dcf66 [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability 71dcf66 is described below commit 71dcf6691a48dd622b83e128aa9be30f757b45ec Author: Kengo Seki AuthorDate: Sun Mar 29 08:48:08 2020 +0900 [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability ### What changes were proposed in this pull request? This PR replaces the method calls of `toSet.toSeq` with `distinct`. ### Why are the changes needed? `toSet.toSeq` is intended to make its elements unique but a bit verbose. Using `distinct` instead is easier to understand and improves readability. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Tested with the existing unit tests and found no problem. Closes #28062 from sekikn/SPARK-31292. 
Authored-by: Kengo Seki Signed-off-by: Takeshi Yamamuro (cherry picked from commit 0b237bd615da4b2c2b781e72af4ad3a4f2951444) Signed-off-by: Takeshi Yamamuro --- core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 2 +- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala index 7dd7fc1..994b363 100644 --- a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala +++ b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala @@ -149,7 +149,7 @@ private[spark] object ResourceUtils extends Logging { def listResourceIds(sparkConf: SparkConf, componentName: String): Seq[ResourceID] = { sparkConf.getAllWithPrefix(s"$componentName.$RESOURCE_PREFIX.").map { case (key, _) => key.substring(0, key.indexOf('.')) -}.toSet.toSeq.map(name => new ResourceID(componentName, name)) +}.distinct.map(name => new ResourceID(componentName, name)) } def parseAllResourceRequests( diff --git a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala index 857c89d..15f2161 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala @@ -69,7 +69,7 @@ private[spark] class ResultTask[T, U]( with Serializable { @transient private[this] val preferredLocs: Seq[TaskLocation] = { -if (locs == null) Nil else locs.toSet.toSeq +if (locs == null) Nil else locs.distinct } override def runTask(context: TaskContext): 
U = { diff --git a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala index 4c0c30a..a0ba920 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala @@ -71,7 +71,7 @@ private[spark] class ShuffleMapTask( } @transient private val preferredLocs: Seq[TaskLocation] = { -if (locs == null) Nil else locs.toSet.toSeq +if (locs == null) Nil else locs.distinct } override def runTask(context: TaskContext): MapStatus = { diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala index 6a1d460..ed30473 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala @@ -408,7 +408,7 @@ private[spark] class TaskSchedulerImpl( newExecAvail = true } } -val hosts = offers.map(_.host).toSet.toSeq +val hosts = offers.map(_.host).distinct for ((host, Some(rack)) <- hosts.zip(getRacksForHosts(hosts))) { hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += host } diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala index e7ecf84..a083cdb 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala @@ -758,7 +758,7 @@ class TaskSche
[spark] branch master updated: [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0b237bd [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability 0b237bd is described below commit 0b237bd615da4b2c2b781e72af4ad3a4f2951444 Author: Kengo Seki AuthorDate: Sun Mar 29 08:48:08 2020 +0900 [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability ### What changes were proposed in this pull request? This PR replaces the method calls of `toSet.toSeq` with `distinct`. ### Why are the changes needed? `toSet.toSeq` is intended to make its elements unique but a bit verbose. Using `distinct` instead is easier to understand and improves readability. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Tested with the existing unit tests and found no problem. Closes #28062 from sekikn/SPARK-31292. 
Authored-by: Kengo Seki Signed-off-by: Takeshi Yamamuro --- core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 2 +- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala index 36ef906..162f090 100644 --- a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala +++ b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala @@ -150,7 +150,7 @@ private[spark] object ResourceUtils extends Logging { def listResourceIds(sparkConf: SparkConf, componentName: String): Seq[ResourceID] = { sparkConf.getAllWithPrefix(s"$componentName.$RESOURCE_PREFIX.").map { case (key, _) => key.substring(0, key.indexOf('.')) -}.toSet.toSeq.map(name => new ResourceID(componentName, name)) +}.distinct.map(name => new ResourceID(componentName, name)) } def parseAllResourceRequests( diff --git a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala index 857c89d..15f2161 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala @@ -69,7 +69,7 @@ private[spark] class ResultTask[T, U]( with Serializable { @transient private[this] val preferredLocs: Seq[TaskLocation] = { -if (locs == null) Nil else locs.toSet.toSeq +if (locs == null) Nil else locs.distinct } override def runTask(context: TaskContext): U = { diff --git a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala 
b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala index 4c0c30a..a0ba920 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala @@ -71,7 +71,7 @@ private[spark] class ShuffleMapTask( } @transient private val preferredLocs: Seq[TaskLocation] = { -if (locs == null) Nil else locs.toSet.toSeq +if (locs == null) Nil else locs.distinct } override def runTask(context: TaskContext): MapStatus = { diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala index 7e2fbb4..f0f84fe 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala @@ -487,7 +487,7 @@ private[spark] class TaskSchedulerImpl( newExecAvail = true } } -val hosts = offers.map(_.host).toSet.toSeq +val hosts = offers.map(_.host).distinct for ((host, Some(rack)) <- hosts.zip(getRacksForHosts(hosts))) { hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += host } diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala index 9ee84a8..b9a11e7 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala @@ -761,7 +761,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B // that are explicitly blacklisted, plu
[spark] branch master updated (d025ddba -> 0b237bd)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d025ddba [SPARK-31238][SPARK-31284][TEST][FOLLOWUP] Fix readResourceOrcFile to create a local file from resource add 0b237bd [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala | 2 +- core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 2 +- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
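The readability change in this commit can be illustrated with a minimal, self-contained sketch using plain Scala collections (the names `hosts`, `viaSet`, and `viaDistinct` are illustrative, not Spark's): `distinct` deduplicates in one call and, unlike the `toSet.toSeq` round-trip, also guarantees first-occurrence order.

```scala
object DistinctSketch extends App {
  // A stand-in for a sequence of host names or task locations with repeats.
  val hosts = Seq("hostA", "hostB", "hostA", "hostC", "hostB")

  // Before: round-trip through a Set; verbose, and element order is not guaranteed.
  val viaSet: Seq[String] = hosts.toSet.toSeq

  // After: distinct keeps the first occurrence of each element, in order.
  val viaDistinct: Seq[String] = hosts.distinct

  assert(viaDistinct == Seq("hostA", "hostB", "hostC"))
  // Both forms yield the same set of elements; only the expression differs.
  assert(viaSet.sorted == viaDistinct.sorted)
  println(viaDistinct.mkString(","))
}
```

Beyond brevity, `distinct` avoids allocating an intermediate `Set`, which is why it reads as the more direct expression of intent.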
[spark] branch branch-3.0 updated: [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 6f30ff4 [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments 6f30ff4 is described below commit 6f30ff44cf2d3d347a516a0e0370d07e8de9352c Author: beliefer AuthorDate: Fri Mar 27 08:09:17 2020 +0900 [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments ### What changes were proposed in this pull request? This PR related to https://github.com/apache/spark/pull/27481. If test case A uses `--IMPORT` to import test case B contains bracketed comments, the output can't display bracketed comments in golden files well. The content of `nested-comments.sql` show below: ``` -- This test case just used to test imported bracketed comments. -- the first case of bracketed comment --QUERY-DELIMITER-START /* This is the first example of bracketed comment. SELECT 'ommented out content' AS first; */ SELECT 'selected content' AS first; --QUERY-DELIMITER-END ``` The test case `comments.sql` imports `nested-comments.sql` below: `--IMPORT nested-comments.sql` Before this PR, the output will be: ``` -- !query /* This is the first example of bracketed comment. SELECT 'ommented out content' AS first -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', ' ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == /* This is the first example of bracketed comment. 
^^^ SELECT 'ommented out content' AS first -- !query */ SELECT 'selected content' AS first -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == */ ^^^ SELECT 'selected content' AS first ``` After this PR, the output will be: ``` -- !query /* This is the first example of bracketed comment. SELECT 'ommented out content' AS first; */ SELECT 'selected content' AS first -- !query schema struct -- !query output selected content ``` ### Why are the changes needed? Golden files can't display the bracketed comments in imported test cases. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? New UT. Closes #28018 from beliefer/fix-bug-tests-imported-bracketed-comments. 
Authored-by: beliefer Signed-off-by: Takeshi Yamamuro (cherry picked from commit 9e0fee933e62eb309d4aa32bb1e5126125d0bf9f) Signed-off-by: Takeshi Yamamuro --- .../src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala index 6c66166..848966a 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala @@ -256,20 +256,23 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession { def splitWithSemicolon(seq: Seq[String]) = { seq.mkString("\n").split("(?<=[^]);") } -val input = fileToString(new File(testCase.inputFile)) -val (comments, code) = input.split("\n").partition { line => +def splitCommentsAndCodes(input: String) = input.split("\n").partition { line => val newLine = line.trim newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER") } +val input = fileToString(new File(testCase.inputFile)) + +val (comments, code) = splitCommentsAndCodes(input) + // If `--IMPORT` found, load code from another test case file, then insert
[spark] branch master updated: [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9e0fee9 [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments 9e0fee9 is described below commit 9e0fee933e62eb309d4aa32bb1e5126125d0bf9f Author: beliefer AuthorDate: Fri Mar 27 08:09:17 2020 +0900 [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments ### What changes were proposed in this pull request? This PR related to https://github.com/apache/spark/pull/27481. If test case A uses `--IMPORT` to import test case B contains bracketed comments, the output can't display bracketed comments in golden files well. The content of `nested-comments.sql` show below: ``` -- This test case just used to test imported bracketed comments. -- the first case of bracketed comment --QUERY-DELIMITER-START /* This is the first example of bracketed comment. SELECT 'ommented out content' AS first; */ SELECT 'selected content' AS first; --QUERY-DELIMITER-END ``` The test case `comments.sql` imports `nested-comments.sql` below: `--IMPORT nested-comments.sql` Before this PR, the output will be: ``` -- !query /* This is the first example of bracketed comment. SELECT 'ommented out content' AS first -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', ' ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == /* This is the first example of bracketed comment. 
^^^ SELECT 'ommented out content' AS first -- !query */ SELECT 'selected content' AS first -- !query schema struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == */ ^^^ SELECT 'selected content' AS first ``` After this PR, the output will be: ``` -- !query /* This is the first example of bracketed comment. SELECT 'ommented out content' AS first; */ SELECT 'selected content' AS first -- !query schema struct -- !query output selected content ``` ### Why are the changes needed? Golden files can't display the bracketed comments in imported test cases. ### Does this PR introduce any user-facing change? 'No'. ### How was this patch tested? New UT. Closes #28018 from beliefer/fix-bug-tests-imported-bracketed-comments. 
Authored-by: beliefer Signed-off-by: Takeshi Yamamuro --- .../src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala index 6c66166..848966a 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala @@ -256,20 +256,23 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession { def splitWithSemicolon(seq: Seq[String]) = { seq.mkString("\n").split("(?<=[^]);") } -val input = fileToString(new File(testCase.inputFile)) -val (comments, code) = input.split("\n").partition { line => +def splitCommentsAndCodes(input: String) = input.split("\n").partition { line => val newLine = line.trim newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER") } +val input = fileToString(new File(testCase.inputFile)) + +val (comments, code) = splitCommentsAndCodes(input) + // If `--IMPORT` found, load code from another test case file, then insert them // into the head in this test. val importedTestCaseName = comments.filter(_.startsWith("--IMPOR
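The refactoring in the diff above factors the comment/code partition into a reusable helper so it can be applied both to the main test file and to any `--IMPORT`ed file. A self-contained sketch of that helper follows; the function body is taken from the diff, while the sample input and driver object are illustrative:

```scala
object SplitSketch extends App {
  // Partition an SQL test file's lines into comment lines and code lines.
  // --QUERY-DELIMITER markers are deliberately kept with the code so that
  // bracketed /* ... */ comments inside delimited regions survive intact.
  def splitCommentsAndCodes(input: String): (Array[String], Array[String]) =
    input.split("\n").partition { line =>
      val trimmed = line.trim
      trimmed.startsWith("--") && !trimmed.startsWith("--QUERY-DELIMITER")
    }

  val sample =
    """-- a plain comment
      |--QUERY-DELIMITER-START
      |SELECT 1;
      |--QUERY-DELIMITER-END""".stripMargin

  val (comments, code) = splitCommentsAndCodes(sample)
  assert(comments.toSeq == Seq("-- a plain comment"))
  assert(code.toSeq ==
    Seq("--QUERY-DELIMITER-START", "SELECT 1;", "--QUERY-DELIMITER-END"))
  println(s"${comments.length},${code.length}")
}
```

Keeping the delimiter markers on the code side is the key design point: it lets the golden-file generator treat everything between `--QUERY-DELIMITER-START` and `--QUERY-DELIMITER-END` as one query, including imported bracketed comments.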
[spark] branch branch-3.0 updated: [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction
[spark] branch branch-3.0 updated: [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 1a5cd16 [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction 1a5cd16 is described below commit 1a5cd167e0901948d68d6c7880d39966e74d10b3 Author: Wenchen Fan AuthorDate: Fri Mar 20 00:52:09 2020 +0900 [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/26933 Fraction string like "1.23" is definitely not a valid integral format and we should fail to do the cast under the ANSI mode. ### Why are the changes needed? correct the ANSI cast behavior from string to integral ### Does this PR introduce any user-facing change? Yes under ANSI mode, but ANSI mode is off by default. ### How was this patch tested? new test Closes #27957 from cloud-fan/ansi. 
Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro (cherry picked from commit ac262cb27255f989f6a6dd864bd5114a928b96da) Signed-off-by: Takeshi Yamamuro --- .../org/apache/spark/unsafe/types/UTF8String.java | 24 +- .../spark/sql/catalyst/expressions/CastSuite.scala | 2 ++ 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java index c538466..186597f 100644 --- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java +++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java @@ -1105,6 +1105,10 @@ public final class UTF8String implements Comparable, Externalizable, * @return true if the parsing was successful else false */ public boolean toLong(LongWrapper toLongResult) { +return toLong(toLongResult, true); + } + + private boolean toLong(LongWrapper toLongResult, boolean allowDecimal) { int offset = 0; while (offset < this.numBytes && getByte(offset) <= ' ') offset++; if (offset == this.numBytes) return false; @@ -1129,7 +1133,7 @@ public final class UTF8String implements Comparable, Externalizable, while (offset <= end) { b = getByte(offset); offset++; - if (b == separator) { + if (b == separator && allowDecimal) { // We allow decimals and will return a truncated integral in that case. // Therefore we won't throw an exception here (checking the fractional // part happens below.) 
@@ -1198,6 +1202,10 @@ public final class UTF8String implements Comparable, Externalizable, * @return true if the parsing was successful else false */ public boolean toInt(IntWrapper intWrapper) { +return toInt(intWrapper, true); + } + + private boolean toInt(IntWrapper intWrapper, boolean allowDecimal) { int offset = 0; while (offset < this.numBytes && getByte(offset) <= ' ') offset++; if (offset == this.numBytes) return false; @@ -1222,7 +1230,7 @@ public final class UTF8String implements Comparable, Externalizable, while (offset <= end) { b = getByte(offset); offset++; - if (b == separator) { + if (b == separator && allowDecimal) { // We allow decimals and will return a truncated integral in that case. // Therefore we won't throw an exception here (checking the fractional // part happens below.) @@ -1276,9 +1284,7 @@ public final class UTF8String implements Comparable, Externalizable, if (toInt(intWrapper)) { int intValue = intWrapper.value; short result = (short) intValue; - if (result == intValue) { -return true; - } + return result == intValue; } return false; } @@ -1287,9 +1293,7 @@ public final class UTF8String implements Comparable, Externalizable, if (toInt(intWrapper)) { int intValue = intWrapper.value; byte result = (byte) intValue; - if (result == intValue) { -return true; - } + return result == intValue; } return false; } @@ -1302,7 +1306,7 @@ public final class UTF8String implements Comparable, Externalizable, */ public long toLongExact() { LongWrapper result = new LongWrapper(); -if (toLong(result)) { +if (toLong(result, false)) { return result.value; } throw new NumberFormatException("invalid input syntax for type numeric: " + this); @@ -1316,7 +1320,7 @@ public final class UTF8String implements Comparable, Externalizable, *
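The strict-versus-lenient distinction the patch introduces can be sketched outside Spark as a tiny standalone parser. This is an illustrative model only — the class and method below are hypothetical and do not reproduce Spark's actual `UTF8String` byte-level code — but it shows the same contract: with `allowDecimal` true, a fraction like `"1.23"` is validated and truncated to `1`; with it false (the new ANSI-mode path), any fractional part makes the parse fail.

```java
import java.util.OptionalLong;

public class IntegralParse {
    // Lenient mode: accept an optional ".digits" suffix and truncate it away.
    // Strict mode: treat any '.' as a syntax error, as ANSI casts now do.
    static OptionalLong parseLong(String s, boolean allowDecimal) {
        String trimmed = s.trim();
        int dot = trimmed.indexOf('.');
        if (dot >= 0) {
            if (!allowDecimal) return OptionalLong.empty();
            // Fractional digits are only validated, then discarded.
            for (int i = dot + 1; i < trimmed.length(); i++) {
                if (!Character.isDigit(trimmed.charAt(i))) return OptionalLong.empty();
            }
            trimmed = trimmed.substring(0, dot);
        }
        try {
            return OptionalLong.of(Long.parseLong(trimmed));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLong("1.23", true));   // OptionalLong[1]
        System.out.println(parseLong("1.23", false));  // OptionalLong.empty
        System.out.println(parseLong("42", false));    // OptionalLong[42]
    }
}
```

In the actual patch the public `toLong`/`toInt` keep the lenient behavior, while the new exact variants (`toLongExact`) call the private overload with `allowDecimal = false`.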
[spark] branch master updated (a177628 -> ac262cb)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a177628 [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId add ac262cb [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction No new revisions were added by this update. Summary of changes: .../org/apache/spark/unsafe/types/UTF8String.java | 24 +- .../spark/sql/catalyst/expressions/CastSuite.scala | 2 ++ 2 files changed, 16 insertions(+), 10 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a8c08b1 [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId a8c08b1 is described below commit a8c08b1d81aefd1e3d7f4616b76e2285f9981cc7 Author: Kris Mok AuthorDate: Thu Mar 19 20:53:01 2020 +0900 [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId ### What changes were proposed in this pull request? Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. One way to get the generated code is through `df.queryExecution.debug.codegen`, or SQL `EXPLAIN CODEGEN` statement. The generated code is currently printed without specific ordering, which can make debugging a bit annoying. This PR makes a minor improvement to sort the codegen dump by the `codegenStageId`, ascending. After this change, the following query: ```scala spark.range(10).agg(sum('id)).queryExecution.debug.codegen ``` will always dump the generated code in a natural, stable order. A version of this example with shorter output is: ``` spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println) *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L]) +- *(1) Range (0, 10, step=1, splits=16) *(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L]) +- Exchange SinglePartition, true, [id=#30] +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L]) +- *(1) Range (0, 10, step=1, splits=16) ``` The number of codegen stages within a single SQL query tends to be very small, most likely < 50, so the overhead of adding the sorting shouldn't be significant. ### Why are the changes needed? Minor improvement to aid WSCG debugging. ### Does this PR introduce any user-facing change? 
No user-facing change for end-users; minor change for developers who debug WSCG generated code. ### How was this patch tested? Manually tested the output; all other tests still pass. Closes #27955 from rednaxelafx/codegen. Authored-by: Kris Mok Signed-off-by: Takeshi Yamamuro (cherry picked from commit a1776288f48d450fea28f50fef78fd6aa10a8160) Signed-off-by: Takeshi Yamamuro --- .../src/main/scala/org/apache/spark/sql/execution/debug/package.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala index 6a57ef2..6c40104 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala @@ -113,7 +113,7 @@ package object debug { s case s => s } -codegenSubtrees.toSeq.map { subtree => +codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree => val (_, source) = subtree.doCodeGen() val codeStats = try { CodeGenerator.compile(source)._2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a177628 [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId a177628 is described below commit a1776288f48d450fea28f50fef78fd6aa10a8160 Author: Kris Mok AuthorDate: Thu Mar 19 20:53:01 2020 +0900 [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId ### What changes were proposed in this pull request? Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. One way to get the generated code is through `df.queryExecution.debug.codegen`, or SQL `EXPLAIN CODEGEN` statement. The generated code is currently printed without specific ordering, which can make debugging a bit annoying. This PR makes a minor improvement to sort the codegen dump by the `codegenStageId`, ascending. After this change, the following query: ```scala spark.range(10).agg(sum('id)).queryExecution.debug.codegen ``` will always dump the generated code in a natural, stable order. A version of this example with shorter output is: ``` spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println) *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L]) +- *(1) Range (0, 10, step=1, splits=16) *(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L]) +- Exchange SinglePartition, true, [id=#30] +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L]) +- *(1) Range (0, 10, step=1, splits=16) ``` The number of codegen stages within a single SQL query tends to be very small, most likely < 50, so the overhead of adding the sorting shouldn't be significant. ### Why are the changes needed? Minor improvement to aid WSCG debugging. ### Does this PR introduce any user-facing change? 
No user-facing change for end-users; minor change for developers who debug WSCG generated code. ### How was this patch tested? Manually tested the output; all other tests still pass. Closes #27955 from rednaxelafx/codegen. Authored-by: Kris Mok Signed-off-by: Takeshi Yamamuro --- .../src/main/scala/org/apache/spark/sql/execution/debug/package.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala index 6a57ef2..6c40104 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala @@ -113,7 +113,7 @@ package object debug { s case s => s } -codegenSubtrees.toSeq.map { subtree => +codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree => val (_, source) = subtree.doCodeGen() val codeStats = try { CodeGenerator.compile(source)._2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
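The one-line fix — `codegenSubtrees.toSeq.sortBy(_.codegenStageId)` — can be modeled generically: collect the subtrees, sort ascending by stage id, then render each one, so the dump order is stable across runs. A minimal Java sketch of that pattern (names are hypothetical, not Spark's API):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortByStageId {
    // Stand-in for a whole-stage-codegen subtree with its stage id.
    record Subtree(int codegenStageId, String plan) {}

    static List<String> dump(List<Subtree> subtrees) {
        // Sorting by codegenStageId before rendering makes the output
        // deterministic, regardless of the order subtrees were collected in.
        return subtrees.stream()
            .sorted(Comparator.comparingInt(Subtree::codegenStageId))
            .map(Subtree::plan)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Subtree> unordered = List.of(
            new Subtree(2, "HashAggregate(final)"),
            new Subtree(1, "HashAggregate(partial)"));
        System.out.println(dump(unordered));
        // [HashAggregate(partial), HashAggregate(final)]
    }
}
```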
[spark] branch branch-3.0 updated: [SPARK-31171][SQL][FOLLOWUP] update document
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 39e9b55 [SPARK-31171][SQL][FOLLOWUP] update document 39e9b55 is described below commit 39e9b554ea171e71ea152c1d3a59f72e8918dfd2 Author: Wenchen Fan AuthorDate: Thu Mar 19 07:29:31 2020 +0900 [SPARK-31171][SQL][FOLLOWUP] update document ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/27936 to update document. ### Why are the changes needed? correct document ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27950 from cloud-fan/null. Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro (cherry picked from commit 8643e5d9c50294f59b01988d99d447a38776178e) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md| 7 ++- .../spark/sql/catalyst/expressions/collectionOperations.scala | 6 +++--- sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 27e60b4..bc5bde6 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -21,7 +21,7 @@ license: | Since Spark 3.0, Spark SQL introduces two experimental options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (See a table below for details). -When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, and SQL parsing). +When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, SQL functions and SQL parsing). 
Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows in a table. The casting behaviours are defined as store assignment rules in the standard. @@ -140,6 +140,11 @@ SELECT * FROM t; {% endhighlight %} +### SQL Functions + +The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`). + - `size`: This function returns null for null input under ANSI mode. + ### SQL Keywords When `spark.sql.ansi.enabled` is true, Spark SQL will use the ANSI mode parser. diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 6d95909..8b61bc4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -77,9 +77,9 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression @ExpressionDescription( usage = """ _FUNC_(expr) - Returns the size of an array or a map. -The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. -If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. -By default, the spark.sql.legacy.sizeOfNull parameter is set to true. +The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or +spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. +With the default settings, the function returns -1 for null input. 
""", examples = """ Examples: diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index e2d3d55..69383d4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -3957,6 +3957,10 @@ object functions { /** * Returns length of array or map. * + * The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or + * spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. + * With the default settings, the function returns -1 for null input. + * * @group collection_funcs * @since 1.5.0 */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31171][SQL][FOLLOWUP] update document
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8643e5d [SPARK-31171][SQL][FOLLOWUP] update document 8643e5d is described below commit 8643e5d9c50294f59b01988d99d447a38776178e Author: Wenchen Fan AuthorDate: Thu Mar 19 07:29:31 2020 +0900 [SPARK-31171][SQL][FOLLOWUP] update document ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/27936 to update document. ### Why are the changes needed? correct document ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27950 from cloud-fan/null. Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md| 7 ++- .../spark/sql/catalyst/expressions/collectionOperations.scala | 6 +++--- sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 27e60b4..bc5bde6 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -21,7 +21,7 @@ license: | Since Spark 3.0, Spark SQL introduces two experimental options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (See a table below for details). -When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, and SQL parsing). +When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, SQL functions and SQL parsing). Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows in a table. 
The casting behaviours are defined as store assignment rules in the standard. @@ -140,6 +140,11 @@ SELECT * FROM t; {% endhighlight %} +### SQL Functions + +The behavior of some SQL functions can be different under ANSI mode (`spark.sql.ansi.enabled=true`). + - `size`: This function returns null for null input under ANSI mode. + ### SQL Keywords When `spark.sql.ansi.enabled` is true, Spark SQL will use the ANSI mode parser. diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 6d95909..8b61bc4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -77,9 +77,9 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression @ExpressionDescription( usage = """ _FUNC_(expr) - Returns the size of an array or a map. -The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. -If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. -By default, the spark.sql.legacy.sizeOfNull parameter is set to true. +The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or +spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. +With the default settings, the function returns -1 for null input. """, examples = """ Examples: diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index 5603f20..6e189df 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -3980,6 +3980,10 @@ object functions { /** * Returns length of array or map. 
* + * The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or + * spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. + * With the default settings, the function returns -1 for null input. + * * @group collection_funcs * @since 1.5.0 */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
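The documented rule for `size` has three inputs: the value itself, `spark.sql.legacy.sizeOfNull`, and `spark.sql.ansi.enabled`. A small table-driven model (hypothetical code, not Spark's implementation) makes the precedence explicit — null is returned for null input when legacy mode is off or ANSI mode is on; only the default combination (legacy on, ANSI off) yields -1:

```java
import java.util.List;

public class SizeFunction {
    // Models the documented behavior of Spark SQL's size() for null input:
    //   legacySizeOfNull=false OR ansiEnabled=true  -> null
    //   otherwise (the default settings)            -> -1
    static Integer size(List<?> input, boolean legacySizeOfNull, boolean ansiEnabled) {
        if (input == null) {
            return (!legacySizeOfNull || ansiEnabled) ? null : -1;
        }
        return input.size();
    }

    public static void main(String[] args) {
        System.out.println(size(null, true, false));            // -1 (defaults)
        System.out.println(size(null, true, true));             // null (ANSI mode)
        System.out.println(size(List.of(1, 2, 3), true, false)); // 3
    }
}
```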
[spark] branch branch-3.0 updated: [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new aba7a09 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text aba7a09 is described below commit aba7a09da53425481893ce6d21281dc85874c619 Author: Kent Yao AuthorDate: Thu Mar 19 07:27:06 2020 +0900 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text ### What changes were proposed in this pull request? pattern `''` means literal `'` ```sql select date_format(to_timestamp("1904-01-23 15:02:01", 'y-MM-dd HH:mm:ss'), "y-MM-dd HH:mm:ss''S"); 5377-02-14 06:27:19'00519 ``` https://github.com/apache/spark/commit/0946a9514f56565c78b0555383c1ece14aaf2b7b missed this case and this pr add it back. ### Why are the changes needed? bugfix ### Does this PR introduce any user-facing change? no ### How was this patch tested? add ut Closes #27949 from yaooqinn/SPARK-31150-2. 
Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 3d695954e53038f978bebcb3e798fa8728d1) Signed-off-by: Takeshi Yamamuro --- .../catalyst/util/DateTimeFormatterHelper.scala| 4 +++ .../test/resources/sql-tests/inputs/datetime.sql | 5 .../resources/sql-tests/results/datetime.sql.out | 34 +- 3 files changed, 42 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala index 72bae28..4ed618e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala @@ -117,6 +117,10 @@ private object DateTimeFormatterHelper { pattern: String): DateTimeFormatterBuilder = { val builder = createBuilder() pattern.split("'").zipWithIndex.foreach { + // Split string starting with the regex itself which is `'` here will produce an extra empty + // string at res(0). So when the first element here is empty string we do not need append `'` + // literal to the DateTimeFormatterBuilder. 
+ case ("", idx) if idx != 0 => builder.appendLiteral("'") case (pattenPart, idx) if idx % 2 == 0 => var rest = pattenPart while (rest.nonEmpty) { diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql index a06cdfd..2c4ed64 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql @@ -107,3 +107,8 @@ select to_timestamp("S2019-10-06", "'S'-MM-dd"); select date_format(timestamp '2019-10-06', '-MM-dd uuee'); select date_format(timestamp '2019-10-06', '-MM-dd uucc'); select date_format(timestamp '2019-10-06', '-MM-dd '); + +select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''"); -- middle +select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''"); -- tail +select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss"); -- head +select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss"); -- head but as single quote diff --git a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out index 714412f..f440b5f 100755 --- a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 73 +-- Number of queries: 77 -- !query @@ -601,3 +601,35 @@ select date_format(timestamp '2019-10-06', '-MM-dd ') struct -- !query output 2019-10-06 Sunday + + +-- !query +select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''") +-- !query schema +struct +-- !query output +2019-10-06 10:11:12.12 + + +-- !query +select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''") +-- !query schema +struct +-- !query output +2019-10-06 10:11:12 + + +-- !query +select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss") +-- !query schema +struct +-- !query output 
+2019-10-06 10:11:12 + + +-- !query +select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss") +-- !query schema +struct +-- !query output +2019-10-06 10:11:12 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3d69595 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text 3d69595 is described below commit 3d695954e53038f978bebcb3e798fa8728d1 Author: Kent Yao AuthorDate: Thu Mar 19 07:27:06 2020 +0900 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text ### What changes were proposed in this pull request? pattern `''` means literal `'` ```sql select date_format(to_timestamp("1904-01-23 15:02:01", 'y-MM-dd HH:mm:ss'), "y-MM-dd HH:mm:ss''S"); 5377-02-14 06:27:19'00519 ``` https://github.com/apache/spark/commit/0946a9514f56565c78b0555383c1ece14aaf2b7b missed this case and this pr add it back. ### Why are the changes needed? bugfix ### Does this PR introduce any user-facing change? no ### How was this patch tested? add ut Closes #27949 from yaooqinn/SPARK-31150-2. 
Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro --- .../catalyst/util/DateTimeFormatterHelper.scala| 4 +++ .../test/resources/sql-tests/inputs/datetime.sql | 5 .../resources/sql-tests/results/datetime.sql.out | 34 +- 3 files changed, 42 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala index 72bae28..4ed618e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala @@ -117,6 +117,10 @@ private object DateTimeFormatterHelper { pattern: String): DateTimeFormatterBuilder = { val builder = createBuilder() pattern.split("'").zipWithIndex.foreach { + // Split string starting with the regex itself which is `'` here will produce an extra empty + // string at res(0). So when the first element here is empty string we do not need append `'` + // literal to the DateTimeFormatterBuilder. 
+ case ("", idx) if idx != 0 => builder.appendLiteral("'") case (pattenPart, idx) if idx % 2 == 0 => var rest = pattenPart while (rest.nonEmpty) { diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql index a06cdfd..2c4ed64 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql @@ -107,3 +107,8 @@ select to_timestamp("S2019-10-06", "'S'-MM-dd"); select date_format(timestamp '2019-10-06', '-MM-dd uuee'); select date_format(timestamp '2019-10-06', '-MM-dd uucc'); select date_format(timestamp '2019-10-06', '-MM-dd '); + +select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''"); -- middle +select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''"); -- tail +select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss"); -- head +select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss"); -- head but as single quote diff --git a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out index 714412f..f440b5f 100755 --- a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 73 +-- Number of queries: 77 -- !query @@ -601,3 +601,35 @@ select date_format(timestamp '2019-10-06', '-MM-dd ') struct -- !query output 2019-10-06 Sunday + + +-- !query +select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''") +-- !query schema +struct +-- !query output +2019-10-06 10:11:12.12 + + +-- !query +select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''") +-- !query schema +struct +-- !query output +2019-10-06 10:11:12 + + +-- !query +select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss") +-- !query schema +struct +-- !query output 
+2019-10-06 10:11:12 + + +-- !query +select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss") +-- !query schema +struct +-- !query output +2019-10-06 10:11:12 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
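The fix above hinges on a subtlety of splitting a string on the quote character: when the pattern *starts* with `'`, the split result begins with an empty string at index 0, and that leading empty must not be treated as an escaped quote. A small sketch (in Python — an assumption worth noting is that Python's `str.split` matches the leading/middle behavior of Java's `String.split` used by the Scala code, though Python keeps trailing empties where Java drops them):

```python
# Splitting a datetime pattern on the quote character: even-indexed
# parts are pattern text, odd-indexed parts are quoted literals.
print("yyyy-MM-dd'T'HH".split("'"))   # ['yyyy-MM-dd', 'T', 'HH']

# When the pattern *starts* with a quoted literal, split() emits a
# leading empty string at index 0 -- it marks the start of the string,
# not an escaped quote, so no "'" literal should be appended for it.
print("'P'yyyy-MM-dd".split("'"))     # ['', 'P', 'yyyy-MM-dd']

# An empty part at index > 0 comes from two adjacent quotes ('') and
# really is an escaped literal quote.
print("HH:mm''ss".split("'"))         # ['HH:mm', '', 'ss']
```

This is exactly the case the new `case ("", idx) if idx != 0` guard distinguishes.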
[spark] branch branch-3.0 updated: [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f83ef7d [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL f83ef7d is described below commit f83ef7d143aafbbdd1bb322567481f68db72195a Author: gatorsmile AuthorDate: Sun Mar 15 07:35:20 2020 +0900 [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL ### What changes were proposed in this pull request? The current migration guide of SQL is too long for most readers to find the needed info. This PR is to group the items in the migration guide of Spark SQL based on the corresponding components. Note. This PR does not change the contents of the migration guides. Attached figure is the screenshot after the change. ![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png) ### Why are the changes needed? The current migration guide of SQL is too long for most readers to find the needed info. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #27909 from gatorsmile/migrationGuideReorg. Authored-by: gatorsmile Signed-off-by: Takeshi Yamamuro (cherry picked from commit 4d4c00c1b564b57d3016ce8c3bfcffaa6e58f012) Signed-off-by: Takeshi Yamamuro --- docs/sql-migration-guide.md | 287 +++- 1 file changed, 150 insertions(+), 137 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 19c744c..31d5c68 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -23,92 +23,119 @@ license: | {:toc} ## Upgrading from Spark SQL 2.4 to 3.0 - - Since Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per ANSI SQL standard. 
Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception will be thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and earlier, type conversions during table insertion are allowed as long as they are valid `Cast`. When inse [...] - - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and `SparkSession.createExternalTable` have been removed in favor of its replacement, `createTable`. - - - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use `SparkSession.builder.enableHiveSupport()` instead. - - - Since Spark 3.0, configuration `spark.sql.crossJoin.enabled` become internal configuration, and is true by default, so by default spark won't raise exception on sql with implicit cross join. - - - In Spark version 2.4 and earlier, SQL queries such as `FROM ` or `FROM UNION ALL FROM ` are supported by accident. In hive-style `FROM SELECT `, the `SELECT` clause is not negligible. Neither Hive nor Presto support this syntax. Therefore we will treat these queries as invalid since Spark 3.0. +### Dataset/DataFrame APIs - Since Spark 3.0, the Dataset and DataFrame API `unionAll` is not deprecated any more. It is an alias for `union`. - - In Spark version 2.4 and earlier, the parser of JSON data source treats empty strings as null for some data types such as `IntegerType`. For `FloatType`, `DoubleType`, `DateType` and `TimestampType`, it fails on empty strings and throws exceptions. Since Spark 3.0, we disallow empty strings and will throw exceptions for data types except for `StringType` and `BinaryType`. The previous behaviour of allowing empty string can be restored by setting `spark.sql.legacy.json.allowEmptyStrin [...] - - - Since Spark 3.0, the `from_json` functions supports two modes - `PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The default mode became `PERMISSIVE`. 
In previous versions, behavior of `from_json` did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing of malformed JSON records. For example, the JSON string `{"a" 1}` with the schema `a INT` is converted to `null` by previous versions but Spark 3.0 converts it to `Row(null)`. - - - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set. - - - In Spark version 2.4 and earlier, users can create map values with map type key via built-in function such as `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it's not allowed to create map values with map type key with these built-in functions. Users can use `map_entries` function to convert map to array> as a workaround. In addition, users can still read map values with
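The `PERMISSIVE`/`FAILFAST` distinction described in the migration note can be sketched outside Spark with plain JSON parsing. The `parse_json` helper below is a hypothetical toy model, not Spark's implementation: PERMISSIVE maps a malformed record like `{"a" 1}` to a row of nulls (the `Row(null)` behavior), while FAILFAST re-raises the parse error.

```python
import json

def parse_json(record, columns, mode="PERMISSIVE"):
    """Toy model of from_json's two modes (illustrative only):
    PERMISSIVE -> row of nulls for a malformed record,
    FAILFAST  -> propagate the parse error."""
    try:
        data = json.loads(record)
        return tuple(data.get(c) for c in columns)
    except json.JSONDecodeError:
        if mode == "FAILFAST":
            raise
        return tuple(None for _ in columns)  # Row(null), not null

print(parse_json('{"a" 1}', ["a"]))   # (None,) -- malformed, like Row(null)
print(parse_json('{"a": 1}', ["a"]))  # (1,)
```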
[spark] branch master updated (9628aca -> 4d4c00c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9628aca [MINOR][DOCS] Fix [[...]] to `...` and ... in documentation add 4d4c00c [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 287 +++- 1 file changed, 150 insertions(+), 137 deletions(-)
[spark] branch master updated: [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4d4c00c [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL 4d4c00c is described below commit 4d4c00c1b564b57d3016ce8c3bfcffaa6e58f012 Author: gatorsmile AuthorDate: Sun Mar 15 07:35:20 2020 +0900 [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL ### What changes were proposed in this pull request? The current migration guide of SQL is too long for most readers to find the needed info. This PR is to group the items in the migration guide of Spark SQL based on the corresponding components. Note. This PR does not change the contents of the migration guides. Attached figure is the screenshot after the change. ![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png) ### Why are the changes needed? The current migration guide of SQL is too long for most readers to find the needed info. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #27909 from gatorsmile/migrationGuideReorg. Authored-by: gatorsmile Signed-off-by: Takeshi Yamamuro --- docs/sql-migration-guide.md | 287 +++- 1 file changed, 150 insertions(+), 137 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 7cca43e..d6b663d 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -26,92 +26,119 @@ license: | - Since Spark 3.1, grouping_id() returns long values. In Spark version 3.0 and earlier, this function returns int values. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.integerGroupingId` to `true`. 
## Upgrading from Spark SQL 2.4 to 3.0 - - Since Spark 3.0, when inserting a value into a table column with a different data type, the type coercion is performed as per ANSI SQL standard. Certain unreasonable type conversions such as converting `string` to `int` and `double` to `boolean` are disallowed. A runtime exception will be thrown if the value is out-of-range for the data type of the column. In Spark version 2.4 and earlier, type conversions during table insertion are allowed as long as they are valid `Cast`. When inse [...] - - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and `SparkSession.createExternalTable` have been removed in favor of its replacement, `createTable`. - - - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use `SparkSession.builder.enableHiveSupport()` instead. - - - Since Spark 3.0, configuration `spark.sql.crossJoin.enabled` become internal configuration, and is true by default, so by default spark won't raise exception on sql with implicit cross join. - - - In Spark version 2.4 and earlier, SQL queries such as `FROM ` or `FROM UNION ALL FROM ` are supported by accident. In hive-style `FROM SELECT `, the `SELECT` clause is not negligible. Neither Hive nor Presto support this syntax. Therefore we will treat these queries as invalid since Spark 3.0. +### Dataset/DataFrame APIs - Since Spark 3.0, the Dataset and DataFrame API `unionAll` is not deprecated any more. It is an alias for `union`. - - In Spark version 2.4 and earlier, the parser of JSON data source treats empty strings as null for some data types such as `IntegerType`. For `FloatType`, `DoubleType`, `DateType` and `TimestampType`, it fails on empty strings and throws exceptions. Since Spark 3.0, we disallow empty strings and will throw exceptions for data types except for `StringType` and `BinaryType`. The previous behaviour of allowing empty string can be restored by setting `spark.sql.legacy.json.allowEmptyStrin [...] 
- - - Since Spark 3.0, the `from_json` functions supports two modes - `PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing of malformed JSON records. For example, the JSON string `{"a" 1}` with the schema `a INT` is converted to `null` by previous versions but Spark 3.0 converts it to `Row(null)`. - - - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set. - - - In Spark version 2.4 and earlier, users can create map values with map type key via built-in function such as `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it's not allowed to create map values with map type key with these built-in functions. Users can use `map_entries
[spark] branch branch-3.0 updated: [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b8e2cb3 [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2 b8e2cb3 is described below commit b8e2cb32cbc75601d6d7a841362676cf2f273bda Author: Qianyang Yu AuthorDate: Wed Mar 11 08:47:30 2020 +0900 [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2 ### What changes were proposed in this pull request? ### Why are the changes needed? Based on [JIRA 30962](https://issues.apache.org/jira/browse/SPARK-30962), we want to add all the support `Alter Table` syntax for V1 table. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Before: The documentation looks like [Alter Table](https://github.com/apache/spark/pull/25590) After: https://user-images.githubusercontent.com/7550280/75824837-168c7e00-5d59-11ea-9751-d1dab0f5a892.png;> https://user-images.githubusercontent.com/7550280/75824859-21dfa980-5d59-11ea-8b49-3adf6eb55fc6.png;> https://user-images.githubusercontent.com/7550280/75824884-2e640200-5d59-11ea-81ef-d77d0a8efee2.png;> https://user-images.githubusercontent.com/7550280/75824910-39b72d80-5d59-11ea-84d0-bffa2499f086.png;> https://user-images.githubusercontent.com/7550280/75824937-45a2ef80-5d59-11ea-932c-314924856834.png;> https://user-images.githubusercontent.com/7550280/75824965-4cc9fd80-5d59-11ea-815b-8c1ebad310b1.png;> https://user-images.githubusercontent.com/7550280/75824978-518eb180-5d59-11ea-8a55-2fa26376b9c1.png;> https://user-images.githubusercontent.com/7550280/75825001-5bb0b000-5d59-11ea-8dd9-dcfbfa1b4330.png;> Notes: Those syntaxes are not supported by v1 Table. - `ALTER TABLE .. RENAME COLUMN` - `ALTER TABLE ... DROP (COLUMN | COLUMNS)` - `ALTER TABLE ... (ALTER | CHANGE) COLUMN? 
alterColumnAction` only support change comments, not other actions: `datatype, position, (SET | DROP) NOT NULL` - `ALTER TABLE .. CHANGE COLUMN?` - `ALTER TABLE REPLACE COLUMNS` - `ALTER TABLE ... RECOVER PARTITIONS` - Closes #27779 from kevinyu98/spark-30962-alterT. Authored-by: Qianyang Yu Signed-off-by: Takeshi Yamamuro (cherry picked from commit 0f54dc7c03ed975ecb7f776a0151b9325d21e85c) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-ddl-alter-table.md | 213 - 1 file changed, 210 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 373fa8d..2dd808b 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -23,14 +23,13 @@ license: | `ALTER TABLE` statement changes the schema or properties of a table. ### RENAME -`ALTER TABLE RENAME` statement changes the table name of an existing table in the database. +`ALTER TABLE RENAME TO` statement changes the table name of an existing table in the database. Syntax {% highlight sql %} ALTER TABLE table_identifier RENAME TO table_identifier ALTER TABLE table_identifier partition_spec RENAME TO partition_spec - {% endhighlight %} Parameters @@ -83,6 +82,109 @@ ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , col_spec ... ] ) +### ALTER OR CHANGE COLUMN +`ALTER TABLE ALTER COLUMN` or `ALTER TABLE CHANGE COLUMN` statement changes column's comment. + + Syntax +{% highlight sql %} +ALTER TABLE table_identifier { ALTER | CHANGE } [ COLUMN ] col_spec alterColumnAction +{% endhighlight %} + + Parameters + + table_identifier + +Specifies a table name, which may be optionally qualified with a database name. +Syntax: + +[ database_name. ] table_name + + + + + + COLUMN col_spec + Specifies the column to be altered or be changed. + + + + alterColumnAction + + Change the comment string. 
+ Syntax: + +COMMENT STRING + + + + + +### ADD AND DROP PARTITION + + ADD PARTITION +`ALTER TABLE ADD` statement adds partition to the partitioned table. + +# Syntax +{% highlight sql %} +ALTER TABLE table_identifier ADD [IF NOT EXISTS] +( partition_spec [ partition_spec ... ] ) +{% endhighlight %} + +# Parameters + + table_identifier + +Specifies a table name, which may be optionally qualified with a database name. +Syntax: + +[ database_name. ] table_name + + + + + + partition_spec + +Partition to be added. +Syntax: + +PARTITION ( partition_col_name = partition_col_val [ , ... ] ) + + + + + DROP PARTITION +`ALTER TABLE DROP` statement drops th
[spark] branch master updated: [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0f54dc7 [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2 0f54dc7 is described below commit 0f54dc7c03ed975ecb7f776a0151b9325d21e85c Author: Qianyang Yu AuthorDate: Wed Mar 11 08:47:30 2020 +0900 [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2 ### What changes were proposed in this pull request? ### Why are the changes needed? Based on [JIRA 30962](https://issues.apache.org/jira/browse/SPARK-30962), we want to add all the support `Alter Table` syntax for V1 table. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Before: The documentation looks like [Alter Table](https://github.com/apache/spark/pull/25590) After: https://user-images.githubusercontent.com/7550280/75824837-168c7e00-5d59-11ea-9751-d1dab0f5a892.png;> https://user-images.githubusercontent.com/7550280/75824859-21dfa980-5d59-11ea-8b49-3adf6eb55fc6.png;> https://user-images.githubusercontent.com/7550280/75824884-2e640200-5d59-11ea-81ef-d77d0a8efee2.png;> https://user-images.githubusercontent.com/7550280/75824910-39b72d80-5d59-11ea-84d0-bffa2499f086.png;> https://user-images.githubusercontent.com/7550280/75824937-45a2ef80-5d59-11ea-932c-314924856834.png;> https://user-images.githubusercontent.com/7550280/75824965-4cc9fd80-5d59-11ea-815b-8c1ebad310b1.png;> https://user-images.githubusercontent.com/7550280/75824978-518eb180-5d59-11ea-8a55-2fa26376b9c1.png;> https://user-images.githubusercontent.com/7550280/75825001-5bb0b000-5d59-11ea-8dd9-dcfbfa1b4330.png;> Notes: Those syntaxes are not supported by v1 Table. - `ALTER TABLE .. RENAME COLUMN` - `ALTER TABLE ... DROP (COLUMN | COLUMNS)` - `ALTER TABLE ... (ALTER | CHANGE) COLUMN? 
alterColumnAction` only support change comments, not other actions: `datatype, position, (SET | DROP) NOT NULL` - `ALTER TABLE .. CHANGE COLUMN?` - `ALTER TABLE REPLACE COLUMNS` - `ALTER TABLE ... RECOVER PARTITIONS` - Closes #27779 from kevinyu98/spark-30962-alterT. Authored-by: Qianyang Yu Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-ddl-alter-table.md | 213 - 1 file changed, 210 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 373fa8d..2dd808b 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -23,14 +23,13 @@ license: | `ALTER TABLE` statement changes the schema or properties of a table. ### RENAME -`ALTER TABLE RENAME` statement changes the table name of an existing table in the database. +`ALTER TABLE RENAME TO` statement changes the table name of an existing table in the database. Syntax {% highlight sql %} ALTER TABLE table_identifier RENAME TO table_identifier ALTER TABLE table_identifier partition_spec RENAME TO partition_spec - {% endhighlight %} Parameters @@ -83,6 +82,109 @@ ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , col_spec ... ] ) +### ALTER OR CHANGE COLUMN +`ALTER TABLE ALTER COLUMN` or `ALTER TABLE CHANGE COLUMN` statement changes column's comment. + + Syntax +{% highlight sql %} +ALTER TABLE table_identifier { ALTER | CHANGE } [ COLUMN ] col_spec alterColumnAction +{% endhighlight %} + + Parameters + + table_identifier + +Specifies a table name, which may be optionally qualified with a database name. +Syntax: + +[ database_name. ] table_name + + + + + + COLUMN col_spec + Specifies the column to be altered or be changed. + + + + alterColumnAction + + Change the comment string. + Syntax: + +COMMENT STRING + + + + + +### ADD AND DROP PARTITION + + ADD PARTITION +`ALTER TABLE ADD` statement adds partition to the partitioned table. 
+ +# Syntax +{% highlight sql %} +ALTER TABLE table_identifier ADD [IF NOT EXISTS] +( partition_spec [ partition_spec ... ] ) +{% endhighlight %} + +# Parameters + + table_identifier + +Specifies a table name, which may be optionally qualified with a database name. +Syntax: + +[ database_name. ] table_name + + + + + + partition_spec + +Partition to be added. +Syntax: + +PARTITION ( partition_col_name = partition_col_val [ , ... ] ) + + + + + DROP PARTITION +`ALTER TABLE DROP` statement drops the partition of the table. + +# Syntax +{% highlight sql %} +ALTER TABLE table_identifier DROP [ IF EXISTS ] partition_spec
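The `ALTER TABLE ... RENAME TO` form documented above is standard enough to demonstrate outside Spark. The sketch below uses SQLite via Python's stdlib, under the assumption that SQLite's rename syntax matches the Spark SQL form shown (the partition-rename variant is Spark-specific and not covered here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")

# Rename the table, matching the documented ALTER TABLE ... RENAME TO syntax.
conn.execute("ALTER TABLE student RENAME TO alumni")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['alumni'] -- the old name is gone
```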
[spark] branch master updated (2e3adad -> 71c73d5)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2e3adad [SPARK-31061][SQL] Provide ability to alter the provider of a table add 71c73d5 [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 3 ++ .../spark/sql/catalyst/analysis/Analyzer.scala | 10 ++-- .../spark/sql/catalyst/expressions/grouping.scala | 28 +++ .../plans/logical/basicLogicalOperators.scala | 13 -- .../org/apache/spark/sql/internal/SQLConf.scala| 9 .../analysis/ResolveGroupingAnalyticsSuite.scala | 54 -- .../sql-tests/results/group-analytics.sql.out | 8 ++-- .../sql-tests/results/grouping_set.sql.out | 4 +- .../results/postgreSQL/groupingsets.sql.out| 2 +- .../results/udf/udf-group-analytics.sql.out| 8 ++-- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 31 + 11 files changed, 117 insertions(+), 53 deletions(-)
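The SPARK-30279 change above works because `grouping_id()` is a bit vector with one bit per `GROUP BY` attribute: with 32 or more attributes the value no longer fits in a signed 32-bit int, hence the widening to long noted in the migration guide. A sketch of the encoding (assumed from general GROUPING SETS semantics — bit order is illustrative, not taken from Spark's code):

```python
def grouping_id(group_by_cols, grouped_cols):
    """Build a grouping_id bit vector: the bit for each GROUP BY column
    (leftmost column = most significant bit here) is 1 when that column
    is NOT part of the current grouping set."""
    gid = 0
    for col in group_by_cols:
        gid = (gid << 1) | (0 if col in grouped_cols else 1)
    return gid

# 33 grouping attributes overflow a signed 32-bit int -> needs a long.
cols = [f"c{i}" for i in range(33)]
print(grouping_id(cols, []))    # all bits set: 2**33 - 1 = 8589934591
print(grouping_id(cols, cols))  # 0 -- every column is grouped
```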
[spark] branch branch-2.4 updated: [SPARK-30998][SQL][2.4] ClassCastException when a generator having nested inner generators
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new f4c8c48 [SPARK-30998][SQL][2.4] ClassCastException when a generator having nested inner generators f4c8c48 is described below commit f4c8c4892197b8c5425a8013a09e9b379444e6fc Author: Takeshi Yamamuro AuthorDate: Tue Mar 3 23:47:40 2020 +0900 [SPARK-30998][SQL][2.4] ClassCastException when a generator having nested inner generators ### What changes were proposed in this pull request? A query below failed in branch-2.4; ``` scala> sql("select array(array(1, 2), array(3)) ar").select(explode(explode($"ar"))).show() 20/03/01 13:51:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)/ 1] java.lang.ClassCastException: scala.collection.mutable.ArrayOps$ofRef cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:313) at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) ... ``` This pr modified the `hasNestedGenerator` code in `ExtractGenerator` for correctly catching nested inner generators. This backport PR comes from https://github.com/apache/spark/pull/27750# ### Why are the changes needed? A bug fix. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added tests. Closes #27769 from maropu/SPARK-20998-BRANCH-2.4. 
Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 16 +--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala| 19 +++ .../org/apache/spark/sql/GeneratorFunctionSuite.scala | 8 3 files changed, 40 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 0fedf7f..61f77be 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1681,10 +1681,20 @@ class Analyzer( } private def hasNestedGenerator(expr: NamedExpression): Boolean = { + def hasInnerGenerator(g: Generator): Boolean = g match { +// Since `GeneratorOuter` is just a wrapper of generators, we skip it here +case go: GeneratorOuter => + hasInnerGenerator(go.child) +case _ => + g.children.exists { _.find { +case _: Generator => true +case _ => false + }.isDefined } + } CleanupAliases.trimNonTopLevelAliases(expr) match { -case UnresolvedAlias(_: Generator, _) => false -case Alias(_: Generator, _) => false -case MultiAlias(_: Generator, _) => false +case UnresolvedAlias(g: Generator, _) => hasInnerGenerator(g) +case Alias(g: Generator, _) => hasInnerGenerator(g) +case MultiAlias(g: Generator, _) => hasInnerGenerator(g) case other => hasGenerator(other) } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala index 45319aa..337902f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala @@ -395,6 +395,25 @@ class AnalysisErrorSuite extends AnalysisTest { ) errorTest( 
+"SPARK-30998: unsupported nested inner generators", +{ + val nestedListRelation = LocalRelation( +AttributeReference("nestedList", ArrayType(ArrayType(IntegerType)))()) + nestedListRelation.select(Explode(Explode($"nestedList"))) +}, +"Generators are not supported when it's nested in expressions, but got: " + + "explode(explode(nestedList))" :: Nil + ) + + errorTest( +"SPARK-30998: unsupported nested inner generators for aggregates", +testRelation.select(Explode(Explode( + CreateArray(CreateArray(min($"a") :: max($"a") :: Nil) :: N
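The patched check works by looking *below* a top-level generator for another generator, first unwrapping any `GeneratorOuter` wrapper. A minimal Python sketch of that tree walk (the `Expr`/`Generator` classes here are simplified stand-ins for Catalyst's expression nodes, not Spark's actual API):

```python
class Expr:
    """A tiny stand-in for a Catalyst expression node."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def contains_generator(self):
        return isinstance(self, Generator) or any(
            c.contains_generator() for c in self.children)

class Generator(Expr):
    pass

class GeneratorOuter(Generator):
    """Wrapper around another generator, analogous to Catalyst's GeneratorOuter."""
    def __init__(self, child):
        super().__init__("generator_outer", [child])

def has_inner_generator(g):
    # GeneratorOuter is just a wrapper of generators, so skip it first,
    # mirroring the first case added by the patch.
    if isinstance(g, GeneratorOuter):
        return has_inner_generator(g.children[0])
    # Nested iff any child subtree contains another generator.
    return any(c.contains_generator() for c in g.children)

col = Expr("nestedList")
assert not has_inner_generator(Generator("explode", [col]))   # explode(col) is fine
assert has_inner_generator(                                   # explode(explode(col)) is rejected
    Generator("explode", [Generator("explode", [col])]))
```

Before the fix, the equivalent of `has_inner_generator` effectively returned `False` for any aliased top-level generator, so `explode(explode(...))` slipped past analysis and blew up at runtime with the `ClassCastException` shown above.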
[spark] branch branch-3.0 updated: [SPARK-30998][SQL] ClassCastException when a generator having nested inner generators
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ded0a72 [SPARK-30998][SQL] ClassCastException when a generator having nested inner generators ded0a72 is described below commit ded0a72d81c1d34753be8a156126312506fb50b1 Author: Takeshi Yamamuro AuthorDate: Tue Mar 3 19:00:33 2020 +0900 [SPARK-30998][SQL] ClassCastException when a generator having nested inner generators ### What changes were proposed in this pull request? A query below failed in the master; ``` scala> sql("select array(array(1, 2), array(3)) ar").select(explode(explode($"ar"))).show() 20/03/01 13:51:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)/ 1] java.lang.ClassCastException: scala.collection.mutable.ArrayOps$ofRef cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:313) at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) ... ``` This pr modified the `hasNestedGenerator` code in `ExtractGenerator` for correctly catching nested inner generators. ### Why are the changes needed? A bug fix. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added tests. Closes #27750 from maropu/HandleNestedGenerators. 
Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 313e62c376acab30e546df253b28452a664d3e73) Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 16 +--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala| 19 +++ .../org/apache/spark/sql/GeneratorFunctionSuite.scala | 8 3 files changed, 40 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 3d79799..486b952 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -2164,10 +2164,20 @@ class Analyzer( } private def hasNestedGenerator(expr: NamedExpression): Boolean = { + def hasInnerGenerator(g: Generator): Boolean = g match { +// Since `GeneratorOuter` is just a wrapper of generators, we skip it here +case go: GeneratorOuter => + hasInnerGenerator(go.child) +case _ => + g.children.exists { _.find { +case _: Generator => true +case _ => false + }.isDefined } + } CleanupAliases.trimNonTopLevelAliases(expr) match { -case UnresolvedAlias(_: Generator, _) => false -case Alias(_: Generator, _) => false -case MultiAlias(_: Generator, _) => false +case UnresolvedAlias(g: Generator, _) => hasInnerGenerator(g) +case Alias(g: Generator, _) => hasInnerGenerator(g) +case MultiAlias(g: Generator, _) => hasInnerGenerator(g) case other => hasGenerator(other) } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala index 8f62b0b..3db1053 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala +++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala @@ -434,6 +434,25 @@ class AnalysisErrorSuite extends AnalysisTest { ) errorTest( +"SPARK-30998: unsupported nested inner generators", +{ + val nestedListRelation = LocalRelation( +AttributeReference("nestedList", ArrayType(ArrayType(IntegerType)))()) + nestedListRelation.select(Explode(Explode($"nestedList"))) +}, +"Generators are not supported when it's nested in expressions, but got: " + + "explode(explode(nestedList))" :: Nil + ) + + errorTest( +"SPARK-30998: unsupported nested inner generators for aggregates", +testRelation.select(Explode(Explode( + CreateArray(CreateArray(min($"a") :: max($"a&q
[spark] branch master updated (1fac06c -> 313e62c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1fac06c Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server" add 313e62c [SPARK-30998][SQL] ClassCastException when a generator having nested inner generators No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 16 +--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala| 19 +++ .../org/apache/spark/sql/GeneratorFunctionSuite.scala | 8 3 files changed, 40 insertions(+), 3 deletions(-)

[spark] branch branch-3.0 updated: [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 933e576 [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite 933e576 is described below commit 933e576aab0a40e53f275ae960fc45b7ed2d6f06 Author: Kent Yao AuthorDate: Thu Feb 27 23:12:35 2020 +0900 [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite ### What changes were proposed in this pull request? In this PR, I addressed the comment from https://github.com/apache/spark/pull/27672#discussion_r383719562 to use `intercept` instead of `try-catch` block to assert failures in the IntervalUtilsSuite ### Why are the changes needed? improve tests ### Does this PR introduce any user-facing change? no ### How was this patch tested? Nah Closes #27700 from yaooqinn/intervaltest. 
Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2d2706cb86ddccd2fc60378b0f47a437ec354017) Signed-off-by: Takeshi Yamamuro --- .../sql/catalyst/util/IntervalUtilsSuite.scala | 119 + 1 file changed, 26 insertions(+), 93 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala index e7c3163..1628a61 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala @@ -35,27 +35,17 @@ class IntervalUtilsSuite extends SparkFunSuite with SQLHelper { assert(safeStringToInterval(UTF8String.fromString(input)) === expected) } - private def checkFromStringWithFunc( - input: String, - months: Int, - days: Int, - us: Long, - func: CalendarInterval => CalendarInterval): Unit = { -val expected = new CalendarInterval(months, days, us) -assert(func(stringToInterval(UTF8String.fromString(input))) === expected) -assert(func(safeStringToInterval(UTF8String.fromString(input))) === expected) + private def checkFromInvalidString(input: String, errorMsg: String): Unit = { +failFuncWithInvalidInput(input, errorMsg, s => stringToInterval(UTF8String.fromString(s))) +assert(safeStringToInterval(UTF8String.fromString(input)) === null) } - private def checkFromInvalidString(input: String, errorMsg: String): Unit = { -try { - stringToInterval(UTF8String.fromString(input)) - fail("Expected to throw an exception for the invalid input") -} catch { - case e: IllegalArgumentException => -val msg = e.getMessage -assert(msg.contains(errorMsg)) + private def failFuncWithInvalidInput( + input: String, errorMsg: String, converter: String => CalendarInterval): Unit = { +withClue("Expected to throw an exception for the invalid input") { + val e = intercept[IllegalArgumentException](converter(input)) + 
assert(e.getMessage.contains(errorMsg)) } -assert(safeStringToInterval(UTF8String.fromString(input)) === null) } private def testSingleUnit( @@ -87,7 +77,6 @@ class IntervalUtilsSuite extends SparkFunSuite with SQLHelper { } } - test("string to interval: multiple units") { Seq( "-1 MONTH 1 day -1 microseconds" -> new CalendarInterval(-1, 1, -1), @@ -145,22 +134,9 @@ class IntervalUtilsSuite extends SparkFunSuite with SQLHelper { assert(fromYearMonthString("99-10") === new CalendarInterval(99 * 12 + 10, 0, 0L)) assert(fromYearMonthString("+99-10") === new CalendarInterval(99 * 12 + 10, 0, 0L)) assert(fromYearMonthString("-8-10") === new CalendarInterval(-8 * 12 - 10, 0, 0L)) - -try { - fromYearMonthString("99-15") - fail("Expected to throw an exception for the invalid input") -} catch { - case e: IllegalArgumentException => -assert(e.getMessage.contains("month 15 outside range")) -} - -try { - fromYearMonthString("9a9-15") - fail("Expected to throw an exception for the invalid input") -} catch { - case e: IllegalArgumentException => -assert(e.getMessage.contains("Interval string does not match year-month format")) -} +failFuncWithInvalidInput("99-15", "month 15 outside range", fromYearMonthString) +failFuncWithInvalidInput("9a9-15", "Interval string does not match year-month format", + fromYearMonthString) } tes
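For readers unfamiliar with ScalaTest's `intercept`, the refactoring above replaces a manual try/fail/catch block with a helper that both asserts an exception is thrown and returns it for inspection. A rough Python equivalent of the pattern (the `parse_interval` stub is invented purely for illustration):

```python
def intercept(exc_type, func):
    """Run `func`, asserting it raises `exc_type`; return the exception."""
    try:
        func()
    except exc_type as e:
        return e
    raise AssertionError(
        f"Expected {exc_type.__name__} to be thrown, but nothing was raised")

def parse_interval(s):
    # Hypothetical parser standing in for stringToInterval.
    if "invalid" in s:
        raise ValueError(f"Cannot parse interval: {s}")
    return s

# One line replaces the whole try/fail/catch dance:
e = intercept(ValueError, lambda: parse_interval("invalid input"))
assert "Cannot parse interval" in str(e)
```

The win is the same as in the Scala suite: the assertion intent is explicit, the "fail if nothing was thrown" branch cannot be forgotten, and the checked error message sits right next to the call under test.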
[spark] branch master updated (22dfd15 -> 2d2706c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 22dfd15 [SPARK-30937][DOC] Group Hive upgrade guides together add 2d2706c [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite No new revisions were added by this update. Summary of changes: .../sql/catalyst/util/IntervalUtilsSuite.scala | 119 + 1 file changed, 26 insertions(+), 93 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-30844][SQL] Static partition should also follow StoreAssignmentPolicy when insert into table
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f30f50a [SPARK-30844][SQL] Static partition should also follow StoreAssignmentPolicy when insert into table f30f50a is described below

commit f30f50a76f4b9fb5e652620563fb9055c5f30521
Author: yi.wu
AuthorDate: Sun Feb 23 17:46:19 2020 +0900

[SPARK-30844][SQL] Static partition should also follow StoreAssignmentPolicy when insert into table

### What changes were proposed in this pull request?
Make static partitions also follow `StoreAssignmentPolicy` on insert into a table: if `StoreAssignmentPolicy=LEGACY`, use `Cast`; if `StoreAssignmentPolicy=ANSI | STRICT`, use `AnsiCast`.

E.g., for the table `t` created by:
```
create table t(a int, b string) using parquet partitioned by (a)
```
and inserting values with `StoreAssignmentPolicy=ANSI` using:
```
insert into t partition(a='ansi') values('ansi')
```
Before this PR:
```
+----+----+
|   b|   a|
+----+----+
|ansi|null|
+----+----+
```
After this PR, the insert fails with:
```
java.lang.NumberFormatException: invalid input syntax for type numeric: ansi
```
(It would be better if we could use `TableOutputResolver.checkField` to fully follow `StoreAssignmentPolicy`, but since the data type of the static partition's value is lost in the first place, it is hard to use `TableOutputResolver.checkField`.)

### Why are the changes needed?
We should follow `StoreAssignmentPolicy` when inserting into a table for all columns, including static partitions.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Added a new test.

Closes #27597 from Ngone51/fix-static-partition.
Authored-by: yi.wu Signed-off-by: Takeshi Yamamuro (cherry picked from commit 9c2eadc7268844d49ec41da818002c99bb56addf) Signed-off-by: Takeshi Yamamuro --- .../execution/datasources/DataSourceStrategy.scala | 13 - .../spark/sql/sources/DataSourceAnalysisSuite.scala | 10 -- .../org/apache/spark/sql/sources/InsertSuite.scala | 21 + 3 files changed, 41 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index e3a0a0a..2d902b5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -39,6 +39,7 @@ import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.execution.{RowDataSourceScanExec, SparkPlan} import org.apache.spark.sql.execution.command._ import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy import org.apache.spark.sql.sources._ import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -104,7 +105,17 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast None } else if (potentialSpecs.size == 1) { val partValue = potentialSpecs.head._2 -Some(Alias(cast(Literal(partValue), field.dataType), field.name)()) +conf.storeAssignmentPolicy match { + // SPARK-30844: try our best to follow StoreAssignmentPolicy for static partition + // values but not completely follow because we can't do static type checking due to + // the reason that the parser has erased the type info of static partition values + // and converted them to string. 
+ case StoreAssignmentPolicy.ANSI | StoreAssignmentPolicy.STRICT => +Some(Alias(AnsiCast(Literal(partValue), field.dataType, + Option(conf.sessionLocalTimeZone)), field.name)()) + case _ => +Some(Alias(cast(Literal(partValue), field.dataType), field.name)()) +} } else { throw new AnalysisException( s"Partition column ${field.name} have multiple values specified, " + diff --git a/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala index e1022e3..a6c5090 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala @@ -22,9 +22,10 @@ import org.scalatest.BeforeAndAfterAll import org.apache.spark.Spark
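The behavioural difference the PR describes boils down to how a string partition value is cast to the column's type under each policy. A toy Python sketch of just the observable semantics for an `int` column (these helpers are illustrations, not Spark's `Cast`/`AnsiCast`):

```python
def legacy_cast_int(value):
    """LEGACY policy: a failed cast silently becomes NULL (None)."""
    try:
        return int(value)
    except ValueError:
        return None

def ansi_cast_int(value):
    """ANSI/STRICT policy: a failed cast fails the query."""
    try:
        return int(value)
    except ValueError:
        raise ValueError(f"invalid input syntax for type numeric: {value}")

# Pre-fix behaviour for `insert into t partition(a='ansi') ...`: a becomes null.
assert legacy_cast_int("ansi") is None
assert legacy_cast_int("42") == 42

# Post-fix behaviour under ANSI/STRICT: the insert is rejected instead.
try:
    ansi_cast_int("ansi")
    assert False, "expected the cast to fail"
except ValueError as e:
    assert "invalid input syntax" in str(e)
```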
[spark] branch master updated (25f5bfa -> 9c2eadc)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 25f5bfa [SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns add 9c2eadc [SPARK-30844][SQL] Static partition should also follow StoreAssignmentPolicy when insert into table No new revisions were added by this update. Summary of changes: .../execution/datasources/DataSourceStrategy.scala | 13 - .../spark/sql/sources/DataSourceAnalysisSuite.scala | 10 -- .../org/apache/spark/sql/sources/InsertSuite.scala | 21 + 3 files changed, 41 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 4a82ead [SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns 4a82ead is described below commit 4a82ead147c944c8d4828bbbd4a7e3ec3d3e1135 Author: yi.wu AuthorDate: Sun Feb 23 09:52:54 2020 +0900 [SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns ### What changes were proposed in this pull request? Add new `CommandCheck` rule and fail fast when detects duplicate columns in `AnalyzeColumnCommand`. ### Why are the changes needed? To avoid duplicate statistics computation for the same column in `AnalyzeColumnCommand`. ### Does this PR introduce any user-facing change? Yes. User now get exception when input duplicate columns. ### How was this patch tested? Added new test. Closes #27651 from Ngone51/fail_on_dup_cols. Authored-by: yi.wu Signed-off-by: Takeshi Yamamuro (cherry picked from commit 25f5bfaa6e624da7f491e770a2383038fc6009e1) Signed-off-by: Takeshi Yamamuro --- .../spark/sql/execution/command/CommandCheck.scala | 38 ++ .../sql/internal/BaseSessionStateBuilder.scala | 2 ++ .../spark/sql/StatisticsCollectionSuite.scala | 17 ++ .../spark/sql/hive/HiveSessionStateBuilder.scala | 2 ++ 4 files changed, 59 insertions(+) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala new file mode 100644 index 000..dedace4 --- /dev/null +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.util.SchemaUtils + +/** + * Checks legitimization of various execution commands. + */ +case class CommandCheck(conf: SQLConf) extends (LogicalPlan => Unit) { + + override def apply(plan: LogicalPlan): Unit = { +plan.foreach { + case AnalyzeColumnCommand(_, colsOpt, allColumns) if !allColumns => +colsOpt.foreach(SchemaUtils.checkColumnNameDuplication( + _, "in analyze columns.", conf.caseSensitiveAnalysis)) + + case _ => +} + } +} diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala index 2137fe2..20e1b56 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.connector.catalog.CatalogManager import org.apache.spark.sql.execution.{ColumnarRule, QueryExecution, SparkOptimizer, SparkPlanner, SparkSqlParser} import org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin +import org.apache.spark.sql.execution.command.CommandCheck import 
org.apache.spark.sql.execution.datasources._ import org.apache.spark.sql.execution.datasources.v2.{TableCapabilityCheck, V2SessionCatalog} import org.apache.spark.sql.streaming.StreamingQueryManager @@ -190,6 +191,7 @@ abstract class BaseSessionStateBuilder( PreReadCheck +: HiveOnlyCheck +: TableCapabilityCheck +: +CommandCheck(conf) +: customCheckRules } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala index e9ceab6..30b15a8 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala +++ b/sql/c
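The core of the new `CommandCheck` rule is a duplicate-name check over the analyzed columns, respecting case sensitivity. A standalone Python sketch of that check (a simplification of what `SchemaUtils.checkColumnNameDuplication` does, with invented names):

```python
def check_column_name_duplication(cols, context, case_sensitive):
    """Raise if `cols` contains duplicates (case-insensitively unless
    case_sensitive is set), mirroring the fail-fast check in CommandCheck."""
    keys = cols if case_sensitive else [c.lower() for c in cols]
    dupes = sorted({k for k in keys if keys.count(k) > 1})
    if dupes:
        raise ValueError(f"Found duplicate column(s) {context}: {dupes}")

# Distinct columns pass.
check_column_name_duplication(["a", "b"], "in analyze columns.", case_sensitive=False)

# "a" and "A" collide when the analysis is case-insensitive.
try:
    check_column_name_duplication(["a", "A"], "in analyze columns.", case_sensitive=False)
    assert False, "expected duplicate-column error"
except ValueError as e:
    assert "duplicate column" in str(e)
```

Failing here, before any statistics are computed, is what avoids the wasted duplicate computation the PR describes.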
[spark] branch master updated (bcce1b1 -> 25f5bfa)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bcce1b1 [SPARK-30904][SQL] Thrift RowBasedSet serialization throws NullPointerException on NULL BigDecimal add 25f5bfa [SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns No new revisions were added by this update. Summary of changes: .../CommandCheck.scala}| 23 ++ .../sql/internal/BaseSessionStateBuilder.scala | 2 ++ .../spark/sql/StatisticsCollectionSuite.scala | 17 .../spark/sql/hive/HiveSessionStateBuilder.scala | 2 ++ 4 files changed, 36 insertions(+), 8 deletions(-) copy sql/core/src/main/scala/org/apache/spark/sql/execution/{streaming/continuous/WriteToContinuousDataSource.scala => command/CommandCheck.scala} (60%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [MINOR][SQL] Fix error position of NOSCAN
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a415d07 [MINOR][SQL] Fix error position of NOSCAN a415d07 is described below

commit a415d07c90ad46c9d88d78e956cb5680b213ce71
Author: yi.wu
AuthorDate: Fri Feb 21 15:21:53 2020 +0900

[MINOR][SQL] Fix error position of NOSCAN

### What changes were proposed in this pull request?
Point to the correct position when a miswritten `NOSCAN` is detected.

### Why are the changes needed?
Before:
```
[info] org.apache.spark.sql.catalyst.parser.ParseException: Expected `NOSCAN` instead of `SCAN`(line 1, pos 0)
[info]
[info] == SQL ==
[info] ANALYZE TABLE analyze_partition_with_null PARTITION (name) COMPUTE STATISTICS SCAN
[info] ^^^
```
After:
```
[info] org.apache.spark.sql.catalyst.parser.ParseException: Expected `NOSCAN` instead of `SCAN`(line 1, pos 78)
[info]
[info] == SQL ==
[info] ANALYZE TABLE analyze_partition_with_null PARTITION (name) COMPUTE STATISTICS SCAN
[info] --^^^
```

### Does this PR introduce any user-facing change?
Yes, users will see a better error message.

### How was this patch tested?
Manually tested.

Closes #27662 from Ngone51/fix_noscan_reference.
Authored-by: yi.wu Signed-off-by: Takeshi Yamamuro (cherry picked from commit 4d356554a61024c7d3dc450accec1b3639c37e19) Signed-off-by: Takeshi Yamamuro --- .../main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 62e5685..36c1647 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -3165,7 +3165,8 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } if (ctx.identifier != null && ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") { - throw new ParseException(s"Expected `NOSCAN` instead of `${ctx.identifier.getText}`", ctx) + throw new ParseException(s"Expected `NOSCAN` instead of `${ctx.identifier.getText}`", +ctx.identifier()) } val tableName = visitMultipartIdentifier(ctx.multipartIdentifier())
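The one-line fix changes which parse-tree node the `ParseException` is attached to, and the caret line in the rendered error is derived from that node's position. A small Python sketch of how such a caret line can be rendered from an offset (illustrative only — `parse_error` is an invented helper, not Spark's formatter):

```python
def parse_error(sql, pos, msg):
    """Render a ParseException-style message with a caret under `pos`."""
    return (f"{msg}(line 1, pos {pos})\n\n"
            f"== SQL ==\n{sql}\n"
            + "-" * pos + "^^^")

sql = "ANALYZE TABLE t COMPUTE STATISTICS SCAN"
pos = sql.index("SCAN")  # position of the offending token, not 0
out = parse_error(sql, pos, "Expected `NOSCAN` instead of `SCAN`")
# The caret line now sits under SCAN instead of at column 0.
assert out.splitlines()[-1] == "-" * pos + "^^^"
```

Passing `ctx.identifier()` instead of the whole statement context is exactly what moves `pos` from 0 to the offending token's offset.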
[spark] branch master updated (4d5166f -> 4d35655)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 4d5166f  [SPARK-30880][DOCS] Delete Sphinx Makefile cruft
 add 4d35655  [MINOR][SQL] Fix error position of NOSCAN

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated (8629597 -> d5b92b2)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8629597  [SPARK-30639][BUILD] Upgrade Jersey to 2.30
 add d5b92b2  [SPARK-30579][DOC] Document ORDER BY Clause of SELECT statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-orderby.md | 123 +-
 1 file changed, 122 insertions(+), 1 deletion(-)
[spark] branch master updated (3228d72 -> 4847f73)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3228d72  [SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and TableCatalog to CatalogV2Util
 add 4847f73  [SPARK-30298][SQL] Respect aliases in output partitioning of projects and aggregates

No new revisions were added by this update.

Summary of changes:
 .../execution/AliasAwareOutputPartitioning.scala   | 55 ++
 .../execution/aggregate/HashAggregateExec.scala    |  4 +-
 .../aggregate/ObjectHashAggregateExec.scala        |  4 +-
 .../execution/aggregate/SortAggregateExec.scala    |  6 +-
 .../sql/execution/basicPhysicalOperators.scala     |  5 +-
 .../apache/spark/sql/execution/PlannerSuite.scala  | 88 ++
 .../spark/sql/sources/BucketedReadSuite.scala      | 14
 7 files changed, 166 insertions(+), 10 deletions(-)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioning.scala
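Conceptually, SPARK-30298 lets a Project's or aggregate's output partitioning be expressed through its output aliases, so a simple rename does not discard the child's hash partitioning. A toy model of that rewrite (hypothetical names and shapes, not Spark's API):

```python
def realias(partition_exprs, alias_map):
    # If the projection renames a partition column (SELECT key AS k),
    # the child's partitioning on `key` is still valid for `k`; rewriting
    # the partitioning in terms of the alias can avoid an unnecessary
    # shuffle before a downstream join or aggregate.
    return [alias_map.get(e, e) for e in partition_exprs]

# `key` is renamed to `k` by the projection; the partitioning follows it.
print(realias(["key"], {"key": "k"}))  # ['k']
```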
[spark] branch master updated (f35f352 -> d0bf447)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f35f352  [SPARK-30543][ML][PYSPARK][R] RandomForest add Param bootstrap to control sampling method
 add d0bf447  [SPARK-30575][DOCS][FOLLOWUP] Fix typos in documents

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-groupby.md | 2 +-
 docs/sql-ref-syntax-qry-select-having.md  | 4 ++--
 docs/sql-ref-syntax-qry-select-where.md   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)
[spark] branch master updated (a3a42b3 -> 5a55a5a)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a3a42b3  [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression
 add 5a55a5a  [SPARK-30518][SQL] Precision and scale should be same for values between -1.0 and 1.0 in Decimal

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/types/Decimal.scala    |  8 +---
 .../catalyst/expressions/ArithmeticExpressionSuite.scala   |  8
 .../scala/org/apache/spark/sql/types/DecimalSuite.scala    | 14 +++---
 3 files changed, 16 insertions(+), 14 deletions(-)
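The rule SPARK-30518 describes can be illustrated with Python's `decimal` module (a hypothetical helper, not Spark's `Decimal` implementation): for literals strictly between -1.0 and 1.0 the leading zero contributes no significant digit, so precision should equal scale.

```python
from decimal import Decimal

def precision_scale(literal: str):
    # Illustrative mirror of the corrected rule: scale counts fractional
    # digits, and precision is never smaller than scale, so "0.001"
    # yields (3, 3) rather than an inconsistent precision/scale pair.
    t = Decimal(literal).as_tuple()
    scale = max(-t.exponent, 0)
    return max(len(t.digits), scale), scale

print(precision_scale("0.001"))  # (3, 3) -- precision == scale below 1.0
print(precision_scale("12.34"))  # (4, 2)
```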
[spark] branch master updated (883ae33 -> a3a42b3)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 883ae33  [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
 add a3a42b3  [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

No new revisions were added by this update.

Summary of changes:
 .../expressions/aggregate/interfaces.scala         | 19 +--
 .../spark/sql/execution/aggregate/AggUtils.scala   | 23 ++--
 .../sql-tests/results/group-by-filter.sql.out      | 62 +++---
 .../results/postgreSQL/aggregates_part3.sql.out    |  4 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 33 +++-
 5 files changed, 86 insertions(+), 55 deletions(-)
[spark] branch master updated (d42cf45 -> 8a926e4)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from d42cf45  [SPARK-30246][CORE] OneForOneStreamManager might leak memory in connectionTerminated
 add 8a926e4  [SPARK-26736][SQL] Partition pruning through nondeterministic expressions in Hive tables

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala  | 2 +-
 ...stic condition - query test-0-56a1c59bd13c2a83a91eb0ec658fcecc} | 0
 .../scala/org/apache/spark/sql/hive/execution/PruningSuite.scala   | 7 +++
 3 files changed, 8 insertions(+), 1 deletion(-)
 copy sql/hive/src/test/resources/golden/{Partition pruning - left only 1 partition - query test-0-3adc3a7f76b2abd059904ba81a595db3 => Partition pruning - with filter containing non-deterministic condition - query test-0-56a1c59bd13c2a83a91eb0ec658fcecc} (100%)
[spark] branch master updated (240840f -> 5f6cd61)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 240840f  [SPARK-30515][SQL] Refactor SimplifyBinaryComparison to reduce the time complexity
 add 5f6cd61  [SPARK-29708][SQL] Correct aggregated values when grouping sets are duplicated

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  2 +-
 .../plans/logical/basicLogicalOperators.scala      | 30 ++
 .../resources/sql-tests/inputs/grouping_set.sql    |  6 +++
 .../sql-tests/inputs/postgreSQL/groupingsets.sql   |  1 -
 .../sql-tests/results/grouping_set.sql.out         | 47 --
 .../results/postgreSQL/groupingsets.sql.out        |  6 ++-
 6 files changed, 78 insertions(+), 14 deletions(-)
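SPARK-29708 concerns `GROUPING SETS` clauses that list the same set more than once, e.g. `GROUP BY GROUPING SETS ((a), (a))`. Each listed set must be expanded independently, so a duplicated set yields duplicated result groups rather than one merged (and miscounted) group. A toy model of that expansion, with made-up helper names:

```python
from collections import defaultdict

def grouping_sets_sum(rows, value_key, sets):
    # Expand each grouping set independently; duplicates in `sets`
    # intentionally produce duplicate result groups.
    out = []
    for gset in sets:
        groups = defaultdict(int)
        for row in rows:
            groups[tuple(row[k] for k in gset)] += row[value_key]
        out.extend((gset, key, total) for key, total in sorted(groups.items()))
    return out

rows = [{"a": 1, "x": 10}, {"a": 1, "x": 5}, {"a": 2, "x": 7}]
# GROUPING SETS ((a), (a)): the duplicate set repeats its groups.
print(grouping_sets_sum(rows, "x", [("a",), ("a",)]))
```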
[spark] branch master updated (51d2917 -> 240840f)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 51d2917  [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md
 add 240840f  [SPARK-30515][SQL] Refactor SimplifyBinaryComparison to reduce the time complexity

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 49 +++---
 1 file changed, 25 insertions(+), 24 deletions(-)
[spark] branch master updated (1846b02 -> 88fc8db)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 1846b02  [SPARK-30500][SPARK-30501][SQL] Remove SQL configs deprecated in Spark 2.1 and 2.3
 add 88fc8db  [SPARK-30482][SQL][CORE][TESTS] Add sub-class of `AppenderSkeleton` reusable in tests

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/SparkFunSuite.scala     | 21 +--
 .../sql/catalyst/analysis/ResolveHintsSuite.scala  | 15 +---
 .../catalyst/expressions/CodeGenerationSuite.scala | 21 ++-
 .../catalyst/optimizer/OptimizerLoggingSuite.scala | 43 +-
 .../scala/org/apache/spark/sql/JoinHintSuite.scala | 15 +---
 .../sql/execution/datasources/csv/CSVSuite.scala   | 17 ++---
 .../apache/spark/sql/internal/SQLConfSuite.scala   | 12 +-
 7 files changed, 41 insertions(+), 103 deletions(-)
[spark] branch master updated (b389b8c -> 81e1a21)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from b389b8c  [SPARK-30188][SQL] Resolve the failed unit tests when enable AQE
 add 81e1a21  [SPARK-30234][SQL][DOCS][FOLOWUP] Update Documentation for ADD FILE and LIST FILE

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-resource-mgmt-add-file.md  | 9 +
 docs/sql-ref-syntax-aux-resource-mgmt-list-file.md | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)
[spark] branch master updated (d6532c7 -> b942832)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from d6532c7  [SPARK-30448][CORE] accelerator aware scheduling enforce cores as limiting resource
 add b942832  [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/RewriteDistinctAggregates.scala | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)
[spark] branch master updated (bcf07cb -> 418f7dc)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from bcf07cb  [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
 add 418f7dc  [SPARK-30447][SQL] Constant propagation nullability issue

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 41 --
 .../optimizer/ConstantPropagationSuite.scala       | 25 -
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  9 +
 3 files changed, 63 insertions(+), 12 deletions(-)
[spark] branch branch-2.4 updated (e52ae4e -> 6ac3659)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e52ae4e  [SPARK-30450][INFRA][FOLLOWUP][2.4] Fix git folder regex for windows file separator
 add 6ac3659  [SPARK-30410][SQL][2.4] Calculating size of table with large number of partitions causes flooding logs

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/command/CommandUtils.scala  | 10 +++---
 .../spark/sql/execution/datasources/InMemoryFileIndex.scala    |  6 +-
 2 files changed, 12 insertions(+), 4 deletions(-)
[spark] branch master updated (d7c7e37 -> 9535776)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from d7c7e37  [SPARK-30381][ML] Refactor GBT to reuse treePoints for all trees
 add 9535776  [SPARK-30302][SQL] Complete info for show create table for views

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/catalog/interface.scala     |   9 +-
 .../spark/sql/execution/command/tables.scala       |  43 +++--
 .../sql-tests/inputs/show-create-table.sql         |  31 +++
 .../sql-tests/results/show-create-table.sql.out    | 103 -
 4 files changed, 174 insertions(+), 12 deletions(-)
[spark] branch master updated (ed8a260 -> ed73ed8)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ed8a260  [SPARK-30450][INFRA] Exclude .git folder for python linter
 add ed73ed8  [SPARK-28825][SQL][DOC] Documentation for Explain Command

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-explain.md | 119 -
 1 file changed, 118 insertions(+), 1 deletion(-)