[spark] branch branch-3.0 updated: [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference

2020-04-28 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new da8c7b8  [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference
da8c7b8 is described below

commit da8c7b8ceffa1566ae35280a2d1c3abcbff47542
Author: Huaxin Gao 
AuthorDate: Wed Apr 29 09:17:23 2020 +0900

[SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference

### What changes were proposed in this pull request?
Document LIKE clause in SQL Reference

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

https://user-images.githubusercontent.com/13592258/80294346-5babab80-871d-11ea-8ac9-51bbab0aca88.png

https://user-images.githubusercontent.com/13592258/80294347-5ea69c00-871d-11ea-8c51-7a90ee20f7da.png

https://user-images.githubusercontent.com/13592258/80294351-61a18c80-871d-11ea-9e75-e3345d2f52f5.png

### How was this patch tested?
Manually build and check

Closes #28332 from huaxingao/where_clause.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit d34cb59fb311c3d700e4f4f877b61b17cea313ee)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-aux-show-databases.md |  13 +++-
 docs/sql-ref-syntax-aux-show-functions.md |   8 +-
 docs/sql-ref-syntax-aux-show-table.md |  14 ++--
 docs/sql-ref-syntax-aux-show-tables.md|  10 +--
 docs/sql-ref-syntax-aux-show-views.md |  12 +--
 docs/sql-ref-syntax-qry-explain.md|   2 +-
 docs/sql-ref-syntax-qry-select-like.md| 120 ++
 8 files changed, 154 insertions(+), 27 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 1097079..dfe4cfa 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -170,6 +170,8 @@
   url: sql-ref-syntax-qry-select-inline-table.html
 - text: Common Table Expression
   url: sql-ref-syntax-qry-select-cte.html
+- text: LIKE Predicate
+  url: sql-ref-syntax-qry-select-like.html
 - text: Window Function
   url: sql-ref-syntax-qry-window.html
 - text: EXPLAIN
diff --git a/docs/sql-ref-syntax-aux-show-databases.md 
b/docs/sql-ref-syntax-aux-show-databases.md
index 0ed3452..3599009 100644
--- a/docs/sql-ref-syntax-aux-show-databases.md
+++ b/docs/sql-ref-syntax-aux-show-databases.md
@@ -29,16 +29,21 @@ and mean the same thing.
 ### Syntax
 
 {% highlight sql %}
-SHOW { DATABASES | SCHEMAS } [ LIKE string_pattern ]
+SHOW { DATABASES | SCHEMAS } [ LIKE regex_pattern ]
 {% endhighlight %}
 
 ### Parameters
 
 
-  LIKE string_pattern
+  regex_pattern
   
-Specifies a string pattern that is used to match the databases in the system. In 
-the specified string pattern '*' matches any number of characters.
+Specifies a regular expression pattern that is used to filter the results of the
+statement.
+
+  Only * and | are allowed as wildcard pattern.
+  Excluding * and |, the remaining pattern follows the regular expression semantics.
+  The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
+
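For illustration, a short sketch of the wildcard behavior described above (the database names are hypothetical; the expected matches follow from the rule that * matches any number of characters, | separates alternatives, and matching is case-insensitive):

```sql
-- Matches every database whose name starts with 'pay', e.g. payments, payroll.
SHOW DATABASES LIKE 'pay*';

-- Matches either of the two alternatives; 'PAYMENTS|default' would behave the same.
SHOW SCHEMAS LIKE 'payments|default';
```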
   
 
 
diff --git a/docs/sql-ref-syntax-aux-show-functions.md 
b/docs/sql-ref-syntax-aux-show-functions.md
index da33d99..ed22a3a 100644
--- a/docs/sql-ref-syntax-aux-show-functions.md
+++ b/docs/sql-ref-syntax-aux-show-functions.md
@@ -58,12 +58,12 @@ SHOW [ function_kind ] FUNCTIONS ( [ LIKE ] function_name | 
regex_pattern )
   
   regex_pattern
   
-Specifies a regular expression pattern that is used to limit the results of the
+Specifies a regular expression pattern that is used to filter the results of the
 statement.
 
-  Only `*` and `|` are allowed as wildcard pattern.
-  Excluding `*` and `|` the remaining pattern follows the regex semantics.
-  The leading and trailing blanks are trimmed in the input pattern before processing. 
+  Only * and | are allowed as wildcard pattern.
+  Excluding * and |, the remaining pattern follows the regular expression semantics.
+  The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
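A similarly hedged sketch for SHOW FUNCTIONS (the pattern itself is illustrative):

```sql
-- Lists functions whose names match either alternative, e.g. trim, to_date, to_timestamp.
SHOW FUNCTIONS LIKE 'trim|to_*';
```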
 
   
 
diff --git a/docs/sql-ref-syntax-aux-show-table.md 
b/docs/sql-ref-syntax-aux-show-table.md
index 1aa44d3..c688a99 100644
--- a/docs/sql-ref-syntax-aux-show-table.md
+++ b/docs/sql-ref-syntax-aux-show-table.md
@@ -33,7 +33,7 @@ cannot be used with a partition specification.
 ### Syntax
 
 {% highlight sql %}
-SHOW TABLE EXTENDED [ IN | FROM database_na

[spark] branch master updated (dcc0902 -> d34cb59)

2020-04-28 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from dcc0902  [SPARK-29458][SQL][DOCS] Add a paragraph for scalar function 
in sql getting started
 add d34cb59  [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-aux-show-databases.md |  13 +++-
 docs/sql-ref-syntax-aux-show-functions.md |   8 +-
 docs/sql-ref-syntax-aux-show-table.md |  14 ++--
 docs/sql-ref-syntax-aux-show-tables.md|  10 +--
 docs/sql-ref-syntax-aux-show-views.md |  12 +--
 docs/sql-ref-syntax-qry-explain.md|   2 +-
 docs/sql-ref-syntax-qry-select-like.md| 120 ++
 8 files changed, 154 insertions(+), 27 deletions(-)
 create mode 100644 docs/sql-ref-syntax-qry-select-like.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference

2020-04-28 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d34cb59  [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference
d34cb59 is described below

commit d34cb59fb311c3d700e4f4f877b61b17cea313ee
Author: Huaxin Gao 
AuthorDate: Wed Apr 29 09:17:23 2020 +0900

[SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference

### What changes were proposed in this pull request?
Document LIKE clause in SQL Reference

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

https://user-images.githubusercontent.com/13592258/80294346-5babab80-871d-11ea-8ac9-51bbab0aca88.png

https://user-images.githubusercontent.com/13592258/80294347-5ea69c00-871d-11ea-8c51-7a90ee20f7da.png

https://user-images.githubusercontent.com/13592258/80294351-61a18c80-871d-11ea-9e75-e3345d2f52f5.png

### How was this patch tested?
Manually build and check

Closes #28332 from huaxingao/where_clause.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-aux-show-databases.md |  13 +++-
 docs/sql-ref-syntax-aux-show-functions.md |   8 +-
 docs/sql-ref-syntax-aux-show-table.md |  14 ++--
 docs/sql-ref-syntax-aux-show-tables.md|  10 +--
 docs/sql-ref-syntax-aux-show-views.md |  12 +--
 docs/sql-ref-syntax-qry-explain.md|   2 +-
 docs/sql-ref-syntax-qry-select-like.md| 120 ++
 8 files changed, 154 insertions(+), 27 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 1097079..dfe4cfa 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -170,6 +170,8 @@
   url: sql-ref-syntax-qry-select-inline-table.html
 - text: Common Table Expression
   url: sql-ref-syntax-qry-select-cte.html
+- text: LIKE Predicate
+  url: sql-ref-syntax-qry-select-like.html
 - text: Window Function
   url: sql-ref-syntax-qry-window.html
 - text: EXPLAIN
diff --git a/docs/sql-ref-syntax-aux-show-databases.md 
b/docs/sql-ref-syntax-aux-show-databases.md
index 0ed3452..3599009 100644
--- a/docs/sql-ref-syntax-aux-show-databases.md
+++ b/docs/sql-ref-syntax-aux-show-databases.md
@@ -29,16 +29,21 @@ and mean the same thing.
 ### Syntax
 
 {% highlight sql %}
-SHOW { DATABASES | SCHEMAS } [ LIKE string_pattern ]
+SHOW { DATABASES | SCHEMAS } [ LIKE regex_pattern ]
 {% endhighlight %}
 
 ### Parameters
 
 
-  LIKE string_pattern
+  regex_pattern
   
-Specifies a string pattern that is used to match the databases in the system. In 
-the specified string pattern '*' matches any number of characters.
+Specifies a regular expression pattern that is used to filter the results of the
+statement.
+
+  Only * and | are allowed as wildcard pattern.
+  Excluding * and |, the remaining pattern follows the regular expression semantics.
+  The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
+
   
 
 
diff --git a/docs/sql-ref-syntax-aux-show-functions.md 
b/docs/sql-ref-syntax-aux-show-functions.md
index da33d99..ed22a3a 100644
--- a/docs/sql-ref-syntax-aux-show-functions.md
+++ b/docs/sql-ref-syntax-aux-show-functions.md
@@ -58,12 +58,12 @@ SHOW [ function_kind ] FUNCTIONS ( [ LIKE ] function_name | 
regex_pattern )
   
   regex_pattern
   
-Specifies a regular expression pattern that is used to limit the results of the
+Specifies a regular expression pattern that is used to filter the results of the
 statement.
 
-  Only `*` and `|` are allowed as wildcard pattern.
-  Excluding `*` and `|` the remaining pattern follows the regex semantics.
-  The leading and trailing blanks are trimmed in the input pattern before processing. 
+  Only * and | are allowed as wildcard pattern.
+  Excluding * and |, the remaining pattern follows the regular expression semantics.
+  The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
 
   
 
diff --git a/docs/sql-ref-syntax-aux-show-table.md 
b/docs/sql-ref-syntax-aux-show-table.md
index 1aa44d3..c688a99 100644
--- a/docs/sql-ref-syntax-aux-show-table.md
+++ b/docs/sql-ref-syntax-aux-show-table.md
@@ -33,7 +33,7 @@ cannot be used with a partition specification.
 ### Syntax
 
 {% highlight sql %}
-SHOW TABLE EXTENDED [ IN | FROM database_name ] LIKE 
'identifier_with_wildcards'
+SHOW TABLE EXTENDED [ IN | FROM database_name ] LIKE regex_pattern
 [ parti

[spark] branch branch-3.0 updated: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 37002fe  [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to 
document Floating Point Special Values
37002fe is described below

commit 37002fe69a58ba071ada798842d7e77c4cd6e47e
Author: Huaxin Gao 
AuthorDate: Sat Apr 25 09:02:16 2020 +0900

[SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating 
Point Special Values

### What changes were proposed in this pull request?
Re-arrange Data Types page to document Floating Point Special Values

### Why are the changes needed?
To complete SQL Reference

### Does this PR introduce any user-facing change?
Yes

- add Floating Point Special Values in Data Types page
- move NaN Semantics to Data Types page

https://user-images.githubusercontent.com/13592258/80233996-3da25600-860c-11ea-8285-538efc16e431.png

https://user-images.githubusercontent.com/13592258/80234001-4004b000-860c-11ea-8954-72f63c92d50d.png

https://user-images.githubusercontent.com/13592258/80234006-41ce7380-860c-11ea-96bf-15e1aa2102ff.png

### How was this patch tested?
Manually build and check

Closes #28264 from huaxingao/datatypes.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 054bef94ca7e84ff8e2e27af65e00e183f7be6da)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 -
 docs/sql-ref-datatypes.md | 119 ++
 docs/sql-ref-nan-semantics.md |  29 --
 3 files changed, 119 insertions(+), 31 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 26cca61..1097079 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -84,8 +84,6 @@
   url: sql-ref-literals.html
 - text: Null Semantics
   url: sql-ref-null-semantics.html
-- text: NaN Semantics
-  url: sql-ref-nan-semantics.html
 - text: ANSI Compliance
   url: sql-ref-ansi-compliance.html
   subitems:
diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md
index 150e194..0d49f6f 100644
--- a/docs/sql-ref-datatypes.md
+++ b/docs/sql-ref-datatypes.md
@@ -19,6 +19,8 @@ license: |
   limitations under the License.
 ---
 
+### Supported Data Types
+
 Spark SQL and DataFrames support the following data types:
 
 * Numeric types
@@ -706,3 +708,120 @@ The following table shows the type names as well as 
aliases used in Spark SQL pa
 
 
 
+
+### Floating Point Special Values
+
+Spark SQL supports several special floating point values in a case-insensitive 
manner:
+
+ * Inf/+Inf/Infinity/+Infinity: positive infinity
+   * ```FloatType```: equivalent to Scala Float.PositiveInfinity.
+   * ```DoubleType```: equivalent to Scala 
Double.PositiveInfinity.
+ * -Inf/-Infinity: negative infinity
+   * ```FloatType```: equivalent to Scala Float.NegativeInfinity.
+   * ```DoubleType```: equivalent to Scala 
Double.NegativeInfinity.
+ * NaN: not a number
+   * ```FloatType```: equivalent to Scala Float.NaN.
+   * ```DoubleType```:  equivalent to Scala Double.NaN.
+
+ Positive/Negative Infinity Semantics
+
+There is special handling for positive and negative infinity. They have the 
following semantics:
+
+ * Positive infinity multiplied by any positive value returns positive 
infinity.
+ * Negative infinity multiplied by any positive value returns negative 
infinity.
+ * Positive infinity multiplied by any negative value returns negative 
infinity.
+ * Negative infinity multiplied by any negative value returns positive 
infinity.
+ * Positive/negative infinity multiplied by 0 returns NaN.
+ * Positive/negative infinity is equal to itself.
+ * In aggregations, all positive infinity values are grouped together. 
Similarly, all negative infinity values are grouped together.
+ * Positive infinity and negative infinity are treated as normal values in 
join keys.
+ * Positive infinity sorts lower than NaN and higher than any other values.
+ * Negative infinity sorts lower than any other values.
+
+ NaN Semantics
+
+There is special handling for not-a-number (NaN) when dealing with `float` or 
`double` types that
+do not exactly match standard floating point semantics.
+Specifically:
+
+ * NaN = NaN returns true.
+ * In aggregations, all NaN values are grouped together.
+ * NaN is treated as a normal value in join keys.
+ * NaN values go last when in ascending order, larger than any other numeric 
value.
+
+ Examples
+
+{% highlight sql %}
+SELECT double('infinity') AS col;
+++
+| col|
+++
+|Infinity|
+++
+
+SELECT float('-inf') AS col;
++-+
+|  col|
++-+
+|-Infinity|
++-+
+
+SELECT float('NaN') AS col;
++-

[spark] branch master updated: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 054bef9  [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to 
document Floating Point Special Values
054bef9 is described below

commit 054bef94ca7e84ff8e2e27af65e00e183f7be6da
Author: Huaxin Gao 
AuthorDate: Sat Apr 25 09:02:16 2020 +0900

[SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating 
Point Special Values

### What changes were proposed in this pull request?
Re-arrange Data Types page to document Floating Point Special Values

### Why are the changes needed?
To complete SQL Reference

### Does this PR introduce any user-facing change?
Yes

- add Floating Point Special Values in Data Types page
- move NaN Semantics to Data Types page

https://user-images.githubusercontent.com/13592258/80233996-3da25600-860c-11ea-8285-538efc16e431.png

https://user-images.githubusercontent.com/13592258/80234001-4004b000-860c-11ea-8954-72f63c92d50d.png

https://user-images.githubusercontent.com/13592258/80234006-41ce7380-860c-11ea-96bf-15e1aa2102ff.png

### How was this patch tested?
Manually build and check

Closes #28264 from huaxingao/datatypes.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 -
 docs/sql-ref-datatypes.md | 119 ++
 docs/sql-ref-nan-semantics.md |  29 --
 3 files changed, 119 insertions(+), 31 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 26cca61..1097079 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -84,8 +84,6 @@
   url: sql-ref-literals.html
 - text: Null Semantics
   url: sql-ref-null-semantics.html
-- text: NaN Semantics
-  url: sql-ref-nan-semantics.html
 - text: ANSI Compliance
   url: sql-ref-ansi-compliance.html
   subitems:
diff --git a/docs/sql-ref-datatypes.md b/docs/sql-ref-datatypes.md
index 150e194..0d49f6f 100644
--- a/docs/sql-ref-datatypes.md
+++ b/docs/sql-ref-datatypes.md
@@ -19,6 +19,8 @@ license: |
   limitations under the License.
 ---
 
+### Supported Data Types
+
 Spark SQL and DataFrames support the following data types:
 
 * Numeric types
@@ -706,3 +708,120 @@ The following table shows the type names as well as 
aliases used in Spark SQL pa
 
 
 
+
+### Floating Point Special Values
+
+Spark SQL supports several special floating point values in a case-insensitive 
manner:
+
+ * Inf/+Inf/Infinity/+Infinity: positive infinity
+   * ```FloatType```: equivalent to Scala Float.PositiveInfinity.
+   * ```DoubleType```: equivalent to Scala 
Double.PositiveInfinity.
+ * -Inf/-Infinity: negative infinity
+   * ```FloatType```: equivalent to Scala Float.NegativeInfinity.
+   * ```DoubleType```: equivalent to Scala 
Double.NegativeInfinity.
+ * NaN: not a number
+   * ```FloatType```: equivalent to Scala Float.NaN.
+   * ```DoubleType```:  equivalent to Scala Double.NaN.
+
+ Positive/Negative Infinity Semantics
+
+There is special handling for positive and negative infinity. They have the 
following semantics:
+
+ * Positive infinity multiplied by any positive value returns positive 
infinity.
+ * Negative infinity multiplied by any positive value returns negative 
infinity.
+ * Positive infinity multiplied by any negative value returns negative 
infinity.
+ * Negative infinity multiplied by any negative value returns positive 
infinity.
+ * Positive/negative infinity multiplied by 0 returns NaN.
+ * Positive/negative infinity is equal to itself.
+ * In aggregations, all positive infinity values are grouped together. 
Similarly, all negative infinity values are grouped together.
+ * Positive infinity and negative infinity are treated as normal values in 
join keys.
+ * Positive infinity sorts lower than NaN and higher than any other values.
+ * Negative infinity sorts lower than any other values.
+
+ NaN Semantics
+
+There is special handling for not-a-number (NaN) when dealing with `float` or 
`double` types that
+do not exactly match standard floating point semantics.
+Specifically:
+
+ * NaN = NaN returns true.
+ * In aggregations, all NaN values are grouped together.
+ * NaN is treated as a normal value in join keys.
+ * NaN values go last when in ascending order, larger than any other numeric 
value.
+
+ Examples
+
+{% highlight sql %}
+SELECT double('infinity') AS col;
+++
+| col|
+++
+|Infinity|
+++
+
+SELECT float('-inf') AS col;
++-+
+|  col|
++-+
+|-Infinity|
++-+
+
+SELECT float('NaN') AS col;
++---+
+|col|
++---+
+|NaN|
++---+
+
+SELECT double('infinity') * 0 AS col;
++---+
+|col|
++---+
+|NaN|
++---+
+
+SELE

[spark] branch branch-2.4 updated: [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new a2a0c52  [SPARK-31532][SQL] Builder should not propagate static sql 
configs to the existing active or default SparkSession
a2a0c52 is described below

commit a2a0c52d7b21ef1e5f06cee6c8c83ad82f8b1b0b
Author: Kent Yao 
AuthorDate: Sat Apr 25 08:53:00 2020 +0900

[SPARK-31532][SQL] Builder should not propagate static sql configs to the 
existing active or default SparkSession

### What changes were proposed in this pull request?

SparkSessionBuilder should not propagate static SQL configurations to the existing active/default SparkSession.
This seems to be a long-standing bug.

```scala
scala> spark.sql("set spark.sql.warehouse.dir").show
+++
| key|   value|
+++
|spark.sql.warehou...|file:/Users/kenty...|
+++

scala> spark.sql("set spark.sql.warehouse.dir=2");
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static 
config: spark.sql.warehouse.dir;
  at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154)
  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
  at 
org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100)
  at 
org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
  at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
  at org.apache.spark.sql.Dataset.(Dataset.scala:229)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
  at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
  ... 47 elided

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get
getClass   getOrCreate

scala> SparkSession.builder.config("spark.sql.warehouse.dir", 
"xyz").getOrCreate
20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing 
SparkSession; some configuration may not take effect.
res7: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession6403d574

scala> spark.sql("set spark.sql.warehouse.dir").show
++-+
| key|value|
++-+
|spark.sql.warehou...|  xyz|
++-+

scala>
```

### Why are the changes needed?
bugfix as shown in the previous section

### Does this PR introduce any user-facing change?

Yes, static SQL configurations with SparkSession.builder.config do not 
propagate to any existing or new SparkSession instances.
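For reference, a minimal sketch of the intended post-fix behavior (not the unit test added by this PR; the local master and the warehouse config key are only placeholders):

```scala
import org.apache.spark.sql.SparkSession

// A session that already exists keeps its static SQL configs ...
val existing = SparkSession.builder().master("local").getOrCreate()
val warehouseBefore = existing.conf.get("spark.sql.warehouse.dir")

// ... even if a later builder call tries to set a different static value.
SparkSession.builder().config("spark.sql.warehouse.dir", "xyz").getOrCreate()
assert(existing.conf.get("spark.sql.warehouse.dir") == warehouseBefore)
```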

### How was this patch tested?

new ut.

Closes #28316 from yaooqinn/SPARK-31532.

Authored-by: Kent Yao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 8424f552293677717da7411ed43e68e73aa7f0d6)
Signed-off-by: Takeshi Yamamuro 
---
 .../scala/org/apache/spark/sql/SparkSession.scala  | 28 +
 .../spark/sql/SparkSessionBuilderSuite.scala   | 49 +-
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scal

[spark] branch branch-3.0 updated: [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 0dbd69c  [SPARK-31532][SQL] Builder should not propagate static sql 
configs to the existing active or default SparkSession
0dbd69c is described below

commit 0dbd69c61492f537bc0326d6ad86b616577f46df
Author: Kent Yao 
AuthorDate: Sat Apr 25 08:53:00 2020 +0900

[SPARK-31532][SQL] Builder should not propagate static sql configs to the 
existing active or default SparkSession

### What changes were proposed in this pull request?

SparkSessionBuilder should not propagate static SQL configurations to the existing active/default SparkSession.
This seems to be a long-standing bug.

```scala
scala> spark.sql("set spark.sql.warehouse.dir").show
+++
| key|   value|
+++
|spark.sql.warehou...|file:/Users/kenty...|
+++

scala> spark.sql("set spark.sql.warehouse.dir=2");
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static 
config: spark.sql.warehouse.dir;
  at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154)
  at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
  at 
org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100)
  at 
org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
  at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
  at org.apache.spark.sql.Dataset.(Dataset.scala:229)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
  at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
  ... 47 elided

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get
getClass   getOrCreate

scala> SparkSession.builder.config("spark.sql.warehouse.dir", 
"xyz").getOrCreate
20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing 
SparkSession; some configuration may not take effect.
res7: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession6403d574

scala> spark.sql("set spark.sql.warehouse.dir").show
++-+
| key|value|
++-+
|spark.sql.warehou...|  xyz|
++-+

scala>
```

### Why are the changes needed?
bugfix as shown in the previous section

### Does this PR introduce any user-facing change?

Yes, static SQL configurations with SparkSession.builder.config do not 
propagate to any existing or new SparkSession instances.

### How was this patch tested?

new ut.

Closes #28316 from yaooqinn/SPARK-31532.

Authored-by: Kent Yao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 8424f552293677717da7411ed43e68e73aa7f0d6)
Signed-off-by: Takeshi Yamamuro 
---
 .../scala/org/apache/spark/sql/SparkSession.scala  | 28 +
 .../spark/sql/SparkSessionBuilderSuite.scala   | 49 +-
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scal

[spark] branch master updated (6a57616 -> 8424f55)

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6a57616  [SPARK-31364][SQL][TESTS] Benchmark Parquet Nested Field 
Predicate Pushdown
 add 8424f55  [SPARK-31532][SQL] Builder should not propagate static sql 
configs to the existing active or default SparkSession

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SparkSession.scala  | 28 +
 .../spark/sql/SparkSessionBuilderSuite.scala   | 49 +-
 2 files changed, 67 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (463c544 -> b10263b)

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 463c544  [SPARK-31010][SQL][DOC][FOLLOW-UP] Improve deprecated warning 
message for untyped scala udf
 add b10263b  [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|   1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala |  33 +++--
 .../catalyst/parser/ExpressionParserSuite.scala|  17 ++-
 .../test/resources/sql-tests/inputs/like-all.sql   |  39 ++
 .../test/resources/sql-tests/inputs/like-any.sql   |  39 ++
 .../resources/sql-tests/results/like-all.sql.out   | 140 
 .../resources/sql-tests/results/like-any.sql.out   | 146 +
 7 files changed, 405 insertions(+), 10 deletions(-)
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/like-all.sql
 create mode 100644 sql/core/src/test/resources/sql-tests/inputs/like-any.sql
 create mode 100644 
sql/core/src/test/resources/sql-tests/results/like-all.sql.out
 create mode 100644 
sql/core/src/test/resources/sql-tests/results/like-any.sql.out


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators

2020-04-24 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b10263b  [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators
b10263b is described below

commit b10263b8e5106409467e0115968bbaf0b9141cd1
Author: Yuming Wang 
AuthorDate: Fri Apr 24 22:20:32 2020 +0900

[SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators

### What changes were proposed in this pull request?

`LIKE ANY/SOME` and `LIKE ALL` operators are mostly used when we are 
matching a text field with numbers of patterns. For example:

Teradata / Hive 3.0 / Snowflake:
```sql
--like any
select 'foo' LIKE ANY ('%foo%','%bar%');

--like all
select 'foo' LIKE ALL ('%foo%','%bar%');
```
PostgreSQL:
```sql
-- like any
select 'foo' LIKE ANY (array['%foo%','%bar%']);

-- like all
select 'foo' LIKE ALL (array['%foo%','%bar%']);
```

This PR adds support for these two operators.
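As an illustration of the intended semantics in Spark SQL (the expected results follow from the definition of ANY/ALL; they are not output captured from this patch):

```sql
SELECT 'foo' LIKE ANY ('%foo%', '%bar%');  -- true: at least one pattern matches
SELECT 'foo' LIKE ALL ('%foo%', '%bar%');  -- false: '%bar%' does not match 'foo'
```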

More details:

https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/4~AyrPNmDN0Xk4SALLo6aQ
https://issues.apache.org/jira/browse/HIVE-15229
https://docs.snowflake.net/manuals/sql-reference/functions/like_any.html

### Why are the changes needed?

To smoothly migrate SQLs to Spark SQL.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Unit test.

Closes #27477 from wangyum/SPARK-30724.

Authored-by: Yuming Wang 
Signed-off-by: Takeshi Yamamuro 
---
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|   1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala |  33 +++--
 .../catalyst/parser/ExpressionParserSuite.scala|  17 ++-
 .../test/resources/sql-tests/inputs/like-all.sql   |  39 ++
 .../test/resources/sql-tests/inputs/like-any.sql   |  39 ++
 .../resources/sql-tests/results/like-all.sql.out   | 140 
 .../resources/sql-tests/results/like-any.sql.out   | 146 +
 7 files changed, 405 insertions(+), 10 deletions(-)

diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index d78f584..e49bc07 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -766,6 +766,7 @@ predicate
 | NOT? kind=IN '(' expression (',' expression)* ')'
 | NOT? kind=IN '(' query ')'
 | NOT? kind=RLIKE pattern=valueExpression
+| NOT? kind=LIKE quantifier=(ANY | SOME | ALL) ('('')' | '(' expression 
(',' expression)* ')')
 | NOT? kind=LIKE pattern=valueExpression (ESCAPE escapeChar=STRING)?
 | IS NOT? kind=NULL
 | IS NOT? kind=(TRUE | FALSE | UNKNOWN)
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index ff362e7..e51b8f3 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1373,7 +1373,7 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
* Add a predicate to the given expression. Supported expressions are:
* - (NOT) BETWEEN
* - (NOT) IN
-   * - (NOT) LIKE
+   * - (NOT) LIKE (ANY | SOME | ALL)
* - (NOT) RLIKE
* - IS (NOT) NULL.
* - IS (NOT) (TRUE | FALSE | UNKNOWN)
@@ -1391,6 +1391,14 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   case other => Seq(other)
 }
 
+def getLikeQuantifierExprs(expressions: 
java.util.List[ExpressionContext]): Seq[Expression] = {
+  if (expressions.isEmpty) {
+throw new ParseException("Expected something between '(' and ')'.", 
ctx)
+  } else {
+expressions.asScala.map(expression).map(p => invertIfNotDefined(new 
Like(e, p)))
+  }
+}
+
 // Create the predicate.
 ctx.kind.getType match {
   case SqlBaseParser.BETWEEN =>
@@ -1403,14 +1411,21 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   case SqlBaseParser.IN =>
 invertIfNotDefined(In(e, ctx.expression.asScala.map(expression)))
   case SqlBaseParser.LIKE =>
-val escapeChar = Option(ctx.escapeChar).map(string).map { str =>
-  if (str.length != 1) {
-throw new ParseException("Invalid escape string." +
-  "Escape string must contains only one character.", ctx)
-  }
-  str.charAt(0)
-}.getOrElse('\\')
-invertIfNotDef

[spark] branch branch-3.0 updated: [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference

2020-04-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 0f02997  [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL 
Reference
0f02997 is described below

commit 0f0299749421bc7328c6b962a9305bf460f51ddf
Author: Huaxin Gao 
AuthorDate: Thu Apr 23 15:03:20 2020 +0900

[SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL Reference

### What changes were proposed in this pull request?
Need to address a few more comments

### Why are the changes needed?
Fix a few problems

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Manually build and check

Closes #28306 from huaxingao/literal-folllowup.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit f543d6a1ee76de8cae417ff480fa9c0e0ce5343d)
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-literals.md | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md
index 7cf078c..0088f79 100644
--- a/docs/sql-ref-literals.md
+++ b/docs/sql-ref-literals.md
@@ -437,11 +437,11 @@ SELECT TIMESTAMP '1997-01-31 09:26:56.123' AS col;
 |1997-01-31 09:26:56.123|
 +---+
 
-SELECT TIMESTAMP '1997-01-31 09:26:56.CST' AS col;
+SELECT TIMESTAMP '1997-01-31 09:26:56.UTC+08:00' AS col;
 +--+
 |  col |
 +--+
-|1997-01-31 07:26:56.66|
+|1997-01-30 17:26:56.66|
 +--+
 
 SELECT TIMESTAMP '1997-01' AS col;
@@ -508,7 +508,7 @@ SELECT INTERVAL -2 HOUR '3' MINUTE AS col;
 |-1 hours -57 minutes|
 ++
 
-SELECT INTERVAL 'INTERVAL 1 YEAR 2 DAYS 3 HOURS';
+SELECT INTERVAL '1 YEAR 2 DAYS 3 HOURS';
 +--+
 |   col|
 +--+
@@ -523,6 +523,13 @@ SELECT INTERVAL 1 YEARS 2 MONTH 3 WEEK 4 DAYS 5 HOUR 6 
MINUTES 7 SECOND 8
 |1 years 2 months 25 days 5 hours 6 minutes 7.008009 seconds|
 +---+
 
+SELECT INTERVAL '2-3' YEAR TO MONTH AS col;
+++
+| col|
+++
+|2 years 3 months|
+++
+
 SELECT INTERVAL '20 15:40:32.9989' DAY TO SECOND AS col;
 +-+
 |  col|


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ca90e19 -> f543d6a)

2020-04-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ca90e19  [SPARK-31515][SQL] Canonicalize Cast should consider the 
value of needTimeZone
 add f543d6a  [SPARK-31465][SQL][DOCS][FOLLOW-UP] Document Literal in SQL 
Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-literals.md | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (03fe9ee -> ca90e19)

2020-04-22 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 03fe9ee  [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
 add ca90e19  [SPARK-31515][SQL] Canonicalize Cast should consider the 
value of needTimeZone

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/expressions/Canonicalize.scala  | 10 +-
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala  |  4 +++-
 .../spark/sql/catalyst/expressions/CanonicalizeSuite.scala| 11 ++-
 3 files changed, 22 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

2020-04-22 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2ebef75  [SPARK-31515][SQL] Canonicalize Cast should consider the 
value of needTimeZone
2ebef75 is described below

commit 2ebef75ec654bdbb01a4fa6a85225a7503de84b7
Author: Yuanjian Li 
AuthorDate: Thu Apr 23 14:32:10 2020 +0900

[SPARK-31515][SQL] Canonicalize Cast should consider the value of 
needTimeZone

### What changes were proposed in this pull request?
Override the canonicalized fields with respect to the result of 
`needsTimeZone`.

### Why are the changes needed?
The current approach breaks semantic equality of two cast expressions that are unrelated to datetime types. If the `timeZone` information is not needed when casting the `from` type to the `to` type, then the timeZoneId should not influence the canonicalization result.
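In other words, a minimal sketch of the intended equivalence for a non-datetime cast (illustrative only, not code from this patch):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.IntegerType

// Casting a string to an int never needs a time zone, so two casts that differ
// only in timeZoneId should canonicalize to the same expression after this fix.
val a = Cast(Literal("1"), IntegerType, timeZoneId = Some("UTC"))
val b = Cast(Literal("1"), IntegerType, timeZoneId = Some("Asia/Seoul"))
assert(a.semanticEquals(b))
```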

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
New UT added.

Closes #28288 from xuanyuanking/SPARK-31515.

Authored-by: Yuanjian Li 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit ca90e1932dcdc43748297c627ec857b6ea97dff7)
Signed-off-by: Takeshi Yamamuro 
---
 .../apache/spark/sql/catalyst/expressions/Canonicalize.scala  | 10 +-
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala  |  4 +++-
 .../spark/sql/catalyst/expressions/CanonicalizeSuite.scala| 11 ++-
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
index 4d218b9..a803108 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
@@ -27,6 +27,7 @@ package org.apache.spark.sql.catalyst.expressions
  * The following rules are applied:
  *  - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s 
are stripped.
  *  - Names for [[GetStructField]] are stripped.
+ *  - TimeZoneId for [[Cast]] and [[AnsiCast]] are stripped if `needsTimeZone` 
is false.
  *  - Commutative and associative operations ([[Add]] and [[Multiply]]) have 
their children ordered
  *by `hashCode`.
  *  - [[EqualTo]] and [[EqualNullSafe]] are reordered by `hashCode`.
@@ -35,7 +36,7 @@ package org.apache.spark.sql.catalyst.expressions
  */
 object Canonicalize {
   def execute(e: Expression): Expression = {
-expressionReorder(ignoreNamesTypes(e))
+expressionReorder(ignoreTimeZone(ignoreNamesTypes(e)))
   }
 
   /** Remove names and nullability from types, and names from 
`GetStructField`. */
@@ -46,6 +47,13 @@ object Canonicalize {
 case _ => e
   }
 
+  /** Remove TimeZoneId for Cast if needsTimeZone return false. */
+  private[expressions] def ignoreTimeZone(e: Expression): Expression = e match 
{
+case c: CastBase if c.timeZoneId.nonEmpty && !c.needsTimeZone =>
+  c.withTimeZone(null)
+case _ => e
+  }
+
   /** Collects adjacent commutative operations. */
   private def gatherCommutative(
   e: Expression,
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 8d82956..fa615d7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -279,7 +279,7 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
   override lazy val resolved: Boolean =
 childrenResolved && checkInputDataTypes().isSuccess && (!needsTimeZone || 
timeZoneId.isDefined)
 
-  private[this] def needsTimeZone: Boolean = 
Cast.needsTimeZone(child.dataType, dataType)
+  def needsTimeZone: Boolean = Cast.needsTimeZone(child.dataType, dataType)
 
   // [[func]] assumes the input is no longer null because eval already does 
the null check.
   @inline private[this] def buildCast[T](a: Any, func: T => Any): Any = 
func(a.asInstanceOf[T])
@@ -1708,6 +1708,7 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
   """)
 case class Cast(child: Expression, dataType: DataType, timeZoneId: 
Option[String] = None)
   extends CastBase {
+
   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
 copy(timeZoneId = Option(timeZoneId))
 
@@ -1724,6 +1725,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  */
 case class AnsiCast(child: Expression, dataType: DataType, timeZoneId: 
Option[Stri

[spark] branch branch-3.0 updated: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference

2020-04-22 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ed3e4bd  [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
ed3e4bd is described below

commit ed3e4bd09a6a183c3ba181eea7f2d47bde7fb1db
Author: Huaxin Gao 
AuthorDate: Thu Apr 23 14:12:10 2020 +0900

[SPARK-31465][SQL][DOCS] Document Literal in SQL Reference

### What changes were proposed in this pull request?
Document Literal in SQL Reference

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
https://user-images.githubusercontent.com/13592258/80057912-9ecb0c00-84dc-11ea-881e-1415108d674f.png

https://user-images.githubusercontent.com/13592258/80057917-a12d6600-84dc-11ea-8884-81f2a94644d5.png

https://user-images.githubusercontent.com/13592258/80057922-a4c0ed00-84dc-11ea-9857-75db50f7b054.png

https://user-images.githubusercontent.com/13592258/80057927-a7234700-84dc-11ea-9124-45ae1f6143fd.png

https://user-images.githubusercontent.com/13592258/80057932-ab4f6480-84dc-11ea-8393-cf005af13ce9.png

https://user-images.githubusercontent.com/13592258/80057936-ad192800-84dc-11ea-8d78-9f071a82f1df.png

https://user-images.githubusercontent.com/13592258/80057940-b0141880-84dc-11ea-97a7-f787cad0ee03.png

https://user-images.githubusercontent.com/13592258/80057945-b30f0900-84dc-11ea-985f-c070609e2329.png

https://user-images.githubusercontent.com/13592258/80057949-b5716300-84dc-11ea-9452-3f51137fe03d.png

https://user-images.githubusercontent.com/13592258/80057957-b904ea00-84dc-11ea-8b12-a6f00362aa55.png

https://user-images.githubusercontent.com/13592258/80057962-bacead80-84dc-11ea-94da-916b1d1c1756.png

### How was this patch tested?
Manually build and check

Closes #28237 from huaxingao/literal.

Authored-by: Huaxin Gao 
    Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 03fe9ee428ebb4544f0f47e861bccea43e0cb325)
    Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml |   2 +
 docs/sql-ref-literals.md | 532 +++
 2 files changed, 534 insertions(+)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index a16e114..5d8c265 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -78,6 +78,8 @@
   subitems:
 - text: Data Types
   url: sql-ref-datatypes.html
+- text: Literals
+  url: sql-ref-literals.html
 - text: Null Semantics
   url: sql-ref-null-semantics.html
 - text: NaN Semantics
diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md
new file mode 100644
index 000..7cf078c
--- /dev/null
+++ b/docs/sql-ref-literals.md
@@ -0,0 +1,532 @@
+---
+layout: global
+title: Literals
+displayTitle: Literals
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+A literal (also known as a constant) represents a fixed data value. Spark SQL 
supports the following literals:
+
+ * [String Literal](#string-literal)
+ * [Binary Literal](#binary-literal)
+ * [Null Literal](#null-literal)
+ * [Boolean Literal](#boolean-literal)
+ * [Numeric Literal](#numeric-literal)
+ * [Datetime Literal](#datetime-literal)
+ * [Interval Literal](#interval-literal)
+
+### String Literal
+
+A string literal is used to specify a character string value.
+
+ Syntax
+
+{% highlight sql %}
+'c [ ... ]' | "c [ ... ]"
+{% endhighlight %}
+
+ Parameters
+
+
+  c
+  
+One character from the character set. Use \ to escape special 
characters (e.g., ' or \).
+  
+
+
+ Examples
+
+{% highlight sql %}
+SELECT 'Hello, World!' AS col;
++-+
+|  col|
++-+
+|Hello, World!|
++-+
+
+SELECT "SPARK SQL" AS col;
++-+
+|  col|
++-+
+|SPARK SQL|
++-+
+
+SELECT 'it\'s $10.' AS col;
++-+
+|  col|
++-+
+|It's

[spark] branch master updated: [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference

2020-04-22 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 03fe9ee  [SPARK-31465][SQL][DOCS] Document Literal in SQL Reference
03fe9ee is described below

commit 03fe9ee428ebb4544f0f47e861bccea43e0cb325
Author: Huaxin Gao 
AuthorDate: Thu Apr 23 14:12:10 2020 +0900

[SPARK-31465][SQL][DOCS] Document Literal in SQL Reference

### What changes were proposed in this pull request?
Document Literal in SQL Reference

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
https://user-images.githubusercontent.com/13592258/80057912-9ecb0c00-84dc-11ea-881e-1415108d674f.png;>

https://user-images.githubusercontent.com/13592258/80057917-a12d6600-84dc-11ea-8884-81f2a94644d5.png;>

https://user-images.githubusercontent.com/13592258/80057922-a4c0ed00-84dc-11ea-9857-75db50f7b054.png;>

https://user-images.githubusercontent.com/13592258/80057927-a7234700-84dc-11ea-9124-45ae1f6143fd.png;>

https://user-images.githubusercontent.com/13592258/80057932-ab4f6480-84dc-11ea-8393-cf005af13ce9.png;>

https://user-images.githubusercontent.com/13592258/80057936-ad192800-84dc-11ea-8d78-9f071a82f1df.png;>

https://user-images.githubusercontent.com/13592258/80057940-b0141880-84dc-11ea-97a7-f787cad0ee03.png;>

https://user-images.githubusercontent.com/13592258/80057945-b30f0900-84dc-11ea-985f-c070609e2329.png;>

https://user-images.githubusercontent.com/13592258/80057949-b5716300-84dc-11ea-9452-3f51137fe03d.png;>

https://user-images.githubusercontent.com/13592258/80057957-b904ea00-84dc-11ea-8b12-a6f00362aa55.png;>

https://user-images.githubusercontent.com/13592258/80057962-bacead80-84dc-11ea-94da-916b1d1c1756.png;>

### How was this patch tested?
Manually build and check

Closes #28237 from huaxingao/literal.

Authored-by: Huaxin Gao 
    Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml |   2 +
 docs/sql-ref-literals.md | 532 +++
 2 files changed, 534 insertions(+)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index a16e114..5d8c265 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -78,6 +78,8 @@
   subitems:
 - text: Data Types
   url: sql-ref-datatypes.html
+- text: Literals
+  url: sql-ref-literals.html
 - text: Null Semantics
   url: sql-ref-null-semantics.html
 - text: NaN Semantics
diff --git a/docs/sql-ref-literals.md b/docs/sql-ref-literals.md
new file mode 100644
index 000..7cf078c
--- /dev/null
+++ b/docs/sql-ref-literals.md
@@ -0,0 +1,532 @@
+---
+layout: global
+title: Literals
+displayTitle: Literals
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+A literal (also known as a constant) represents a fixed data value. Spark SQL 
supports the following literals:
+
+ * [String Literal](#string-literal)
+ * [Binary Literal](#binary-literal)
+ * [Null Literal](#null-literal)
+ * [Boolean Literal](#boolean-literal)
+ * [Numeric Literal](#numeric-literal)
+ * [Datetime Literal](#datetime-literal)
+ * [Interval Literal](#interval-literal)
+
+### String Literal
+
+A string literal is used to specify a character string value.
+
+ Syntax
+
+{% highlight sql %}
+'c [ ... ]' | "c [ ... ]"
+{% endhighlight %}
+
+ Parameters
+
+
+  c
+  
+One character from the character set. Use \ to escape special 
characters (e.g., ' or \).
+  
+
+
+ Examples
+
+{% highlight sql %}
+SELECT 'Hello, World!' AS col;
++-+
+|  col|
++-+
+|Hello, World!|
++-+
+
+SELECT "SPARK SQL" AS col;
++-+
+|  col|
++-+
+|SPARK SQL|
++-+
+
+SELECT 'it\'s $10.' AS col;
++-+
+|  col|
++-+
+|it's $10.|
++-+
+{% endhighlight %}
+
+### Binary Literal
+
+A binary literal is used to specify a byte se
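
The Binary Literal section is cut off above by the archive. As a rough sketch (not quoted from the committed page), Spark SQL writes a binary literal as an `X` prefix followed by hexadecimal digits in quotes:

```sql
-- Sketch only: a binary literal producing the three bytes 0x12, 0x34, 0x56.
SELECT X'123456' AS col;
```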

[spark] branch master updated: [SPARK-31477][SQL] Dump codegen and compile time in BenchmarkQueryTest

2020-04-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bf5f01  [SPARK-31477][SQL] Dump codegen and compile time in 
BenchmarkQueryTest
6bf5f01 is described below

commit 6bf5f01a4a8b7708ce563e0a0e9a49e8ff89c71e
Author: gatorsmile 
AuthorDate: Sat Apr 18 20:59:45 2020 +0900

[SPARK-31477][SQL] Dump codegen and compile time in BenchmarkQueryTest

### What changes were proposed in this pull request?
This PR is to dump the codegen and compilation time for benchmark query 
tests.

### Why are the changes needed?
Measure the codegen and compilation time costs in TPC-DS queries

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manual test in my local laptop:
```
23:13:12.845 WARN org.apache.spark.sql.TPCDSQuerySuite:
=== Metrics of Whole-stage Codegen ===
Total code generation time: 21.275102261 seconds
Total compilation time: 12.223771828 seconds
```

Closes #28252 from gatorsmile/testMastercode.

Authored-by: gatorsmile 
Signed-off-by: Takeshi Yamamuro 
---
 .../sql/catalyst/expressions/codegen/CodeGenerator.scala|  2 +-
 .../apache/spark/sql/execution/WholeStageCodegenExec.scala  |  2 +-
 .../scala/org/apache/spark/sql/BenchmarkQueryTest.scala | 13 +
 .../test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala | 13 +++--
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 3042a27..1cc7836 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -1324,7 +1324,7 @@ object CodeGenerator extends Logging {
 
   // Reset compile time.
   // Visible for testing
-  def resetCompileTime: Unit = _compileTime.reset()
+  def resetCompileTime(): Unit = _compileTime.reset()
 
   /**
* Compile the Java source code into a Java class, using Janino.
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
index 9f6e4fc..0244542 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
@@ -586,7 +586,7 @@ object WholeStageCodegenExec {
 
   // Reset generation time of Java source code.
   // Visible for testing
-  def resetCodeGenTime: Unit = _codeGenTime.set(0L)
+  def resetCodeGenTime(): Unit = _codeGenTime.set(0L)
 }
 
 /**
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala
index 07afd41..2c3b37a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/BenchmarkQueryTest.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql
 import org.apache.spark.internal.config.Tests.IS_TESTING
 import org.apache.spark.sql.catalyst.expressions.codegen.{ByteCodeStats, 
CodeFormatter, CodeGenerator}
 import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.NANOS_PER_SECOND
 import org.apache.spark.sql.execution.{SparkPlan, WholeStageCodegenExec}
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.util.Utils
@@ -36,7 +37,17 @@ abstract class BenchmarkQueryTest extends QueryTest with 
SharedSparkSession {
   protected override def afterAll(): Unit = {
 try {
   // For debugging dump some statistics about how much time was spent in 
various optimizer rules
+  // code generation, and compilation.
   logWarning(RuleExecutor.dumpTimeSpent())
+  val codeGenTime = WholeStageCodegenExec.codeGenTime.toDouble / 
NANOS_PER_SECOND
+  val compileTime = CodeGenerator.compileTime.toDouble / NANOS_PER_SECOND
+  val codegenInfo =
+s"""
+   |=== Metrics of Whole-stage Codegen ===
+   |Total code generation time: $codeGenTime seconds
+   |Total compile time: $compileTime seconds
+ """.stripMargin
+  logWarning(codegenInfo)
   spark.sessionState.catalog.reset()
 } finally {
   super.afterAll()
@@ -46,6 +57,8 @@ abstract class BenchmarkQueryTest extends QueryTest with 
SharedSparkSession {
   override def beforeAll(): Unit = {
 super.beforeAll()
 RuleExecutor.resetMetrics()
+

[spark] branch branch-3.0 updated: [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section

2020-04-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 1139e9b  [SPARK-31390][SQL][DOCS] Document Window Function in SQL 
Syntax Section
1139e9b is described below

commit 1139e9b50150e3a99a9c8df0ed57d3fd2b391788
Author: Huaxin Gao 
AuthorDate: Sat Apr 18 09:31:52 2020 +0900

[SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section

### What changes were proposed in this pull request?
Document Window Function in SQL syntax

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

https://user-images.githubusercontent.com/13592258/79531509-7bf5af00-8027-11ea-8291-a91b2e97a1b5.png;>

https://user-images.githubusercontent.com/13592258/79531514-7e580900-8027-11ea-8761-4c5a888c476f.png;>

https://user-images.githubusercontent.com/13592258/79531518-82842680-8027-11ea-876f-6375aa5b5ead.png;>

https://user-images.githubusercontent.com/13592258/79531521-844dea00-8027-11ea-8948-712f054d42ee.png;>

https://user-images.githubusercontent.com/13592258/79531528-8748da80-8027-11ea-9dae-a465286982ac.png;>

### How was this patch tested?
Manually build and check

Closes #28220 from huaxingao/sql-win-fun.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 142f43629c42ad750d9b506283191aa830d95c08)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-qry-window.md | 190 +-
 2 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 7827a0f..5042c2588 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -168,6 +168,8 @@
   url: sql-ref-syntax-qry-select-inline-table.html
 - text: Common Table Expression
   url: sql-ref-syntax-qry-select-cte.html
+- text: Window Function
+  url: sql-ref-syntax-qry-window.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-qry-window.md 
b/docs/sql-ref-syntax-qry-window.md
index 767f477..4ec1af7 100644
--- a/docs/sql-ref-syntax-qry-window.md
+++ b/docs/sql-ref-syntax-qry-window.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Windowing Analytic Functions
-displayTitle: Windowing Analytic Functions
+title: Window Functions
+displayTitle: Window Functions
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -19,4 +19,188 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+Window functions operate on a group of rows, referred to as a window, and 
calculate a return value for each row based on the group of rows. Window 
functions are useful for processing tasks such as calculating a moving average, 
computing a cumulative statistic, or accessing the value of rows given the 
relative position of the current row.
+
+### Syntax
+
+{% highlight sql %}
+window_function OVER
+( [ { PARTITION | DISTRIBUTE } BY partition_col_name [ , ... ] ]
+  { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ]
+  [ window_frame ] )
+{% endhighlight %}
+
+### Parameters
+
+
+  window_function
+  
+
+  Ranking Functions
+  
+  Syntax:
+
+  RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER
+
+
+
+  Analytic Functions
+  
+  Syntax:
+
+  CUME_DIST | LAG | LEAD
+
+
+
+  Aggregate Functions
+  
+  Syntax:
+
+  MAX | MIN | COUNT | SUM | AVG | ...
+
+
+Please refer to the Built-in Functions document 
for a complete list of Spark aggregate functions.
+
+  
+
+
+  window_frame
+  
+Specifies the row where the window frame starts and the row where it ends.
+Syntax:
+  { RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end 
}
+  If frame_end is omitted it defaults to CURRENT ROW.
+  
+  frame_start and frame_end have the following 
syntax
+  Syntax:
+
+  UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset 
FOLLOWING | UNBOUNDED FOLLOWING
+
+offset: specifies the offset from the position of the current row.
+  
+  
+
+
+### Examples
+
+{% highlight sql %}
+CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT);
+
+INSERT INTO employees VALUES ("Lisa", "Sales", 1,
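
The employees example above is truncated by the archive. A compact, self-contained sketch of the syntax this page describes (the table, data, and column names below are illustrative, not taken from the commit):

```sql
-- Rank rows within each department and compute a running total with an explicit frame.
CREATE TABLE emp (name STRING, dept STRING, salary INT);
INSERT INTO emp VALUES
  ('Lisa', 'Sales', 10000),
  ('Alex', 'Sales', 30000),
  ('Jane', 'Marketing', 29000);

SELECT name, dept, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
       SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM emp;
```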

[spark] branch master updated (db7b865 -> 142f436)

2020-04-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from db7b865  [SPARK-31253][SQL][FOLLOW-UP] simplify the code of 
calculating size metrics of AQE shuffle
 add 142f436  [SPARK-31390][SQL][DOCS] Document Window Function in SQL 
Syntax Section

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-qry-window.md | 190 +-
 2 files changed, 189 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30564][SQL] Revert Block.length and use comment placeholders in HashAggregateExec

2020-04-16 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 020f3a3  [SPARK-30564][SQL] Revert Block.length and use comment 
placeholders in HashAggregateExec
020f3a3 is described below

commit 020f3a33dd711d05337bb42d5f65708a4aec2daa
Author: Peter Toth 
AuthorDate: Thu Apr 16 17:52:22 2020 +0900

[SPARK-30564][SQL] Revert Block.length and use comment placeholders in 
HashAggregateExec

### What changes were proposed in this pull request?
SPARK-21870 (cb0cddf#diff-06dc5de6163687b7810aa76e7e152a76R146-R149) caused 
significant performance regression in cases where the source code size is 
fairly large as `HashAggregateExec` uses `Block.length` to decide on splitting 
the code. The change in `length` makes sense as the comment and extra new lines 
shouldn't be taken into account when deciding on splitting, but the regular 
expression based approach is very slow and adds a big relative overhead to 
cases where the execution is  [...]
This PR:
- restores `Block.length` to its original form
- places comments in `HashAggregateExec` with 
`CodegenContext.registerComment` so as to appear only when comments are enabled 
(`spark.sql.codegen.comments=true`)

Before this PR:
```
deeply nested struct field r/w:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem)           1137           1143           8        0.1      11368.3       0.0X
```

After this PR:
```
deeply nested struct field r/w:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem)            167            180           7        0.6       1674.3       0.1X
```
### Why are the changes needed?
To fix performance regression.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #28083 from peter-toth/SPARK-30564-use-comment-placeholders.

Authored-by: Peter Toth 
Signed-off-by: Takeshi Yamamuro 

(cherry picked from commit 7ad6ba36f28b7a5ca548950dec6afcd61e5d68b9)

Signed-off-by: Takeshi Yamamuro 
---
 .../spark/sql/catalyst/expressions/codegen/javaCode.scala  |  8 
 .../spark/sql/execution/aggregate/HashAggregateExec.scala  | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
index dff2589..1c59c3c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
@@ -143,10 +143,10 @@ trait Block extends TreeNode[Block] with JavaCode {
 case _ => code.trim
   }
 
-  def length: Int = {
-// Returns a code length without comments
-CodeFormatter.stripExtraNewLinesAndComments(toString).length
-  }
+  // We could remove comments, extra whitespaces and newlines when calculating 
length as it is used
+  // only for codegen method splitting, but SPARK-30564 showed that this is a 
performance critical
+  // function so we decided not to do so.
+  def length: Int = toString.length
 
   def isEmpty: Boolean = toString.isEmpty
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
index 7a26fd7..87a4081 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
@@ -367,10 +367,10 @@ case class HashAggregateExec(
  """.stripMargin
   }
   code"""
- |// do aggregate for ${aggNames(i)}
- |// evaluate aggregate function
+ |${ctx.registerComment(s"do aggregate for ${aggNames(i)}")}
+ |${ctx.registerComment("evaluate aggregate function")}
  |${evaluateVariables(bufferEvalsForOneFunc)}
- |// update aggregation buffers
+ |${ctx.registerComment("update aggregation buffers")}
  |${updates.mkString("\n").trim}
""".stripMargin
 }
@@ -975,9 +975,

[spark] branch master updated (c76c31e -> 7ad6ba3)

2020-04-16 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c76c31e  [SPARK-31455][SQL] Fix rebasing of not-existed timestamps
 add 7ad6ba3  [SPARK-30564][SQL] Revert Block.length and use comment 
placeholders in HashAggregateExec

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/codegen/javaCode.scala  |  8 
 .../spark/sql/execution/aggregate/HashAggregateExec.scala  | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference

2020-04-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4476c85  [SPARK-31428][SQL][DOCS] Document Common Table Expression in 
SQL Reference
4476c85 is described below

commit 4476c85775d231c8bb26399284c0baf4292bec7c
Author: Huaxin Gao 
AuthorDate: Thu Apr 16 08:34:26 2020 +0900

[SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference

### What changes were proposed in this pull request?
Document Common Table Expression in SQL Reference

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png;>

https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png;>

https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png;>

Also link to Select page

https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png;>

### How was this patch tested?
Manually build and check

Closes #28196 from huaxingao/cte.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 92c1b246174948d0c1f4d0833e1ceac265b17be7)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-qry-select-cte.md | 109 +-
 docs/sql-ref-syntax-qry-select.md |   3 +-
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index badb98d..7827a0f 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -166,6 +166,8 @@
   url: sql-ref-syntax-qry-select-tvf.html
 - text: Inline Table
   url: sql-ref-syntax-qry-select-inline-table.html
+- text: Common Table Expression
+  url: sql-ref-syntax-qry-select-cte.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-qry-select-cte.md 
b/docs/sql-ref-syntax-qry-select-cte.md
index 2bd7748..2146f8e 100644
--- a/docs/sql-ref-syntax-qry-select-cte.md
+++ b/docs/sql-ref-syntax-qry-select-cte.md
@@ -19,4 +19,111 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+A common table expression (CTE) defines a temporary result set that a user can 
reference possibly multiple times within the scope of a SQL statement. A CTE is 
used mainly in a SELECT statement.
+
+### Syntax
+
+{% highlight sql %}
+WITH common_table_expression [ , ... ]
+{% endhighlight %}
+
+While `common_table_expression` is defined as
+{% highlight sql %}
+expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( [ 
common_table_expression ] query )
+{% endhighlight %}
+
+### Parameters
+
+
+  expression_name
+  
+Specifies a name for the common table expression.
+  
+
+
+  query
+  
+A SELECT statement.
+  
+
+
+### Examples
+
+{% highlight sql %}
+-- CTE with multiple column aliases
+WITH t(x, y) AS (SELECT 1, 2)
+SELECT * FROM t WHERE x = 1 AND y = 2;
+  +---+---+
+  |  x|  y|
+  +---+---+
+  |  1|  2|
+  +---+---+
+
+-- CTE in CTE definition
+WITH t as (
+WITH t2 AS (SELECT 1)
+SELECT * FROM t2
+)
+SELECT * FROM t;
+  +---+
+  |  1|
+  +---+
+  |  1|
+  +---+
+
+-- CTE in subquery
+SELECT max(c) FROM (
+WITH t(c) AS (SELECT 1)
+SELECT * FROM t
+);
+  +--+
+  |max(c)|
+  +--+
+  | 1|
+  +--+
+
+-- CTE in subquery expression
+SELECT (
+WITH t AS (SELECT 1)
+SELECT * FROM t
+);
+  ++
+  |scalarsubquery()|
+  ++
+  |   1|
+  ++
+
+-- CTE in CREATE VIEW statement
+CREATE VIEW v AS
+WITH t(a, b, c, d) AS (SELECT 1, 2, 3, 4)
+SELECT * FROM t;
+SELECT * FROM v;
+  +---+---+---+---+
+  |  a|  b|  c|  d|
+  +---+---+---+---+
+  |  1|  2|  3|  4|
+  +---+---+---+---+
+
+-- If name conflict is detected in nested CTE, then AnalysisException is 
thrown by default.
+-- With spark.sql.legacy.ctePrecedencePolicy set to CORRECTED (which is recommended),
+-- inner CTE definitions take precedence over outer definitions.
+SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED;
+WITH
+t AS (SELECT 1),
+t2 AS (
+WITH t AS (SELECT 2)
+SELECT * FROM t
+)
+SELECT * FROM t2;
+  +---+
+  |  2|
+  +---+
+  |  2|
+  +---+
+{% endhighlight %}
+
+### Related Statements
+
+ * [SELECT](sql-ref-syntax-qry-select.html)
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index 94f6
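
The diff above is truncated by the archive. One case the description covers but the examples do not show is a CTE referenced more than once in the same statement; a small sketch (not part of the commit):

```sql
-- The CTE is defined once and referenced twice in the same query.
WITH nums(n) AS (SELECT 1 UNION ALL SELECT 2)
SELECT a.n AS left_n, b.n AS right_n
FROM nums a CROSS JOIN nums b;
```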

[spark] branch master updated: [SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference

2020-04-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 92c1b24  [SPARK-31428][SQL][DOCS] Document Common Table Expression in 
SQL Reference
92c1b24 is described below

commit 92c1b246174948d0c1f4d0833e1ceac265b17be7
Author: Huaxin Gao 
AuthorDate: Thu Apr 16 08:34:26 2020 +0900

[SPARK-31428][SQL][DOCS] Document Common Table Expression in SQL Reference

### What changes were proposed in this pull request?
Document Common Table Expression in SQL Reference

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
https://user-images.githubusercontent.com/13592258/79100257-f61def00-7d1a-11ea-8402-17017059232e.png;>

https://user-images.githubusercontent.com/13592258/79100260-f7e7b280-7d1a-11ea-9408-058c0851f0b6.png;>

https://user-images.githubusercontent.com/13592258/79100262-fa4a0c80-7d1a-11ea-8862-eb1d8960296b.png;>

Also link to Select page

https://user-images.githubusercontent.com/13592258/79082246-217fea00-7cd9-11ea-8d96-1a69769d1e19.png;>

### How was this patch tested?
Manually build and check

Closes #28196 from huaxingao/cte.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml  |   2 +
 docs/sql-ref-syntax-qry-select-cte.md | 109 +-
 docs/sql-ref-syntax-qry-select.md |   3 +-
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index badb98d..7827a0f 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -166,6 +166,8 @@
   url: sql-ref-syntax-qry-select-tvf.html
 - text: Inline Table
   url: sql-ref-syntax-qry-select-inline-table.html
+- text: Common Table Expression
+  url: sql-ref-syntax-qry-select-cte.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-qry-select-cte.md 
b/docs/sql-ref-syntax-qry-select-cte.md
index 2bd7748..2146f8e 100644
--- a/docs/sql-ref-syntax-qry-select-cte.md
+++ b/docs/sql-ref-syntax-qry-select-cte.md
@@ -19,4 +19,111 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+A common table expression (CTE) defines a temporary result set that a user can 
reference possibly multiple times within the scope of a SQL statement. A CTE is 
used mainly in a SELECT statement.
+
+### Syntax
+
+{% highlight sql %}
+WITH common_table_expression [ , ... ]
+{% endhighlight %}
+
+While `common_table_expression` is defined as
+{% highlight sql %}
+expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( [ 
common_table_expression ] query )
+{% endhighlight %}
+
+### Parameters
+
+
+  expression_name
+  
+Specifies a name for the common table expression.
+  
+
+
+  query
+  
+A SELECT statement.
+  
+
+
+### Examples
+
+{% highlight sql %}
+-- CTE with multiple column aliases
+WITH t(x, y) AS (SELECT 1, 2)
+SELECT * FROM t WHERE x = 1 AND y = 2;
+  +---+---+
+  |  x|  y|
+  +---+---+
+  |  1|  2|
+  +---+---+
+
+-- CTE in CTE definition
+WITH t as (
+WITH t2 AS (SELECT 1)
+SELECT * FROM t2
+)
+SELECT * FROM t;
+  +---+
+  |  1|
+  +---+
+  |  1|
+  +---+
+
+-- CTE in subquery
+SELECT max(c) FROM (
+WITH t(c) AS (SELECT 1)
+SELECT * FROM t
+);
+  +--+
+  |max(c)|
+  +--+
+  | 1|
+  +--+
+
+-- CTE in subquery expression
+SELECT (
+WITH t AS (SELECT 1)
+SELECT * FROM t
+);
+  ++
+  |scalarsubquery()|
+  ++
+  |   1|
+  ++
+
+-- CTE in CREATE VIEW statement
+CREATE VIEW v AS
+WITH t(a, b, c, d) AS (SELECT 1, 2, 3, 4)
+SELECT * FROM t;
+SELECT * FROM v;
+  +---+---+---+---+
+  |  a|  b|  c|  d|
+  +---+---+---+---+
+  |  1|  2|  3|  4|
+  +---+---+---+---+
+
+-- If name conflict is detected in nested CTE, then AnalysisException is 
thrown by default.
+-- With spark.sql.legacy.ctePrecedencePolicy set to CORRECTED (which is recommended),
+-- inner CTE definitions take precedence over outer definitions.
+SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED;
+WITH
+t AS (SELECT 1),
+t2 AS (
+WITH t AS (SELECT 2)
+SELECT * FROM t
+)
+SELECT * FROM t2;
+  +---+
+  |  2|
+  +---+
+  |  2|
+  +---+
+{% endhighlight %}
+
+### Related Statements
+
+ * [SELECT](sql-ref-syntax-qry-select.html)
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index 94f69d4..bc2cc02 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -53,7 +53

[spark] branch master updated: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions

2020-04-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f0e2fc3  [SPARK-25154][SQL] Support NOT IN sub-queries inside nested 
OR conditions
f0e2fc3 is described below

commit f0e2fc37d1dc2a85fd08c87add5106bb51305182
Author: Dilip Biswal 
AuthorDate: Sat Apr 11 08:28:11 2020 +0900

[SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions

### What changes were proposed in this pull request?

Currently NOT IN subqueries (predicated null aware subquery) are not 
allowed inside OR expressions. We currently catch this condition in 
checkAnalysis and throw an error.

This PR enhances the subquery rewrite to support this type of queries.

Query
```SQL
SELECT * FROM s1 WHERE a > 5 or b NOT IN (SELECT c FROM s2);
```
Optimized Plan
```SQL
== Optimized Logical Plan ==
Project [a#3, b#4]
+- Filter ((a#3 > 5) || NOT exists#7)
   +- Join ExistenceJoin(exists#7), ((b#4 = c#5) || isnull((b#4 = c#5)))
  :- HiveTableRelation `default`.`s1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#3, b#4]
  +- Project [c#5]
 +- HiveTableRelation `default`.`s2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c#5, d#6]
```
This is rework from #22141.
The original author of this PR is dilipbiswal.

Closes #22141

### Why are the changes needed?

For better usability.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added new tests in SQLQueryTestSuite, RewriteSubquerySuite and 
SubquerySuite.
Output from DB2 as a reference:

[nested-not-db2.txt](https://github.com/apache/spark/files/2299945/nested-not-db2.txt)

Closes #28158 from maropu/pr22141.

Lead-authored-by: Dilip Biswal 
Co-authored-by: Takeshi Yamamuro 
Co-authored-by: Dilip Biswal 
Signed-off-by: Takeshi Yamamuro 
---
 .../sql/catalyst/analysis/CheckAnalysis.scala  |   4 -
 .../spark/sql/catalyst/expressions/subquery.scala  |  18 --
 .../spark/sql/catalyst/optimizer/subquery.scala|  23 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  15 -
 .../catalyst/optimizer/RewriteSubquerySuite.scala  |  19 +-
 .../apache/spark/sql/catalyst/plans/PlanTest.scala |   9 +-
 .../inputs/subquery/in-subquery/nested-not-in.sql  | 198 
 .../subquery/in-subquery/nested-not-in.sql.out | 332 +
 .../scala/org/apache/spark/sql/SubquerySuite.scala |   6 +-
 9 files changed, 580 insertions(+), 44 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 066dc6d..9e325d0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -200,10 +200,6 @@ trait CheckAnalysis extends PredicateHelper {
   s"filter expression '${f.condition.sql}' " +
 s"of type ${f.condition.dataType.catalogString} is not a 
boolean.")
 
-  case Filter(condition, _) if 
hasNullAwarePredicateWithinNot(condition) =>
-failAnalysis("Null-aware predicate sub-queries cannot be used in 
nested " +
-  s"conditions: $condition")
-
   case j @ Join(_, _, _, Some(condition), _) if condition.dataType != 
BooleanType =>
 failAnalysis(
   s"join condition '${condition.sql}' " +
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
index e33cff2..f46a1c6 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
@@ -106,24 +106,6 @@ object SubExprUtils extends PredicateHelper {
   }
 
   /**
-   * Returns whether there are any null-aware predicate subqueries inside Not. 
If not, we could
-   * turn the null-aware predicate into not-null-aware predicate.
-   */
-  def hasNullAwarePredicateWithinNot(condition: Expression): Boolean = {
-splitConjunctivePredicates(condition).exists {
-  case _: Exists | Not(_: Exists) => false
-  case _: InSubquery | Not(_: InSubquery) => false
-  case e => e.find { x =>
-x.isInstanceOf[Not] && e.find {
-  case _: InSubquery => true
-  case _ => false
-}.isDefined
-  }.isDefined
-}
-
-  }
-
-  /**
* Returns an
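
The diff is cut off above. For reference, a self-contained sketch of the query shape this change enables, reusing the s1/s2 names from the commit message (the table definitions here are assumed):

```sql
CREATE TABLE s1 (a INT, b INT);
CREATE TABLE s2 (c INT, d INT);

-- A null-aware NOT IN subquery nested under OR: rejected by CheckAnalysis before
-- this change, rewritten into an existence join by the optimizer after it.
SELECT * FROM s1 WHERE a > 5 OR b NOT IN (SELECT c FROM s2);
```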

[spark] branch master updated (2d3692e -> f0e2fc3)

2020-04-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2d3692e  [SPARK-31406][SQL][TEST] ThriftServerQueryTestSuite: Sharing 
test data and test tables among multiple test cases
 add f0e2fc3  [SPARK-25154][SQL] Support NOT IN sub-queries inside nested 
OR conditions

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/CheckAnalysis.scala  |   4 -
 .../spark/sql/catalyst/expressions/subquery.scala  |  18 --
 .../spark/sql/catalyst/optimizer/subquery.scala|  23 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  15 -
 .../catalyst/optimizer/RewriteSubquerySuite.scala  |  19 +-
 .../apache/spark/sql/catalyst/plans/PlanTest.scala |   9 +-
 .../inputs/subquery/in-subquery/nested-not-in.sql  | 198 
 .../subquery/in-subquery/nested-not-in.sql.out | 332 +
 .../scala/org/apache/spark/sql/SubquerySuite.scala |   6 +-
 9 files changed, 580 insertions(+), 44 deletions(-)
 create mode 100644 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql
 create mode 100644 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/nested-not-in.sql.out


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references

2020-04-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 51c80a4  [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate 
functions in SQL references
51c80a4 is described below

commit 51c80a48024242a51940c9c0aafdfd7e3a0c481f
Author: Takeshi Yamamuro 
AuthorDate: Mon Apr 6 21:36:51 2020 +0900

[SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in 
SQL references

### What changes were proposed in this pull request?

This PR intends to improve the SQL document of `GROUP BY`; it added the 
description about FILTER clauses of aggregate functions.

### Why are the changes needed?

To improve the SQL documents

### Does this PR introduce any user-facing change?

Yes.

https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png;
 width="500">

### How was this patch tested?

Manually checked.

Closes #28134 from maropu/SPARK-31358.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717)
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-syntax-qry-select-groupby.md | 44 +++
 docs/sql-ref-syntax-qry-select.md |  1 +
 2 files changed, 45 insertions(+)

diff --git a/docs/sql-ref-syntax-qry-select-groupby.md 
b/docs/sql-ref-syntax-qry-select-groupby.md
index 49a11ca..c461a18 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -21,6 +21,7 @@ license: |
 The GROUP BY clause is used to group the rows based on a set of 
specified grouping expressions and compute aggregations on
 the group of rows based on one or more specified aggregate functions. Spark 
also supports advanced aggregations to do multiple
 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, 
`ROLLUP` clauses.
+When a FILTER clause is attached to an aggregate function, only the matching 
rows are passed to that function.
 
 ### Syntax
 {% highlight sql %}
@@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ]
 GROUP BY GROUPING SETS (grouping_set [ , ...])
 {% endhighlight %}
 
+While aggregate functions are defined as
+{% highlight sql %}
+aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE 
boolean_expression ) ]
+{% endhighlight %}
+
 ### Parameters
 
   GROUPING SETS
@@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
 ((warehouse, product), (warehouse), (product), ()).
 The N elements of a CUBE specification results in 2^N 
GROUPING SETS.
   
+  aggregate_name
+  
+Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
+  
+  DISTINCT
+  
+Removes duplicates in input rows before they are passed to aggregate 
functions.
+  
+  FILTER
+  
+Filters the input rows: only the rows for which the boolean_expression in
+the WHERE clause evaluates to true are passed to the aggregate function; other rows are discarded.
+  
 
 
 ### Examples
@@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM 
dealer GROUP BY id OR
   |300|13 |8  |
   +---+---+---+
 
+-- Count the number of distinct dealer cities per car_model.
+SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
+
+  ++-+
+  |   car_model|count|
+  ++-+
+  | Honda Civic|3|
+  |   Honda CRV|2|
+  |Honda Accord|3|
+  ++-+
+
+-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
+SELECT id, sum(quantity) FILTER (
+WHERE car_model IN ('Honda Civic', 'Honda CRV')
+) AS `sum(quantity)` FROM dealer
+GROUP BY id ORDER BY id;
+
+   +---+-+
+   | id|sum(quantity)|
+   +---+-+
+   |100|   17|
+   |200|   23|
+   |300|5|
+   +---+-+
+
 -- Aggregations using multiple sets of grouping columns in a single statement.
 -- Following performs aggregations based on four sets of grouping columns.
 -- 1. city, car_model
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index e87c4a5..7ad1dd1 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression 
[ , ... ] }
   
 Specifies the expressions that are used to group the rows. This is used in 
conjunction with aggregate functions
 (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping 
expressions and aggregate values in each group.
+When a FILTER clause is attached to an aggregate function, only the 
matching rows are passed to that function.
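
As a small follow-on sketch (not in the commit), the FILTER clause also lets a single query compute several conditional aggregates in one pass; the dealer table below is the one used in the page's examples:

```sql
-- Per-dealer counts of two specific models, computed side by side.
SELECT id,
       count(*) FILTER (WHERE car_model = 'Honda Civic') AS civic_cnt,
       count(*) FILTER (WHERE car_model = 'Honda CRV')   AS crv_cnt
FROM dealer
GROUP BY id
ORDER BY id;
```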

[spark] branch master updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references

2020-04-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e24f0dc  [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate 
functions in SQL references
e24f0dc is described below

commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717
Author: Takeshi Yamamuro 
AuthorDate: Mon Apr 6 21:36:51 2020 +0900

[SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in 
SQL references

### What changes were proposed in this pull request?

This PR intends to improve the SQL document of `GROUP BY`; it added the 
description about FILTER clauses of aggregate functions.

### Why are the changes needed?

To improve the SQL documents

### Does this PR introduce any user-facing change?

Yes.

https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png;
 width="500">

### How was this patch tested?

Manually checked.

Closes #28134 from maropu/SPARK-31358.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-syntax-qry-select-groupby.md | 44 +++
 docs/sql-ref-syntax-qry-select.md |  1 +
 2 files changed, 45 insertions(+)

diff --git a/docs/sql-ref-syntax-qry-select-groupby.md 
b/docs/sql-ref-syntax-qry-select-groupby.md
index 49a11ca..c461a18 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -21,6 +21,7 @@ license: |
 The GROUP BY clause is used to group the rows based on a set of 
specified grouping expressions and compute aggregations on
 the group of rows based on one or more specified aggregate functions. Spark 
also supports advanced aggregations to do multiple
 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, 
`ROLLUP` clauses.
+When a FILTER clause is attached to an aggregate function, only the matching 
rows are passed to that function.
 
 ### Syntax
 {% highlight sql %}
@@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ]
 GROUP BY GROUPING SETS (grouping_set [ , ...])
 {% endhighlight %}
 
+Aggregate functions are specified with the following syntax:
+{% highlight sql %}
+aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE 
boolean_expression ) ]
+{% endhighlight %}
+
 ### Parameters
 
   GROUPING SETS
@@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
 ((warehouse, product), (warehouse), (product), ()).
The N elements of a CUBE specification result in 2^N GROUPING SETS.
   
+  aggregate_name
+  
+Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
+  
+  DISTINCT
+  
+Removes duplicates in input rows before they are passed to aggregate 
functions.
+  
+  FILTER
+  
+Only the input rows for which the boolean_expression in the WHERE clause evaluates
+to true are passed to the aggregate function; other rows are discarded.
+  
 
 
 ### Examples
@@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM 
dealer GROUP BY id OR
   |300|13 |8  |
   +---+---+---+
 
+-- Count the number of distinct dealer cities per car_model.
+SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
+
+  ++-+
+  |   car_model|count|
+  ++-+
+  | Honda Civic|3|
+  |   Honda CRV|2|
+  |Honda Accord|3|
+  ++-+
+
+-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
+SELECT id, sum(quantity) FILTER (
+WHERE car_model IN ('Honda Civic', 'Honda CRV')
+) AS `sum(quantity)` FROM dealer
+GROUP BY id ORDER BY id;
+
+   +---+-+
+   | id|sum(quantity)|
+   +---+-+
+   |100|   17|
+   |200|   23|
+   |300|5|
+   +---+-+
+
 -- Aggregations using multiple sets of grouping columns in a single statement.
 -- Following performs aggregations based on four sets of grouping columns.
 -- 1. city, car_model
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index e87c4a5..7ad1dd1 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression 
[ , ... ] }
   
 Specifies the expressions that are used to group the rows. This is used in 
conjunction with aggregate functions
 (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping 
expressions and aggregate values in each group.
+When a FILTER clause is attached to an aggregate function, only the 
matching rows are passed to that function.
   
   HAVING
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

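The same result can be approximated through the DataFrame API. The sketch below is not from the patch; it relies on `when` inside `sum`, which works because `sum` skips NULLs, and it assumes the hypothetical `dealer` DataFrame from the previous sketch (the helper name is made up).

```scala
// Editorial sketch, not from the patch: a DataFrame-API shape that is close in
// spirit to the SQL FILTER example. Rows failing the predicate contribute NULL,
// and sum() ignores NULLs, so the totals match the FILTER query.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sum, when}

object FilterViaWhenSketch {
  // `dealer` is assumed to have the columns id, city, car_model, quantity.
  def hondaCivicAndCrvTotals(dealer: DataFrame): DataFrame =
    dealer
      .groupBy(col("id"))
      .agg(sum(when(col("car_model").isin("Honda Civic", "Honda CRV"),
                    col("quantity"))).as("sum(quantity)"))
      .orderBy(col("id"))
}
```

This NULL-skipping trick only lines up with FILTER for aggregates that ignore NULLs (sum, avg, min, max); for count(*) and similar cases the FILTER clause stays the clearer, more general form.
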
[spark] branch branch-3.0 updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references

2020-04-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 51c80a4  [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate 
functions in SQL references
51c80a4 is described below

commit 51c80a48024242a51940c9c0aafdfd7e3a0c481f
Author: Takeshi Yamamuro 
AuthorDate: Mon Apr 6 21:36:51 2020 +0900

[SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in 
SQL references

### What changes were proposed in this pull request?

This PR intends to improve the SQL document of `GROUP BY`; it added the 
description about FILTER clauses of aggregate functions.

### Why are the changes needed?

To improve the SQL documents

### Does this PR introduce any user-facing change?

Yes.

https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png

### How was this patch tested?

Manually checked.

Closes #28134 from maropu/SPARK-31358.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717)
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-syntax-qry-select-groupby.md | 44 +++
 docs/sql-ref-syntax-qry-select.md |  1 +
 2 files changed, 45 insertions(+)

diff --git a/docs/sql-ref-syntax-qry-select-groupby.md 
b/docs/sql-ref-syntax-qry-select-groupby.md
index 49a11ca..c461a18 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -21,6 +21,7 @@ license: |
 The GROUP BY clause is used to group the rows based on a set of 
specified grouping expressions and compute aggregations on
 the group of rows based on one or more specified aggregate functions. Spark 
also supports advanced aggregations to do multiple
 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, 
`ROLLUP` clauses.
+When a FILTER clause is attached to an aggregate function, only the matching 
rows are passed to that function.
 
 ### Syntax
 {% highlight sql %}
@@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ]
 GROUP BY GROUPING SETS (grouping_set [ , ...])
 {% endhighlight %}
 
+Aggregate functions are specified with the following syntax:
+{% highlight sql %}
+aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE 
boolean_expression ) ]
+{% endhighlight %}
+
 ### Parameters
 
   GROUPING SETS
@@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
 ((warehouse, product), (warehouse), (product), ()).
The N elements of a CUBE specification result in 2^N GROUPING SETS.
   
+  aggregate_name
+  
+Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
+  
+  DISTINCT
+  
+Removes duplicates in input rows before they are passed to aggregate 
functions.
+  
+  FILTER
+  
+Only the input rows for which the boolean_expression in the WHERE clause evaluates
+to true are passed to the aggregate function; other rows are discarded.
+  
 
 
 ### Examples
@@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM 
dealer GROUP BY id OR
   |300|13 |8  |
   +---+---+---+
 
+-- Count the number of distinct dealer cities per car_model.
+SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
+
+  ++-+
+  |   car_model|count|
+  ++-+
+  | Honda Civic|3|
+  |   Honda CRV|2|
+  |Honda Accord|3|
+  ++-+
+
+-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
+SELECT id, sum(quantity) FILTER (
+WHERE car_model IN ('Honda Civic', 'Honda CRV')
+) AS `sum(quantity)` FROM dealer
+GROUP BY id ORDER BY id;
+
+   +---+-+
+   | id|sum(quantity)|
+   +---+-+
+   |100|   17|
+   |200|   23|
+   |300|5|
+   +---+-+
+
 -- Aggregations using multiple sets of grouping columns in a single statement.
 -- Following performs aggregations based on four sets of grouping columns.
 -- 1. city, car_model
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index e87c4a5..7ad1dd1 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression 
[ , ... ] }
   
 Specifies the expressions that are used to group the rows. This is used in 
conjunction with aggregate functions
 (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping 
expressions and aggregate values in each group.
+When a FILTER clause is attached to an aggregate function, only the 
matching rows are passed to that function.
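
To make the "N elements of a CUBE specification result in 2^N GROUPING SETS" note concrete, here is a small sketch that is not part of the patch; it assumes a local SparkSession and a made-up `inventory` table, and uses the `WITH CUBE` form documented in this file.

```scala
// Editorial sketch, not from the patch: CUBE over two columns aggregates over
// 2^2 = 4 grouping sets: (warehouse, product), (warehouse), (product) and ().
import org.apache.spark.sql.SparkSession

object CubeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cube-sketch")
      .getOrCreate()

    spark.createDataFrame(Seq(
      ("SJ", "bike", 10),
      ("SJ", "car", 2),
      ("SF", "bike", 4)
    )).toDF("warehouse", "product", "quantity")
      .createOrReplaceTempView("inventory")

    // Equivalent to GROUP BY GROUPING SETS ((warehouse, product), (warehouse), (product), ()).
    spark.sql(
      """SELECT warehouse, product, sum(quantity) AS total
        |FROM inventory
        |GROUP BY warehouse, product WITH CUBE
        |ORDER BY warehouse, product""".stripMargin).show()

    spark.stop()
  }
}
```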

[spark] branch branch-3.0 updated: [SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference

2020-04-02 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 308a8fd  [SPARK-31326][SQL][DOCS] Create Function docs structure for 
SQL Reference
308a8fd is described below

commit 308a8fd3c67704b7fce0067a199707f46c6e6f1e
Author: Huaxin Gao 
AuthorDate: Fri Apr 3 14:36:03 2020 +0900

[SPARK-31326][SQL][DOCS] Create Function docs structure for SQL Reference

### What changes were proposed in this pull request?
Create Function docs structure for SQL Reference...

### Why are the changes needed?
so the Function docs can be added later; we also want to reach a consensus about
what to document for Functions in SQL Reference.

### Does this PR introduce any user-facing change?
Yes
https://user-images.githubusercontent.com/13592258/78220451-68b6e100-7476-11ea-9a21-733b41652785.png

https://user-images.githubusercontent.com/13592258/78220460-6ce2fe80-7476-11ea-887c-defefd55c19d.png

https://user-images.githubusercontent.com/13592258/78220463-6f455880-7476-11ea-81fc-fd4137db7c3f.png

### How was this patch tested?
Manually build and check

Closes #28099 from huaxingao/function.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 4e45c07f5dbc4b178c41449320b7405b20aa05e9)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml| 21 +
 docs/sql-ref-functions-builtin-aggregate.md | 10 +-
 ...regate.md => sql-ref-functions-builtin-array.md} |  6 +++---
 ...te.md => sql-ref-functions-builtin-date-time.md} |  6 +++---
 docs/sql-ref-functions-builtin.md   | 17 +
 ...n-aggregate.md => sql-ref-functions-udf-hive.md} | 10 +-
 docs/sql-ref-functions-udf.md   | 17 +
 docs/sql-ref-functions.md   | 13 +
 8 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 6534c50..500895a 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -225,5 +225,26 @@
   url: sql-ref-syntax-aux-resource-mgmt-list-file.html
 - text: LIST JAR
   url: sql-ref-syntax-aux-resource-mgmt-list-jar.html
+- text: Functions
+  url: sql-ref-functions.html
+  subitems:
+  - text: Built-in Functions
+url: sql-ref-functions-builtin.html
+subitems:
+- text: Built-in Aggregate Functions
+  url: sql-ref-functions-builtin-aggregate.html
+- text: Built-in Array Functions
+  url: sql-ref-functions-builtin-array.html
+- text: Built-in Date Time Functions
+  url: sql-ref-functions-builtin-date-time.html
+  - text: UDFs (User-Defined Functions)
+url: sql-ref-functions-udf.html
+subitems:
+- text: Scalar UDFs (User-Defined Functions)
+  url: sql-ref-functions-udf-scalar.html
+- text: UDAFs (User-Defined Aggregate Functions)
+  url: sql-ref-functions-udf-aggregate.html
+- text: Integration with Hive UDFs/UDAFs/UDTFs
+  url: sql-ref-functions-udf-hive.html
 - text: Datetime Pattern
   url: sql-ref-datetime-pattern.html
diff --git a/docs/sql-ref-functions-builtin-aggregate.md 
b/docs/sql-ref-functions-builtin-aggregate.md
index 3fcd782..d595436 100644
--- a/docs/sql-ref-functions-builtin-aggregate.md
+++ b/docs/sql-ref-functions-builtin-aggregate.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Builtin Aggregate Functions
-displayTitle: Builtin Aggregate Functions
+title: Built-in Aggregate Functions
+displayTitle: Built-in Aggregate Functions
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
  http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,4 +19,4 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+Aggregate functions
\ No newline at end of file
diff --git a/docs/sql-ref-functions-builtin-aggregate.md 
b/docs/sql-ref-functions-builtin-array.md
similarity index 87%
copy from docs/sql-ref-functions-builtin-aggregate.md
copy to docs/sql-ref-functions-b

[spark] branch master updated (820bb99 -> 4e45c07)

2020-04-02 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 820bb99  [SPARK-31328][SQL] Fix rebasing of overlapped local 
timestamps during daylight saving time
 add 4e45c07  [SPARK-31326][SQL][DOCS] Create Function docs structure for 
SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml| 21 +
 docs/sql-ref-functions-builtin-aggregate.md | 10 +-
 ...ueries.md => sql-ref-functions-builtin-array.md} |  6 +++---
 ...ar.md => sql-ref-functions-builtin-date-time.md} |  6 +++---
 docs/sql-ref-functions-builtin.md   | 17 +
 ...aux-analyze.md => sql-ref-functions-udf-hive.md} |  6 +++---
 docs/sql-ref-functions-udf.md   | 17 +
 docs/sql-ref-functions.md   | 13 +
 8 files changed, 58 insertions(+), 38 deletions(-)
 copy docs/{sql-ref-syntax-qry-select-subqueries.md => 
sql-ref-functions-builtin-array.md} (90%)
 copy docs/{sql-ref-functions-builtin-scalar.md => 
sql-ref-functions-builtin-date-time.md} (88%)
 copy docs/{sql-ref-syntax-aux-analyze.md => sql-ref-functions-udf-hive.md} 
(85%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

2020-03-31 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 01b26c4  [SPARK-31305][SQL][DOCS] Add a page to list all commands in 
SQL Reference
01b26c4 is described below

commit 01b26c49009d8136f1f962e87ce7e35db43533ab
Author: Huaxin Gao 
AuthorDate: Wed Apr 1 08:42:15 2020 +0900

[SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

### What changes were proposed in this pull request?
Add a page to list all commands in SQL Reference...

### Why are the changes needed?
so it's easier for users to find a specific command.

### Does this PR introduce any user-facing change?
before:

![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png)

after:

![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png)


![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png)

Also move ```use database``` from query category to ddl category.

### How was this patch tested?
Manually build and check

Closes #28074 from huaxingao/list-all.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 1a7f9649b67d2108cb14e9e466855dfe52db6d66)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml   |  4 +--
 docs/sql-ref-syntax-ddl.md |  1 +
 docs/sql-ref-syntax.md | 62 +-
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 3bf4952..6534c50 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -123,6 +123,8 @@
   url: sql-ref-syntax-ddl-truncate-table.html
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
+- text: USE DATABASE
+  url: sql-ref-syntax-qry-select-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -152,8 +154,6 @@
   url: sql-ref-syntax-qry-select-distribute-by.html
 - text: LIMIT Clause 
   url: sql-ref-syntax-qry-select-limit.html
-- text: USE database
-  url: sql-ref-syntax-qry-select-usedb.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-ddl.md b/docs/sql-ref-syntax-ddl.md
index 954020a..ab4e95a 100644
--- a/docs/sql-ref-syntax-ddl.md
+++ b/docs/sql-ref-syntax-ddl.md
@@ -36,3 +36,4 @@ Data Definition Statements are used to create or modify the 
structure of databas
 - [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
 - [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
 - [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md
index 2510278..3db97ac 100644
--- a/docs/sql-ref-syntax.md
+++ b/docs/sql-ref-syntax.md
@@ -19,4 +19,64 @@ license: |
   limitations under the License.
 ---
 
-Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable.
+Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable. This document provides a list of Data Definition and Data 
Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
+
+### DDL Statements
+- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html)
+- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html)
+- [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html)
+- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html)
+- [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html)
+- [CREATE TABLE](sql-ref-syntax-ddl-create-table.html)
+- [CREATE VIEW](sql-ref-syntax-ddl-create-view.html)
+- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html)
+- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html)
+- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html)
+- [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
+- [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
+
+### DML Statements
+- [INSERT INTO](sql-ref-syntax-dml-insert-into.html)
+- [INSERT OVERWRITE](sql-ref-syntax-dml-insert-overwrite-table.html)
+- [INSERT OVERWRITE 
DIRECTORY](sql-ref-syntax-dml-insert-overwrite-directory.html)
+- [INSERT OVERWRITE DIRECTORY

[spark] branch master updated: [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

2020-03-31 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a7f964  [SPARK-31305][SQL][DOCS] Add a page to list all commands in 
SQL Reference
1a7f964 is described below

commit 1a7f9649b67d2108cb14e9e466855dfe52db6d66
Author: Huaxin Gao 
AuthorDate: Wed Apr 1 08:42:15 2020 +0900

[SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

### What changes were proposed in this pull request?
Add a page to list all commands in SQL Reference...

### Why are the changes needed?
so it's easier for users to find a specific command.

### Does this PR introduce any user-facing change?
before:

![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png)

after:

![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png)


![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png)

Also move ```use database``` from query category to ddl category.

### How was this patch tested?
Manually build and check

Closes #28074 from huaxingao/list-all.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml   |  4 +--
 docs/sql-ref-syntax-ddl.md |  1 +
 docs/sql-ref-syntax.md | 62 +-
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 3bf4952..6534c50 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -123,6 +123,8 @@
   url: sql-ref-syntax-ddl-truncate-table.html
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
+- text: USE DATABASE
+  url: sql-ref-syntax-qry-select-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -152,8 +154,6 @@
   url: sql-ref-syntax-qry-select-distribute-by.html
 - text: LIMIT Clause 
   url: sql-ref-syntax-qry-select-limit.html
-- text: USE database
-  url: sql-ref-syntax-qry-select-usedb.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-ddl.md b/docs/sql-ref-syntax-ddl.md
index 954020a..ab4e95a 100644
--- a/docs/sql-ref-syntax-ddl.md
+++ b/docs/sql-ref-syntax-ddl.md
@@ -36,3 +36,4 @@ Data Definition Statements are used to create or modify the 
structure of databas
 - [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
 - [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
 - [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md
index 2510278..3db97ac 100644
--- a/docs/sql-ref-syntax.md
+++ b/docs/sql-ref-syntax.md
@@ -19,4 +19,64 @@ license: |
   limitations under the License.
 ---
 
-Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable.
+Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable. This document provides a list of Data Definition and Data 
Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
+
+### DDL Statements
+- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html)
+- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html)
+- [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html)
+- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html)
+- [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html)
+- [CREATE TABLE](sql-ref-syntax-ddl-create-table.html)
+- [CREATE VIEW](sql-ref-syntax-ddl-create-view.html)
+- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html)
+- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html)
+- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html)
+- [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
+- [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
+
+### DML Statements
+- [INSERT INTO](sql-ref-syntax-dml-insert-into.html)
+- [INSERT OVERWRITE](sql-ref-syntax-dml-insert-overwrite-table.html)
+- [INSERT OVERWRITE 
DIRECTORY](sql-ref-syntax-dml-insert-overwrite-directory.html)
+- [INSERT OVERWRITE DIRECTORY with Hive 
format](sql-ref-syntax-dml-insert-overwrite-directory-hive.html)
+- [LOAD](sql-ref-syntax-dml-load.html

[spark] branch branch-3.0 updated: [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability

2020-03-28 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 71dcf66  [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct 
for readability
71dcf66 is described below

commit 71dcf6691a48dd622b83e128aa9be30f757b45ec
Author: Kengo Seki 
AuthorDate: Sun Mar 29 08:48:08 2020 +0900

[SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability

### What changes were proposed in this pull request?

This PR replaces the method calls of `toSet.toSeq` with `distinct`.

### Why are the changes needed?

`toSet.toSeq` is intended to make the elements unique, but it is a bit verbose.
Using `distinct` instead is easier to understand and improves readability.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Tested with the existing unit tests and found no problem.

Closes #28062 from sekikn/SPARK-31292.

Authored-by: Kengo Seki 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 0b237bd615da4b2c2b781e72af4ad3a4f2951444)
Signed-off-by: Takeshi Yamamuro 
---
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala   | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala  | 2 +-
 .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala  | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala 
b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
index 7dd7fc1..994b363 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
@@ -149,7 +149,7 @@ private[spark] object ResourceUtils extends Logging {
   def listResourceIds(sparkConf: SparkConf, componentName: String): 
Seq[ResourceID] = {
 sparkConf.getAllWithPrefix(s"$componentName.$RESOURCE_PREFIX.").map { case 
(key, _) =>
   key.substring(0, key.indexOf('.'))
-}.toSet.toSeq.map(name => new ResourceID(componentName, name))
+}.distinct.map(name => new ResourceID(componentName, name))
   }
 
   def parseAllResourceRequests(
diff --git a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala 
b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
index 857c89d..15f2161 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
@@ -69,7 +69,7 @@ private[spark] class ResultTask[T, U](
   with Serializable {
 
   @transient private[this] val preferredLocs: Seq[TaskLocation] = {
-if (locs == null) Nil else locs.toSet.toSeq
+if (locs == null) Nil else locs.distinct
   }
 
   override def runTask(context: TaskContext): U = {
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala 
b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
index 4c0c30a..a0ba920 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
@@ -71,7 +71,7 @@ private[spark] class ShuffleMapTask(
   }
 
   @transient private val preferredLocs: Seq[TaskLocation] = {
-if (locs == null) Nil else locs.toSet.toSeq
+if (locs == null) Nil else locs.distinct
   }
 
   override def runTask(context: TaskContext): MapStatus = {
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index 6a1d460..ed30473 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -408,7 +408,7 @@ private[spark] class TaskSchedulerImpl(
 newExecAvail = true
   }
 }
-val hosts = offers.map(_.host).toSet.toSeq
+val hosts = offers.map(_.host).distinct
 for ((host, Some(rack)) <- hosts.zip(getRacksForHosts(hosts))) {
   hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += host
 }
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
index e7ecf84..a083cdb 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
@@ -758,7 +758,7 @@ class TaskSche

[spark] branch master updated: [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability

2020-03-28 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0b237bd  [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct 
for readability
0b237bd is described below

commit 0b237bd615da4b2c2b781e72af4ad3a4f2951444
Author: Kengo Seki 
AuthorDate: Sun Mar 29 08:48:08 2020 +0900

[SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct for readability

### What changes were proposed in this pull request?

This PR replaces the method calls of `toSet.toSeq` with `distinct`.

### Why are the changes needed?

`toSet.toSeq` is intended to make the elements unique, but it is a bit verbose.
Using `distinct` instead is easier to understand and improves readability.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Tested with the existing unit tests and found no problem.

Closes #28062 from sekikn/SPARK-31292.

Authored-by: Kengo Seki 
Signed-off-by: Takeshi Yamamuro 
---
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala   | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala  | 2 +-
 .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala  | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala 
b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
index 36ef906..162f090 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
@@ -150,7 +150,7 @@ private[spark] object ResourceUtils extends Logging {
   def listResourceIds(sparkConf: SparkConf, componentName: String): 
Seq[ResourceID] = {
 sparkConf.getAllWithPrefix(s"$componentName.$RESOURCE_PREFIX.").map { case 
(key, _) =>
   key.substring(0, key.indexOf('.'))
-}.toSet.toSeq.map(name => new ResourceID(componentName, name))
+}.distinct.map(name => new ResourceID(componentName, name))
   }
 
   def parseAllResourceRequests(
diff --git a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala 
b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
index 857c89d..15f2161 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
@@ -69,7 +69,7 @@ private[spark] class ResultTask[T, U](
   with Serializable {
 
   @transient private[this] val preferredLocs: Seq[TaskLocation] = {
-if (locs == null) Nil else locs.toSet.toSeq
+if (locs == null) Nil else locs.distinct
   }
 
   override def runTask(context: TaskContext): U = {
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala 
b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
index 4c0c30a..a0ba920 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
@@ -71,7 +71,7 @@ private[spark] class ShuffleMapTask(
   }
 
   @transient private val preferredLocs: Seq[TaskLocation] = {
-if (locs == null) Nil else locs.toSet.toSeq
+if (locs == null) Nil else locs.distinct
   }
 
   override def runTask(context: TaskContext): MapStatus = {
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index 7e2fbb4..f0f84fe 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -487,7 +487,7 @@ private[spark] class TaskSchedulerImpl(
 newExecAvail = true
   }
 }
-val hosts = offers.map(_.host).toSet.toSeq
+val hosts = offers.map(_.host).distinct
 for ((host, Some(rack)) <- hosts.zip(getRacksForHosts(hosts))) {
   hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += host
 }
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
index 9ee84a8..b9a11e7 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
@@ -761,7 +761,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with 
LocalSparkContext with B
 // that are explicitly blacklisted, plu

[spark] branch master updated (d025ddba -> 0b237bd)

2020-03-28 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d025ddba [SPARK-31238][SPARK-31284][TEST][FOLLOWUP] Fix 
readResourceOrcFile to create a local file from resource
 add 0b237bd  [SPARK-31292][CORE][SQL] Replace toSet.toSeq with distinct 
for readability

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala   | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala | 2 +-
 core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala  | 2 +-
 .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala  | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
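
A tiny standalone sketch of the substitution (not from the patch, names made up): both forms deduplicate, but `distinct` states the intent directly, and it also preserves first-occurrence order, which `toSet.toSeq` does not guarantee.

```scala
// Editorial sketch, not from the patch: the two deduplication spellings side by side.
object DistinctSketch {
  def main(args: Array[String]): Unit = {
    val hosts = Seq("host-b", "host-a", "host-b", "host-c", "host-a")

    val viaSet: Seq[String]      = hosts.toSet.toSeq // order unspecified
    val viaDistinct: Seq[String] = hosts.distinct    // Seq(host-b, host-a, host-c)

    println(s"toSet.toSeq: $viaSet")
    println(s"distinct:    $viaDistinct")
    assert(viaSet.sorted == viaDistinct.sorted) // same elements either way
  }
}
```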



[spark] branch branch-3.0 updated: [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments

2020-03-26 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6f30ff4  [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed 
comments
6f30ff4 is described below

commit 6f30ff44cf2d3d347a516a0e0370d07e8de9352c
Author: beliefer 
AuthorDate: Fri Mar 27 08:09:17 2020 +0900

[SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments

### What changes were proposed in this pull request?
This PR is related to https://github.com/apache/spark/pull/27481.
If test case A uses `--IMPORT` to import test case B, which contains bracketed
comments, the bracketed comments are not displayed correctly in the golden files.
The content of `nested-comments.sql` is shown below:
```
-- This test case just used to test imported bracketed comments.

-- the first case of bracketed comment
--QUERY-DELIMITER-START
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first;
*/
SELECT 'selected content' AS first;
--QUERY-DELIMITER-END
```
The test case `comments.sql` imports `nested-comments.sql` below:
`--IMPORT nested-comments.sql`
Before this PR, the output will be:
```
-- !query
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
'DROP',
'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 
'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 
'REVOKE', '
ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 
'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
/* This is the first example of bracketed comment.
^^^
SELECT 'ommented out content' AS first

-- !query
*/
SELECT 'selected content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 
0)

== SQL ==
*/
^^^
SELECT 'selected content' AS first
```
After this PR, the output will be:
```
-- !query
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first;
*/
SELECT 'selected content' AS first
-- !query schema
struct
-- !query output
selected content
```

### Why are the changes needed?
Golden files can't display the bracketed comments in imported test cases.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
New UT.

Closes #28018 from beliefer/fix-bug-tests-imported-bracketed-comments.

Authored-by: beliefer 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 9e0fee933e62eb309d4aa32bb1e5126125d0bf9f)
Signed-off-by: Takeshi Yamamuro 
---
 .../src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala  | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
index 6c66166..848966a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
@@ -256,20 +256,23 @@ class SQLQueryTestSuite extends QueryTest with 
SharedSparkSession {
 def splitWithSemicolon(seq: Seq[String]) = {
   seq.mkString("\n").split("(?<=[^]);")
 }
-val input = fileToString(new File(testCase.inputFile))
 
-val (comments, code) = input.split("\n").partition { line =>
+def splitCommentsAndCodes(input: String) = input.split("\n").partition { 
line =>
   val newLine = line.trim
   newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER")
 }
 
+val input = fileToString(new File(testCase.inputFile))
+
+val (comments, code) = splitCommentsAndCodes(input)
+
 // If `--IMPORT` found, load code from another test case file, then insert 

[spark] branch master updated: [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments

2020-03-26 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9e0fee9  [SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed 
comments
9e0fee9 is described below

commit 9e0fee933e62eb309d4aa32bb1e5126125d0bf9f
Author: beliefer 
AuthorDate: Fri Mar 27 08:09:17 2020 +0900

[SPARK-31262][SQL][TESTS] Fix bug tests imported bracketed comments

### What changes were proposed in this pull request?
This PR is related to https://github.com/apache/spark/pull/27481.
If test case A uses `--IMPORT` to import test case B, which contains bracketed
comments, the bracketed comments are not displayed correctly in the golden files.
The content of `nested-comments.sql` is shown below:
```
-- This test case just used to test imported bracketed comments.

-- the first case of bracketed comment
--QUERY-DELIMITER-START
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first;
*/
SELECT 'selected content' AS first;
--QUERY-DELIMITER-END
```
The test case `comments.sql` imports `nested-comments.sql` below:
`--IMPORT nested-comments.sql`
Before this PR, the output will be:
```
-- !query
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
'DROP',
'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 
'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 
'REVOKE', '
ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 
'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
/* This is the first example of bracketed comment.
^^^
SELECT 'ommented out content' AS first

-- !query
*/
SELECT 'selected content' AS first
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 
'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 
'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 
'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 
'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 
'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 
0)

== SQL ==
*/
^^^
SELECT 'selected content' AS first
```
After this PR, the output will be:
```
-- !query
/* This is the first example of bracketed comment.
SELECT 'ommented out content' AS first;
*/
SELECT 'selected content' AS first
-- !query schema
struct
-- !query output
selected content
```

### Why are the changes needed?
Golden files can't display the bracketed comments in imported test cases.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
New UT.

Closes #28018 from beliefer/fix-bug-tests-imported-bracketed-comments.

Authored-by: beliefer 
Signed-off-by: Takeshi Yamamuro 
---
 .../src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala  | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
index 6c66166..848966a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
@@ -256,20 +256,23 @@ class SQLQueryTestSuite extends QueryTest with 
SharedSparkSession {
 def splitWithSemicolon(seq: Seq[String]) = {
   seq.mkString("\n").split("(?<=[^]);")
 }
-val input = fileToString(new File(testCase.inputFile))
 
-val (comments, code) = input.split("\n").partition { line =>
+def splitCommentsAndCodes(input: String) = input.split("\n").partition { 
line =>
   val newLine = line.trim
   newLine.startsWith("--") && !newLine.startsWith("--QUERY-DELIMITER")
 }
 
+val input = fileToString(new File(testCase.inputFile))
+
+val (comments, code) = splitCommentsAndCodes(input)
+
 // If `--IMPORT` found, load code from another test case file, then insert 
them
 // into the head in this test.
 val importedTestCaseName = comments.filter(_.startsWith("--IMPOR

[spark] branch branch-3.0 updated: [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers (byte/short/int/long) should fail with fraction

2020-03-19 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 1a5cd16  [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to 
integral numbers (byte/short/int/long) should fail with fraction
1a5cd16 is described below

commit 1a5cd167e0901948d68d6c7880d39966e74d10b3
Author: Wenchen Fan 
AuthorDate: Fri Mar 20 00:52:09 2020 +0900

[SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to integral numbers 
(byte/short/int/long) should fail with fraction

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/26933

A fraction string like "1.23" is not a valid integral format, so the cast should fail under ANSI mode.

### Why are the changes needed?

correct the ANSI cast behavior from string to integral

### Does this PR introduce any user-facing change?

Yes under ANSI mode, but ANSI mode is off by default.

### How was this patch tested?

new test

Closes #27957 from cloud-fan/ansi.

Authored-by: Wenchen Fan 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit ac262cb27255f989f6a6dd864bd5114a928b96da)
Signed-off-by: Takeshi Yamamuro 
---
 .../org/apache/spark/unsafe/types/UTF8String.java  | 24 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  2 ++
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git 
a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java 
b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index c538466..186597f 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -1105,6 +1105,10 @@ public final class UTF8String implements 
Comparable, Externalizable,
* @return true if the parsing was successful else false
*/
   public boolean toLong(LongWrapper toLongResult) {
+return toLong(toLongResult, true);
+  }
+
+  private boolean toLong(LongWrapper toLongResult, boolean allowDecimal) {
 int offset = 0;
 while (offset < this.numBytes && getByte(offset) <= ' ') offset++;
 if (offset == this.numBytes) return false;
@@ -1129,7 +1133,7 @@ public final class UTF8String implements 
Comparable, Externalizable,
 while (offset <= end) {
   b = getByte(offset);
   offset++;
-  if (b == separator) {
+  if (b == separator && allowDecimal) {
 // We allow decimals and will return a truncated integral in that case.
 // Therefore we won't throw an exception here (checking the fractional
 // part happens below.)
@@ -1198,6 +1202,10 @@ public final class UTF8String implements 
Comparable, Externalizable,
* @return true if the parsing was successful else false
*/
   public boolean toInt(IntWrapper intWrapper) {
+return toInt(intWrapper, true);
+  }
+
+  private boolean toInt(IntWrapper intWrapper, boolean allowDecimal) {
 int offset = 0;
 while (offset < this.numBytes && getByte(offset) <= ' ') offset++;
 if (offset == this.numBytes) return false;
@@ -1222,7 +1230,7 @@ public final class UTF8String implements 
Comparable, Externalizable,
 while (offset <= end) {
   b = getByte(offset);
   offset++;
-  if (b == separator) {
+  if (b == separator && allowDecimal) {
 // We allow decimals and will return a truncated integral in that case.
 // Therefore we won't throw an exception here (checking the fractional
 // part happens below.)
@@ -1276,9 +1284,7 @@ public final class UTF8String implements 
Comparable, Externalizable,
 if (toInt(intWrapper)) {
   int intValue = intWrapper.value;
   short result = (short) intValue;
-  if (result == intValue) {
-return true;
-  }
+  return result == intValue;
 }
 return false;
   }
@@ -1287,9 +1293,7 @@ public final class UTF8String implements 
Comparable, Externalizable,
 if (toInt(intWrapper)) {
   int intValue = intWrapper.value;
   byte result = (byte) intValue;
-  if (result == intValue) {
-return true;
-  }
+  return result == intValue;
 }
 return false;
   }
@@ -1302,7 +1306,7 @@ public final class UTF8String implements 
Comparable, Externalizable,
*/
   public long toLongExact() {
 LongWrapper result = new LongWrapper();
-if (toLong(result)) {
+if (toLong(result, false)) {
   return result.value;
 }
 throw new NumberFormatException("invalid input syntax for type numeric: " 
+ this);
@@ -1316,7 +1320,7 @@ public final class UTF8String implements 
Comparable, Externalizable,
*

[spark] branch master updated (a177628 -> ac262cb)

2020-03-19 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a177628  [SPARK-31187][SQL] Sort the whole-stage codegen debug output 
by codegenStageId
 add ac262cb  [SPARK-30292][SQL][FOLLOWUP] ansi cast from strings to 
integral numbers (byte/short/int/long) should fail with fraction

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/unsafe/types/UTF8String.java  | 24 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  2 ++
 2 files changed, 16 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId

2020-03-19 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a8c08b1  [SPARK-31187][SQL] Sort the whole-stage codegen debug output 
by codegenStageId
a8c08b1 is described below

commit a8c08b1d81aefd1e3d7f4616b76e2285f9981cc7
Author: Kris Mok 
AuthorDate: Thu Mar 19 20:53:01 2020 +0900

[SPARK-31187][SQL] Sort the whole-stage codegen debug output by 
codegenStageId

### What changes were proposed in this pull request?

Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code 
to help with debugging. One way to get the generated code is through 
`df.queryExecution.debug.codegen`, or SQL `EXPLAIN CODEGEN` statement.

The generated code is currently printed without specific ordering, which 
can make debugging a bit annoying. This PR makes a minor improvement to sort 
the codegen dump by the `codegenStageId`, ascending.

After this change, the following query:
```scala
spark.range(10).agg(sum('id)).queryExecution.debug.codegen
```
will always dump the generated code in a natural, stable order. A version 
of this example with shorter output is:
```

spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println)
*(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], 
output=[sum#15L])
+- *(1) Range (0, 10, step=1, splits=16)

*(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L])
+- Exchange SinglePartition, true, [id=#30]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], 
output=[sum#15L])
  +- *(1) Range (0, 10, step=1, splits=16)
```

The number of codegen stages within a single SQL query tends to be very 
small, most likely < 50, so the overhead of adding the sorting shouldn't be 
significant.
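
As a rough sketch of the idea (simplified stand-in types, not the actual Spark
internals), sorting the collected subtrees by their stage id is what makes the
dump order deterministic:

```scala
// Simplified stand-in for the collected whole-stage codegen subtrees.
case class Subtree(codegenStageId: Int, plan: String)

val subtrees = Seq(Subtree(2, "HashAggregate ..."), Subtree(1, "Range ..."))
subtrees.sortBy(_.codegenStageId).foreach { s =>
  println(s"== Subtree ${s.codegenStageId} ==")
  println(s.plan)
}
// Always prints subtree 1 before subtree 2, regardless of collection order.
```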

### Why are the changes needed?

Minor improvement to aid WSCG debugging.

### Does this PR introduce any user-facing change?

No user-facing change for end-users; minor change for developers who debug 
WSCG generated code.

### How was this patch tested?

Manually tested the output; all other tests still pass.

Closes #27955 from rednaxelafx/codegen.

Authored-by: Kris Mok 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit a1776288f48d450fea28f50fef78fd6aa10a8160)
Signed-off-by: Takeshi Yamamuro 
---
 .../src/main/scala/org/apache/spark/sql/execution/debug/package.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
index 6a57ef2..6c40104 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
@@ -113,7 +113,7 @@ package object debug {
 s
   case s => s
 }
-codegenSubtrees.toSeq.map { subtree =>
+codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree =>
   val (_, source) = subtree.doCodeGen()
   val codeStats = try {
 CodeGenerator.compile(source)._2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId

2020-03-19 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a177628  [SPARK-31187][SQL] Sort the whole-stage codegen debug output 
by codegenStageId
a177628 is described below

commit a1776288f48d450fea28f50fef78fd6aa10a8160
Author: Kris Mok 
AuthorDate: Thu Mar 19 20:53:01 2020 +0900

[SPARK-31187][SQL] Sort the whole-stage codegen debug output by 
codegenStageId

### What changes were proposed in this pull request?

Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code 
to help with debugging. One way to get the generated code is through 
`df.queryExecution.debug.codegen`, or SQL `EXPLAIN CODEGEN` statement.

The generated code is currently printed without specific ordering, which 
can make debugging a bit annoying. This PR makes a minor improvement to sort 
the codegen dump by the `codegenStageId`, ascending.

After this change, the following query:
```scala
spark.range(10).agg(sum('id)).queryExecution.debug.codegen
```
will always dump the generated code in a natural, stable order. A version 
of this example with shorter output is:
```

spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println)
*(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], 
output=[sum#15L])
+- *(1) Range (0, 10, step=1, splits=16)

*(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L])
+- Exchange SinglePartition, true, [id=#30]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], 
output=[sum#15L])
  +- *(1) Range (0, 10, step=1, splits=16)
```

The number of codegen stages within a single SQL query tends to be very 
small, most likely < 50, so the overhead of adding the sorting shouldn't be 
significant.

### Why are the changes needed?

Minor improvement to aid WSCG debugging.

### Does this PR introduce any user-facing change?

No user-facing change for end-users; minor change for developers who debug 
WSCG generated code.

### How was this patch tested?

Manually tested the output; all other tests still pass.

Closes #27955 from rednaxelafx/codegen.

Authored-by: Kris Mok 
Signed-off-by: Takeshi Yamamuro 
---
 .../src/main/scala/org/apache/spark/sql/execution/debug/package.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
index 6a57ef2..6c40104 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
@@ -113,7 +113,7 @@ package object debug {
 s
   case s => s
 }
-codegenSubtrees.toSeq.map { subtree =>
+codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree =>
   val (_, source) = subtree.doCodeGen()
   val codeStats = try {
 CodeGenerator.compile(source)._2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31171][SQL][FOLLOWUP] update document

2020-03-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 39e9b55  [SPARK-31171][SQL][FOLLOWUP] update document
39e9b55 is described below

commit 39e9b554ea171e71ea152c1d3a59f72e8918dfd2
Author: Wenchen Fan 
AuthorDate: Thu Mar 19 07:29:31 2020 +0900

[SPARK-31171][SQL][FOLLOWUP] update document

### What changes were proposed in this pull request?

A follow-up of https://github.com/apache/spark/pull/27936 to update the documentation.
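
The documented behavior of `size` can be illustrated with a short sketch
(assuming a Spark 3.0 spark-shell with default settings; see the doc change below):

```scala
// Default settings: size of a null array/map returns -1 (legacy behavior).
spark.sql("SELECT size(CAST(NULL AS ARRAY<INT>))").show()   // -1

// Under ANSI mode (or with spark.sql.legacy.sizeOfNull=false), it returns NULL.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT size(CAST(NULL AS ARRAY<INT>))").show()   // null
```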

### Why are the changes needed?

correct document

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27950 from cloud-fan/null.

Authored-by: Wenchen Fan 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 8643e5d9c50294f59b01988d99d447a38776178e)
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-ansi-compliance.md| 7 ++-
 .../spark/sql/catalyst/expressions/collectionOperations.scala  | 6 +++---
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala   | 4 
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 27e60b4..bc5bde6 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -21,7 +21,7 @@ license: |
 
 Since Spark 3.0, Spark SQL introduces two experimental options to comply with 
the SQL standard: `spark.sql.ansi.enabled` and 
`spark.sql.storeAssignmentPolicy` (See a table below for details).
 
-When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard 
in basic behaviours (e.g., arithmetic operations, type conversion, and SQL 
parsing).
+When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard 
in basic behaviours (e.g., arithmetic operations, type conversion, SQL 
functions and SQL parsing).
 Moreover, Spark SQL has an independent option to control implicit casting 
behaviours when inserting rows in a table.
 The casting behaviours are defined as store assignment rules in the standard.
 
@@ -140,6 +140,11 @@ SELECT * FROM t;
 
 {% endhighlight %}
 
+### SQL Functions
+
+The behavior of some SQL functions can be different under ANSI mode 
(`spark.sql.ansi.enabled=true`).
+  - `size`: This function returns null for null input under ANSI mode.
+
 ### SQL Keywords
 
 When `spark.sql.ansi.enabled` is true, Spark SQL will use the ANSI mode parser.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 6d95909..8b61bc4 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -77,9 +77,9 @@ trait BinaryArrayExpressionWithImplicitCast extends 
BinaryExpression
 @ExpressionDescription(
   usage = """
 _FUNC_(expr) - Returns the size of an array or a map.
-The function returns -1 if its input is null and 
spark.sql.legacy.sizeOfNull is set to true.
-If spark.sql.legacy.sizeOfNull is set to false, the function returns null 
for null input.
-By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
+The function returns null for null input if spark.sql.legacy.sizeOfNull is 
set to false or
+spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 
for null input.
+With the default settings, the function returns -1 for null input.
   """,
   examples = """
 Examples:
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index e2d3d55..69383d4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3957,6 +3957,10 @@ object functions {
   /**
* Returns length of array or map.
*
+   * The function returns null for null input if spark.sql.legacy.sizeOfNull 
is set to false or
+   * spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 
for null input.
+   * With the default settings, the function returns -1 for null input.
+   *
* @group collection_funcs
* @since 1.5.0
*/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31171][SQL][FOLLOWUP] update document

2020-03-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8643e5d  [SPARK-31171][SQL][FOLLOWUP] update document
8643e5d is described below

commit 8643e5d9c50294f59b01988d99d447a38776178e
Author: Wenchen Fan 
AuthorDate: Thu Mar 19 07:29:31 2020 +0900

[SPARK-31171][SQL][FOLLOWUP] update document

### What changes were proposed in this pull request?

A follow-up of https://github.com/apache/spark/pull/27936 to update the documentation.

### Why are the changes needed?

correct document

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27950 from cloud-fan/null.

Authored-by: Wenchen Fan 
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-ansi-compliance.md| 7 ++-
 .../spark/sql/catalyst/expressions/collectionOperations.scala  | 6 +++---
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala   | 4 
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 27e60b4..bc5bde6 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -21,7 +21,7 @@ license: |
 
 Since Spark 3.0, Spark SQL introduces two experimental options to comply with 
the SQL standard: `spark.sql.ansi.enabled` and 
`spark.sql.storeAssignmentPolicy` (See a table below for details).
 
-When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard 
in basic behaviours (e.g., arithmetic operations, type conversion, and SQL 
parsing).
+When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard 
in basic behaviours (e.g., arithmetic operations, type conversion, SQL 
functions and SQL parsing).
 Moreover, Spark SQL has an independent option to control implicit casting 
behaviours when inserting rows in a table.
 The casting behaviours are defined as store assignment rules in the standard.
 
@@ -140,6 +140,11 @@ SELECT * FROM t;
 
 {% endhighlight %}
 
+### SQL Functions
+
+The behavior of some SQL functions can be different under ANSI mode 
(`spark.sql.ansi.enabled=true`).
+  - `size`: This function returns null for null input under ANSI mode.
+
 ### SQL Keywords
 
 When `spark.sql.ansi.enabled` is true, Spark SQL will use the ANSI mode parser.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 6d95909..8b61bc4 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -77,9 +77,9 @@ trait BinaryArrayExpressionWithImplicitCast extends 
BinaryExpression
 @ExpressionDescription(
   usage = """
 _FUNC_(expr) - Returns the size of an array or a map.
-The function returns -1 if its input is null and 
spark.sql.legacy.sizeOfNull is set to true.
-If spark.sql.legacy.sizeOfNull is set to false, the function returns null 
for null input.
-By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
+The function returns null for null input if spark.sql.legacy.sizeOfNull is 
set to false or
+spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 
for null input.
+With the default settings, the function returns -1 for null input.
   """,
   examples = """
 Examples:
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 5603f20..6e189df 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3980,6 +3980,10 @@ object functions {
   /**
* Returns length of array or map.
*
+   * The function returns null for null input if spark.sql.legacy.sizeOfNull 
is set to false or
+   * spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 
for null input.
+   * With the default settings, the function returns -1 for null input.
+   *
* @group collection_funcs
* @since 1.5.0
*/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text

2020-03-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new aba7a09  [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
aba7a09 is described below

commit aba7a09da53425481893ce6d21281dc85874c619
Author: Kent Yao 
AuthorDate: Thu Mar 19 07:27:06 2020 +0900

[SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text

### What changes were proposed in this pull request?

The pattern `''` means a literal `'`.

```sql
select date_format(to_timestamp("1904-01-23 15:02:01", 'y-MM-dd 
HH:mm:ss'), "y-MM-dd HH:mm:ss''S");
5377-02-14 06:27:19'00519
```

https://github.com/apache/spark/commit/0946a9514f56565c78b0555383c1ece14aaf2b7b 
missed this case, and this PR adds it back.
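
A short sketch of the fixed behavior, based on the new tests added below (the
`yyyy` year field in the pattern is assumed here, since the archived diff drops it):

```scala
// A doubled single quote in the pattern now matches a literal "'" in the input.
spark.sql(
  """SELECT to_timestamp("2019-10-06T10:11:12'", "yyyy-MM-dd'T'HH:mm:ss''")"""
).show()
// 2019-10-06 10:11:12
```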

### Why are the changes needed?

bugfix

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

add ut

Closes #27949 from yaooqinn/SPARK-31150-2.

Authored-by: Kent Yao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 3d695954e53038f978bebcb3e798fa8728d1)
Signed-off-by: Takeshi Yamamuro 
---
 .../catalyst/util/DateTimeFormatterHelper.scala|  4 +++
 .../test/resources/sql-tests/inputs/datetime.sql   |  5 
 .../resources/sql-tests/results/datetime.sql.out   | 34 +-
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
index 72bae28..4ed618e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
@@ -117,6 +117,10 @@ private object DateTimeFormatterHelper {
   pattern: String): DateTimeFormatterBuilder = {
 val builder = createBuilder()
 pattern.split("'").zipWithIndex.foreach {
+  // Split string starting with the regex itself which is `'` here will 
produce an extra empty
+  // string at res(0). So when the first element here is empty string we 
do not need append `'`
+  // literal to the DateTimeFormatterBuilder.
+  case ("", idx) if idx != 0 => builder.appendLiteral("'")
   case (pattenPart, idx) if idx % 2 == 0 =>
 var rest = pattenPart
 while (rest.nonEmpty) {
diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql 
b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql
index a06cdfd..2c4ed64 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql
@@ -107,3 +107,8 @@ select to_timestamp("S2019-10-06", "'S'-MM-dd");
 select date_format(timestamp '2019-10-06', '-MM-dd uuee');
 select date_format(timestamp '2019-10-06', '-MM-dd uucc');
 select date_format(timestamp '2019-10-06', '-MM-dd ');
+
+select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''"); 
-- middle
+select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''"); -- tail
+select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss"); -- head
+select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss"); -- 
head but as single quote
diff --git a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out 
b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out
index 714412f..f440b5f 100755
--- a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 73
+-- Number of queries: 77
 
 
 -- !query
@@ -601,3 +601,35 @@ select date_format(timestamp '2019-10-06', '-MM-dd 
')
 struct
 -- !query output
 2019-10-06 Sunday
+
+
+-- !query
+select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12.12
+
+
+-- !query
+select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12
+
+
+-- !query
+select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12
+
+
+-- !query
+select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text

2020-03-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3d69595  [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
3d69595 is described below

commit 3d695954e53038f978bebcb3e798fa8728d1
Author: Kent Yao 
AuthorDate: Thu Mar 19 07:27:06 2020 +0900

[SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text

### What changes were proposed in this pull request?

The pattern `''` means a literal `'`.

```sql
select date_format(to_timestamp("1904-01-23 15:02:01", 'y-MM-dd 
HH:mm:ss'), "y-MM-dd HH:mm:ss''S");
5377-02-14 06:27:19'00519
```

https://github.com/apache/spark/commit/0946a9514f56565c78b0555383c1ece14aaf2b7b 
missed this case, and this PR adds it back.

### Why are the changes needed?

bugfix

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

add ut

Closes #27949 from yaooqinn/SPARK-31150-2.

Authored-by: Kent Yao 
Signed-off-by: Takeshi Yamamuro 
---
 .../catalyst/util/DateTimeFormatterHelper.scala|  4 +++
 .../test/resources/sql-tests/inputs/datetime.sql   |  5 
 .../resources/sql-tests/results/datetime.sql.out   | 34 +-
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
index 72bae28..4ed618e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
@@ -117,6 +117,10 @@ private object DateTimeFormatterHelper {
   pattern: String): DateTimeFormatterBuilder = {
 val builder = createBuilder()
 pattern.split("'").zipWithIndex.foreach {
+  // Split string starting with the regex itself which is `'` here will 
produce an extra empty
+  // string at res(0). So when the first element here is empty string we 
do not need append `'`
+  // literal to the DateTimeFormatterBuilder.
+  case ("", idx) if idx != 0 => builder.appendLiteral("'")
   case (pattenPart, idx) if idx % 2 == 0 =>
 var rest = pattenPart
 while (rest.nonEmpty) {
diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql 
b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql
index a06cdfd..2c4ed64 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql
@@ -107,3 +107,8 @@ select to_timestamp("S2019-10-06", "'S'-MM-dd");
 select date_format(timestamp '2019-10-06', '-MM-dd uuee');
 select date_format(timestamp '2019-10-06', '-MM-dd uucc');
 select date_format(timestamp '2019-10-06', '-MM-dd ');
+
+select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''"); 
-- middle
+select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''"); -- tail
+select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss"); -- head
+select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss"); -- 
head but as single quote
diff --git a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out 
b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out
index 714412f..f440b5f 100755
--- a/sql/core/src/test/resources/sql-tests/results/datetime.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 73
+-- Number of queries: 77
 
 
 -- !query
@@ -601,3 +601,35 @@ select date_format(timestamp '2019-10-06', '-MM-dd 
')
 struct
 -- !query output
 2019-10-06 Sunday
+
+
+-- !query
+select to_timestamp("2019-10-06T10:11:12'12", "-MM-dd'T'HH:mm:ss''")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12.12
+
+
+-- !query
+select to_timestamp("2019-10-06T10:11:12'", "-MM-dd'T'HH:mm:ss''")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12
+
+
+-- !query
+select to_timestamp("'2019-10-06T10:11:12", "''-MM-dd'T'HH:mm:ss")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12
+
+
+-- !query
+select to_timestamp("P2019-10-06T10:11:12", "'P'-MM-dd'T'HH:mm:ss")
+-- !query schema
+struct
+-- !query output
+2019-10-06 10:11:12


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL

2020-03-14 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f83ef7d  [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
f83ef7d is described below

commit f83ef7d143aafbbdd1bb322567481f68db72195a
Author: gatorsmile 
AuthorDate: Sun Mar 15 07:35:20 2020 +0900

[SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL

### What changes were proposed in this pull request?
The current SQL migration guide is too long for most readers to find the
information they need. This PR groups the items in the Spark SQL migration
guide by their corresponding components.

Note: this PR does not change the contents of the migration guide.
The attached figure is a screenshot after the change.


![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png)

### Why are the changes needed?
The current SQL migration guide is too long for most readers to find the
information they need.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27909 from gatorsmile/migrationGuideReorg.

Authored-by: gatorsmile 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 4d4c00c1b564b57d3016ce8c3bfcffaa6e58f012)
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-migration-guide.md | 287 +++-
 1 file changed, 150 insertions(+), 137 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 19c744c..31d5c68 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -23,92 +23,119 @@ license: |
 {:toc}
 
 ## Upgrading from Spark SQL 2.4 to 3.0
-  - Since Spark 3.0, when inserting a value into a table column with a 
different data type, the type coercion is performed as per ANSI SQL standard. 
Certain unreasonable type conversions such as converting `string` to `int` and 
`double` to `boolean` are disallowed. A runtime exception will be thrown if the 
value is out-of-range for the data type of the column. In Spark version 2.4 and 
earlier, type conversions during table insertion are allowed as long as they 
are valid `Cast`. When inse [...]
 
-  - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and 
`SparkSession.createExternalTable` have been removed in favor of its 
replacement, `createTable`.
-
-  - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use 
`SparkSession.builder.enableHiveSupport()` instead.
-
-  - Since Spark 3.0, configuration `spark.sql.crossJoin.enabled` become 
internal configuration, and is true by default, so by default spark won't raise 
exception on sql with implicit cross join.
-
-  - In Spark version 2.4 and earlier, SQL queries such as `FROM ` or 
`FROM  UNION ALL FROM ` are supported by accident. In hive-style 
`FROM  SELECT `, the `SELECT` clause is not negligible. Neither 
Hive nor Presto support this syntax. Therefore we will treat these queries as 
invalid since Spark 3.0.
+### Dataset/DataFrame APIs
 
   - Since Spark 3.0, the Dataset and DataFrame API `unionAll` is not 
deprecated any more. It is an alias for `union`.
 
-  - In Spark version 2.4 and earlier, the parser of JSON data source treats 
empty strings as null for some data types such as `IntegerType`. For 
`FloatType`, `DoubleType`, `DateType` and `TimestampType`, it fails on empty 
strings and throws exceptions. Since Spark 3.0, we disallow empty strings and 
will throw exceptions for data types except for `StringType` and `BinaryType`. 
The previous behaviour of allowing empty string can be restored by setting 
`spark.sql.legacy.json.allowEmptyStrin [...]
-
-  - Since Spark 3.0, the `from_json` functions supports two modes - 
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The 
default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` 
did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing 
of malformed JSON records. For example, the JSON string `{"a" 1}` with the 
schema `a INT` is converted to `null` by previous versions but Spark 3.0 
converts it to `Row(null)`.
-
-  - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
-
-  - In Spark version 2.4 and earlier, users can create map values with map 
type key via built-in function such as `CreateMap`, `MapFromArrays`, etc. Since 
Spark 3.0, it's not allowed to create map values with map type key with these 
built-in functions. Users can use `map_entries` function to convert map to 
array> as a workaround. In addition, users can still read 
map values with

[spark] branch master updated (9628aca -> 4d4c00c)

2020-03-14 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9628aca  [MINOR][DOCS] Fix [[...]] to `...` and ... in 
documentation
 add 4d4c00c  [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md | 287 +++-
 1 file changed, 150 insertions(+), 137 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL

2020-03-14 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4d4c00c  [SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL
4d4c00c is described below

commit 4d4c00c1b564b57d3016ce8c3bfcffaa6e58f012
Author: gatorsmile 
AuthorDate: Sun Mar 15 07:35:20 2020 +0900

[SPARK-31151][SQL][DOC] Reorganize the migration guide of SQL

### What changes were proposed in this pull request?
The current SQL migration guide is too long for most readers to find the
information they need. This PR groups the items in the Spark SQL migration
guide by their corresponding components.

Note: this PR does not change the contents of the migration guide.
The attached figure is a screenshot after the change.


![screencapture-127-0-0-1-4000-sql-migration-guide-html-2020-03-14-12_00_40](https://user-images.githubusercontent.com/11567269/76688626-d3010200-65eb-11ea-9ce7-265bc90ebb2c.png)

### Why are the changes needed?
The current SQL migration guide is too long for most readers to find the
information they need.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27909 from gatorsmile/migrationGuideReorg.

Authored-by: gatorsmile 
Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-migration-guide.md | 287 +++-
 1 file changed, 150 insertions(+), 137 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 7cca43e..d6b663d 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -26,92 +26,119 @@ license: |
   - Since Spark 3.1, grouping_id() returns long values. In Spark version 3.0 
and earlier, this function returns int values. To restore the behavior before 
Spark 3.0, you can set `spark.sql.legacy.integerGroupingId` to `true`.
 
 ## Upgrading from Spark SQL 2.4 to 3.0
-  - Since Spark 3.0, when inserting a value into a table column with a 
different data type, the type coercion is performed as per ANSI SQL standard. 
Certain unreasonable type conversions such as converting `string` to `int` and 
`double` to `boolean` are disallowed. A runtime exception will be thrown if the 
value is out-of-range for the data type of the column. In Spark version 2.4 and 
earlier, type conversions during table insertion are allowed as long as they 
are valid `Cast`. When inse [...]
 
-  - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and 
`SparkSession.createExternalTable` have been removed in favor of its 
replacement, `createTable`.
-
-  - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use 
`SparkSession.builder.enableHiveSupport()` instead.
-
-  - Since Spark 3.0, configuration `spark.sql.crossJoin.enabled` become 
internal configuration, and is true by default, so by default spark won't raise 
exception on sql with implicit cross join.
-
-  - In Spark version 2.4 and earlier, SQL queries such as `FROM ` or 
`FROM  UNION ALL FROM ` are supported by accident. In hive-style 
`FROM  SELECT `, the `SELECT` clause is not negligible. Neither 
Hive nor Presto support this syntax. Therefore we will treat these queries as 
invalid since Spark 3.0.
+### Dataset/DataFrame APIs
 
   - Since Spark 3.0, the Dataset and DataFrame API `unionAll` is not 
deprecated any more. It is an alias for `union`.
 
-  - In Spark version 2.4 and earlier, the parser of JSON data source treats 
empty strings as null for some data types such as `IntegerType`. For 
`FloatType`, `DoubleType`, `DateType` and `TimestampType`, it fails on empty 
strings and throws exceptions. Since Spark 3.0, we disallow empty strings and 
will throw exceptions for data types except for `StringType` and `BinaryType`. 
The previous behaviour of allowing empty string can be restored by setting 
`spark.sql.legacy.json.allowEmptyStrin [...]
-
-  - Since Spark 3.0, the `from_json` functions supports two modes - 
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The 
default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` 
did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing 
of malformed JSON records. For example, the JSON string `{"a" 1}` with the 
schema `a INT` is converted to `null` by previous versions but Spark 3.0 
converts it to `Row(null)`.
-
-  - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
-
-  - In Spark version 2.4 and earlier, users can create map values with map 
type key via built-in function such as `CreateMap`, `MapFromArrays`, etc. Since 
Spark 3.0, it's not allowed to create map values with map type key with these 
built-in functions. Users can use `map_entries

[spark] branch branch-3.0 updated: [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2

2020-03-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b8e2cb3  [SPARK-30962][SQL][DOC] Documentation for Alter table command 
phase 2
b8e2cb3 is described below

commit b8e2cb32cbc75601d6d7a841362676cf2f273bda
Author: Qianyang Yu 
AuthorDate: Wed Mar 11 08:47:30 2020 +0900

[SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2

### What changes were proposed in this pull request?

### Why are the changes needed?

Based on [JIRA 30962](https://issues.apache.org/jira/browse/SPARK-30962),
we want to document all the supported `ALTER TABLE` syntax for V1 tables
(a short sketch of the documented forms follows the notes list below).

### Does this PR introduce any user-facing change?

Yes

### How was this patch tested?

Before:
The documentation looks like
 [Alter Table](https://github.com/apache/spark/pull/25590)

After:
https://user-images.githubusercontent.com/7550280/75824837-168c7e00-5d59-11ea-9751-d1dab0f5a892.png
https://user-images.githubusercontent.com/7550280/75824859-21dfa980-5d59-11ea-8b49-3adf6eb55fc6.png
https://user-images.githubusercontent.com/7550280/75824884-2e640200-5d59-11ea-81ef-d77d0a8efee2.png
https://user-images.githubusercontent.com/7550280/75824910-39b72d80-5d59-11ea-84d0-bffa2499f086.png
https://user-images.githubusercontent.com/7550280/75824937-45a2ef80-5d59-11ea-932c-314924856834.png
https://user-images.githubusercontent.com/7550280/75824965-4cc9fd80-5d59-11ea-815b-8c1ebad310b1.png
https://user-images.githubusercontent.com/7550280/75824978-518eb180-5d59-11ea-8a55-2fa26376b9c1.png

https://user-images.githubusercontent.com/7550280/75825001-5bb0b000-5d59-11ea-8dd9-dcfbfa1b4330.png

Notes:
These syntaxes are not supported for V1 tables.

- `ALTER TABLE .. RENAME COLUMN`
- `ALTER TABLE ... DROP (COLUMN | COLUMNS)`
- `ALTER TABLE ... (ALTER | CHANGE) COLUMN? alterColumnAction` only supports
changing comments, not other actions: `datatype, position, (SET | DROP) NOT NULL`
- `ALTER TABLE .. CHANGE COLUMN?`
- `ALTER TABLE  REPLACE COLUMNS`
- `ALTER TABLE ... RECOVER PARTITIONS`
-
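
For context, a short sketch of V1 `ALTER TABLE` forms that the new page does
document (hypothetical table, column, and partition names; partition DDL assumes
a catalog that supports partition management):

```scala
spark.sql("CREATE TABLE student (name STRING, age INT) USING parquet PARTITIONED BY (age)")
spark.sql("ALTER TABLE student RENAME TO student_info")
spark.sql("ALTER TABLE student_info ALTER COLUMN name COMMENT 'student name'")
spark.sql("ALTER TABLE student_info ADD IF NOT EXISTS PARTITION (age = 20)")
spark.sql("ALTER TABLE student_info DROP IF EXISTS PARTITION (age = 20)")
```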

Closes #27779 from kevinyu98/spark-30962-alterT.

Authored-by: Qianyang Yu 
    Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 0f54dc7c03ed975ecb7f776a0151b9325d21e85c)
    Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-syntax-ddl-alter-table.md | 213 -
 1 file changed, 210 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-syntax-ddl-alter-table.md 
b/docs/sql-ref-syntax-ddl-alter-table.md
index 373fa8d..2dd808b 100644
--- a/docs/sql-ref-syntax-ddl-alter-table.md
+++ b/docs/sql-ref-syntax-ddl-alter-table.md
@@ -23,14 +23,13 @@ license: |
 `ALTER TABLE` statement changes the schema or properties of a table.
 
 ### RENAME 
-`ALTER TABLE RENAME` statement changes the table name of an existing table in 
the database.
+`ALTER TABLE RENAME TO` statement changes the table name of an existing table 
in the database.
 
  Syntax
 {% highlight sql %}
 ALTER TABLE table_identifier RENAME TO table_identifier
 
 ALTER TABLE table_identifier partition_spec RENAME TO partition_spec
-
 {% endhighlight %}
 
  Parameters
@@ -83,6 +82,109 @@ ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , 
col_spec ... ] )
 
 
 
+### ALTER OR CHANGE COLUMN
+`ALTER TABLE ALTER COLUMN` or `ALTER TABLE CHANGE COLUMN` statement changes 
column's comment.
+
+ Syntax
+{% highlight sql %}
+ALTER TABLE table_identifier { ALTER | CHANGE } [ COLUMN ] col_spec 
alterColumnAction
+{% endhighlight %}
+
+ Parameters
+
+  table_identifier
+  
+Specifies a table name, which may be optionally qualified with a database 
name.
+Syntax:
+  
+[ database_name. ] table_name
+  
+  
+
+
+
+  COLUMN col_spec
+  Specifies the column to be altered or be changed.
+
+
+
+  alterColumnAction
+   
+ Change the comment string.
+ Syntax:
+
+COMMENT STRING
+ 
+
+
+
+
+### ADD AND DROP PARTITION
+
+ ADD PARTITION
+`ALTER TABLE ADD` statement adds partition to the partitioned table.
+
+# Syntax
+{% highlight sql %}
+ALTER TABLE table_identifier ADD [IF NOT EXISTS] 
+( partition_spec [ partition_spec ... ] )
+{% endhighlight %}
+ 
+# Parameters
+
+  table_identifier
+  
+Specifies a table name, which may be optionally qualified with a database 
name.
+Syntax:
+  
+[ database_name. ] table_name
+  
+  
+
+
+
+  partition_spec
+  
+Partition to be added. 
+Syntax:
+  
+PARTITION ( partition_col_name  = partition_col_val [ , ... ] )
+  
+  
+ 
+
+ DROP PARTITION
+`ALTER TABLE DROP` statement drops the partition of the table.

[spark] branch master updated: [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2

2020-03-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0f54dc7  [SPARK-30962][SQL][DOC] Documentation for Alter table command 
phase 2
0f54dc7 is described below

commit 0f54dc7c03ed975ecb7f776a0151b9325d21e85c
Author: Qianyang Yu 
AuthorDate: Wed Mar 11 08:47:30 2020 +0900

[SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2

### What changes were proposed in this pull request?

### Why are the changes needed?

Based on [JIRA 30962](https://issues.apache.org/jira/browse/SPARK-30962),
we want to document all the supported `Alter Table` syntax for V1 tables.

### Does this PR introduce any user-facing change?

Yes

### How was this patch tested?

Before:
The documentation looks like
 [Alter Table](https://github.com/apache/spark/pull/25590)

After:
https://user-images.githubusercontent.com/7550280/75824837-168c7e00-5d59-11ea-9751-d1dab0f5a892.png
https://user-images.githubusercontent.com/7550280/75824859-21dfa980-5d59-11ea-8b49-3adf6eb55fc6.png
https://user-images.githubusercontent.com/7550280/75824884-2e640200-5d59-11ea-81ef-d77d0a8efee2.png
https://user-images.githubusercontent.com/7550280/75824910-39b72d80-5d59-11ea-84d0-bffa2499f086.png
https://user-images.githubusercontent.com/7550280/75824937-45a2ef80-5d59-11ea-932c-314924856834.png
https://user-images.githubusercontent.com/7550280/75824965-4cc9fd80-5d59-11ea-815b-8c1ebad310b1.png
https://user-images.githubusercontent.com/7550280/75824978-518eb180-5d59-11ea-8a55-2fa26376b9c1.png

https://user-images.githubusercontent.com/7550280/75825001-5bb0b000-5d59-11ea-8dd9-dcfbfa1b4330.png

Notes:
The following syntaxes are not supported for v1 tables:

- `ALTER TABLE .. RENAME COLUMN`
- `ALTER TABLE ... DROP (COLUMN | COLUMNS)`
- `ALTER TABLE ... (ALTER | CHANGE) COLUMN? alterColumnAction` only supports changing
the comment, not the other actions: `datatype`, `position`, `(SET | DROP) NOT NULL`
- `ALTER TABLE .. CHANGE COLUMN?`
- `ALTER TABLE REPLACE COLUMNS`
- `ALTER TABLE ... RECOVER PARTITIONS`

Closes #27779 from kevinyu98/spark-30962-alterT.

Authored-by: Qianyang Yu 
    Signed-off-by: Takeshi Yamamuro 
---
 docs/sql-ref-syntax-ddl-alter-table.md | 213 -
 1 file changed, 210 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-syntax-ddl-alter-table.md 
b/docs/sql-ref-syntax-ddl-alter-table.md
index 373fa8d..2dd808b 100644
--- a/docs/sql-ref-syntax-ddl-alter-table.md
+++ b/docs/sql-ref-syntax-ddl-alter-table.md
@@ -23,14 +23,13 @@ license: |
 `ALTER TABLE` statement changes the schema or properties of a table.
 
 ### RENAME 
-`ALTER TABLE RENAME` statement changes the table name of an existing table in 
the database.
+`ALTER TABLE RENAME TO` statement changes the table name of an existing table 
in the database.
 
  Syntax
 {% highlight sql %}
 ALTER TABLE table_identifier RENAME TO table_identifier
 
 ALTER TABLE table_identifier partition_spec RENAME TO partition_spec
-
 {% endhighlight %}
 
  Parameters
@@ -83,6 +82,109 @@ ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , 
col_spec ... ] )
 
 
 
+### ALTER OR CHANGE COLUMN
+`ALTER TABLE ALTER COLUMN` or `ALTER TABLE CHANGE COLUMN` statement changes 
column's comment.
+
+ Syntax
+{% highlight sql %}
+ALTER TABLE table_identifier { ALTER | CHANGE } [ COLUMN ] col_spec 
alterColumnAction
+{% endhighlight %}
+
+ Parameters
+
+  table_identifier
+  
+Specifies a table name, which may be optionally qualified with a database 
name.
+Syntax:
+  
+[ database_name. ] table_name
+  
+  
+
+
+
+  COLUMN col_spec
+  Specifies the column to be altered or be changed.
+
+
+
+  alterColumnAction
+   
+ Change the comment string.
+ Syntax:
+
+COMMENT STRING
+ 
+
+
+
+
+### ADD AND DROP PARTITION
+
+ ADD PARTITION
+`ALTER TABLE ADD` statement adds partition to the partitioned table.
+
+# Syntax
+{% highlight sql %}
+ALTER TABLE table_identifier ADD [IF NOT EXISTS] 
+( partition_spec [ partition_spec ... ] )
+{% endhighlight %}
+ 
+# Parameters
+
+  table_identifier
+  
+Specifies a table name, which may be optionally qualified with a database 
name.
+Syntax:
+  
+[ database_name. ] table_name
+  
+  
+
+
+
+  partition_spec
+  
+Partition to be added. 
+Syntax:
+  
+PARTITION ( partition_col_name  = partition_col_val [ , ... ] )
+  
+  
+ 
+
+ DROP PARTITION
+`ALTER TABLE DROP` statement drops the partition of the table.
+
+# Syntax
+{% highlight sql %}
+ALTER TABLE table_identifier DROP [ IF EXISTS ] partition_spec 

[spark] branch master updated (2e3adad -> 71c73d5)

2020-03-05 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2e3adad  [SPARK-31061][SQL] Provide ability to alter the provider of a 
table
 add 71c73d5  [SPARK-30279][SQL] Support 32 or more grouping attributes for 
GROUPING_ID

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|  3 ++
 .../spark/sql/catalyst/analysis/Analyzer.scala | 10 ++--
 .../spark/sql/catalyst/expressions/grouping.scala  | 28 +++
 .../plans/logical/basicLogicalOperators.scala  | 13 --
 .../org/apache/spark/sql/internal/SQLConf.scala|  9 
 .../analysis/ResolveGroupingAnalyticsSuite.scala   | 54 --
 .../sql-tests/results/group-analytics.sql.out  |  8 ++--
 .../sql-tests/results/grouping_set.sql.out |  4 +-
 .../results/postgreSQL/groupingsets.sql.out|  2 +-
 .../results/udf/udf-group-analytics.sql.out|  8 ++--
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 31 +
 11 files changed, 117 insertions(+), 53 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-30998][SQL][2.4] ClassCastException when a generator having nested inner generators

2020-03-03 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new f4c8c48  [SPARK-30998][SQL][2.4] ClassCastException when a generator 
having nested inner generators
f4c8c48 is described below

commit f4c8c4892197b8c5425a8013a09e9b379444e6fc
Author: Takeshi Yamamuro 
AuthorDate: Tue Mar 3 23:47:40 2020 +0900

[SPARK-30998][SQL][2.4] ClassCastException when a generator having nested 
inner generators

### What changes were proposed in this pull request?

The query below failed in branch-2.4:

```
scala> sql("select array(array(1, 2), array(3)) 
ar").select(explode(explode($"ar"))).show()
20/03/01 13:51:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 
0)/ 1]
java.lang.ClassCastException: scala.collection.mutable.ArrayOps$ofRef 
cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
at 
org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:313)
at 
org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
...
```

This PR modifies the `hasNestedGenerator` code in `ExtractGenerator` to
correctly catch nested inner generators.
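
For reference, a hedged spark-shell sketch (not part of this patch) of how to get the
flattened result once nested `explode` calls are rejected; the imports below are
already in scope in spark-shell:

```
import org.apache.spark.sql.functions._
import spark.implicits._

// explode one level at a time ...
sql("select array(array(1, 2), array(3)) ar")
  .select(explode($"ar").as("inner"))
  .select(explode($"inner"))
  .show()
// ... or flatten the nested array first (flatten is available since Spark 2.4)
sql("select array(array(1, 2), array(3)) ar")
  .select(explode(flatten($"ar")))
  .show()
```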

This backport PR comes from https://github.com/apache/spark/pull/27750#
### Why are the changes needed?

A bug fix.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #27769 from maropu/SPARK-20998-BRANCH-2.4.

    Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
---
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 16 +---
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala| 19 +++
 .../org/apache/spark/sql/GeneratorFunctionSuite.scala |  8 
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 0fedf7f..61f77be 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1681,10 +1681,20 @@ class Analyzer(
 }
 
 private def hasNestedGenerator(expr: NamedExpression): Boolean = {
+  def hasInnerGenerator(g: Generator): Boolean = g match {
+// Since `GeneratorOuter` is just a wrapper of generators, we skip it 
here
+case go: GeneratorOuter =>
+  hasInnerGenerator(go.child)
+case _ =>
+  g.children.exists { _.find {
+case _: Generator => true
+case _ => false
+  }.isDefined }
+  }
   CleanupAliases.trimNonTopLevelAliases(expr) match {
-case UnresolvedAlias(_: Generator, _) => false
-case Alias(_: Generator, _) => false
-case MultiAlias(_: Generator, _) => false
+case UnresolvedAlias(g: Generator, _) => hasInnerGenerator(g)
+case Alias(g: Generator, _) => hasInnerGenerator(g)
+case MultiAlias(g: Generator, _) => hasInnerGenerator(g)
 case other => hasGenerator(other)
   }
 }
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
index 45319aa..337902f 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
@@ -395,6 +395,25 @@ class AnalysisErrorSuite extends AnalysisTest {
   )
 
   errorTest(
+"SPARK-30998: unsupported nested inner generators",
+{
+  val nestedListRelation = LocalRelation(
+AttributeReference("nestedList", ArrayType(ArrayType(IntegerType)))())
+  nestedListRelation.select(Explode(Explode($"nestedList")))
+},
+"Generators are not supported when it's nested in expressions, but got: " +
+  "explode(explode(nestedList))" :: Nil
+  )
+
+  errorTest(
+"SPARK-30998: unsupported nested inner generators for aggregates",
+testRelation.select(Explode(Explode(
+  CreateArray(CreateArray(min($"a") :: max($"a") :: Nil) :: N

[spark] branch branch-3.0 updated: [SPARK-30998][SQL] ClassCastException when a generator having nested inner generators

2020-03-03 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ded0a72  [SPARK-30998][SQL] ClassCastException when a generator having 
nested inner generators
ded0a72 is described below

commit ded0a72d81c1d34753be8a156126312506fb50b1
Author: Takeshi Yamamuro 
AuthorDate: Tue Mar 3 19:00:33 2020 +0900

[SPARK-30998][SQL] ClassCastException when a generator having nested inner 
generators

### What changes were proposed in this pull request?

The query below failed in the master branch:

```
scala> sql("select array(array(1, 2), array(3)) 
ar").select(explode(explode($"ar"))).show()
20/03/01 13:51:56 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 
0)/ 1]
java.lang.ClassCastException: scala.collection.mutable.ArrayOps$ofRef 
cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
at 
org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:313)
at 
org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
...
```

This PR modifies the `hasNestedGenerator` code in `ExtractGenerator` to
correctly catch nested inner generators.

### Why are the changes needed?

A bug fix.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #27750 from maropu/HandleNestedGenerators.

    Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 313e62c376acab30e546df253b28452a664d3e73)
Signed-off-by: Takeshi Yamamuro 
---
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 16 +---
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala| 19 +++
 .../org/apache/spark/sql/GeneratorFunctionSuite.scala |  8 
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 3d79799..486b952 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -2164,10 +2164,20 @@ class Analyzer(
 }
 
 private def hasNestedGenerator(expr: NamedExpression): Boolean = {
+  def hasInnerGenerator(g: Generator): Boolean = g match {
+// Since `GeneratorOuter` is just a wrapper of generators, we skip it 
here
+case go: GeneratorOuter =>
+  hasInnerGenerator(go.child)
+case _ =>
+  g.children.exists { _.find {
+case _: Generator => true
+case _ => false
+  }.isDefined }
+  }
   CleanupAliases.trimNonTopLevelAliases(expr) match {
-case UnresolvedAlias(_: Generator, _) => false
-case Alias(_: Generator, _) => false
-case MultiAlias(_: Generator, _) => false
+case UnresolvedAlias(g: Generator, _) => hasInnerGenerator(g)
+case Alias(g: Generator, _) => hasInnerGenerator(g)
+case MultiAlias(g: Generator, _) => hasInnerGenerator(g)
 case other => hasGenerator(other)
   }
 }
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
index 8f62b0b..3db1053 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
@@ -434,6 +434,25 @@ class AnalysisErrorSuite extends AnalysisTest {
   )
 
   errorTest(
+"SPARK-30998: unsupported nested inner generators",
+{
+  val nestedListRelation = LocalRelation(
+AttributeReference("nestedList", ArrayType(ArrayType(IntegerType)))())
+  nestedListRelation.select(Explode(Explode($"nestedList")))
+},
+"Generators are not supported when it's nested in expressions, but got: " +
+  "explode(explode(nestedList))" :: Nil
+  )
+
+  errorTest(
+"SPARK-30998: unsupported nested inner generators for aggregates",
+testRelation.select(Explode(Explode(
+  CreateArray(CreateArray(min($"a") :: max($"a") :: Nil) :: N

[spark] branch master updated (1fac06c -> 313e62c)

2020-03-03 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1fac06c  Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift 
server"
 add 313e62c  [SPARK-30998][SQL] ClassCastException when a generator having 
nested inner generators

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 16 +---
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala| 19 +++
 .../org/apache/spark/sql/GeneratorFunctionSuite.scala |  8 
 3 files changed, 40 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite

2020-02-27 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 933e576  [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch 
to assert failures in IntervalUtilsSuite
933e576 is described below

commit 933e576aab0a40e53f275ae960fc45b7ed2d6f06
Author: Kent Yao 
AuthorDate: Thu Feb 27 23:12:35 2020 +0900

[SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert 
failures in IntervalUtilsSuite

### What changes were proposed in this pull request?

In this PR, I addressed the comment from
https://github.com/apache/spark/pull/27672#discussion_r383719562 to use
`intercept` instead of a `try-catch` block to assert failures in the
IntervalUtilsSuite.
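
A minimal standalone sketch of the pattern, assuming ScalaTest 3.1+ (the suite, the
test name, and the `convert` helper are hypothetical stand-ins, not the actual code):

```
import org.scalatest.funsuite.AnyFunSuite

class InvalidInputSuite extends AnyFunSuite {
  // stand-in for the converter under test
  private def convert(s: String): Int = s.toInt

  test("invalid input is rejected with a useful message") {
    // before: try { convert("abc"); fail("expected an exception") } catch { case e: NumberFormatException => ... }
    val e = intercept[NumberFormatException](convert("abc"))
    assert(e.getMessage.contains("abc"))
  }
}
```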

### Why are the changes needed?

improve tests
### Does this PR introduce any user-facing change?

no

### How was this patch tested?

Nah

Closes #27700 from yaooqinn/intervaltest.

Authored-by: Kent Yao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 2d2706cb86ddccd2fc60378b0f47a437ec354017)
Signed-off-by: Takeshi Yamamuro 
---
 .../sql/catalyst/util/IntervalUtilsSuite.scala | 119 +
 1 file changed, 26 insertions(+), 93 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala
index e7c3163..1628a61 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala
@@ -35,27 +35,17 @@ class IntervalUtilsSuite extends SparkFunSuite with 
SQLHelper {
 assert(safeStringToInterval(UTF8String.fromString(input)) === expected)
   }
 
-  private def checkFromStringWithFunc(
-  input: String,
-  months: Int,
-  days: Int,
-  us: Long,
-  func: CalendarInterval => CalendarInterval): Unit = {
-val expected = new CalendarInterval(months, days, us)
-assert(func(stringToInterval(UTF8String.fromString(input))) === expected)
-assert(func(safeStringToInterval(UTF8String.fromString(input))) === 
expected)
+  private def checkFromInvalidString(input: String, errorMsg: String): Unit = {
+failFuncWithInvalidInput(input, errorMsg, s => 
stringToInterval(UTF8String.fromString(s)))
+assert(safeStringToInterval(UTF8String.fromString(input)) === null)
   }
 
-  private def checkFromInvalidString(input: String, errorMsg: String): Unit = {
-try {
-  stringToInterval(UTF8String.fromString(input))
-  fail("Expected to throw an exception for the invalid input")
-} catch {
-  case e: IllegalArgumentException =>
-val msg = e.getMessage
-assert(msg.contains(errorMsg))
+  private def failFuncWithInvalidInput(
+  input: String, errorMsg: String, converter: String => CalendarInterval): 
Unit = {
+withClue("Expected to throw an exception for the invalid input") {
+  val e = intercept[IllegalArgumentException](converter(input))
+  assert(e.getMessage.contains(errorMsg))
 }
-assert(safeStringToInterval(UTF8String.fromString(input)) === null)
   }
 
   private def testSingleUnit(
@@ -87,7 +77,6 @@ class IntervalUtilsSuite extends SparkFunSuite with SQLHelper 
{
 }
   }
 
-
   test("string to interval: multiple units") {
 Seq(
   "-1 MONTH 1 day -1 microseconds" -> new CalendarInterval(-1, 1, -1),
@@ -145,22 +134,9 @@ class IntervalUtilsSuite extends SparkFunSuite with 
SQLHelper {
 assert(fromYearMonthString("99-10") === new CalendarInterval(99 * 12 + 10, 
0, 0L))
 assert(fromYearMonthString("+99-10") === new CalendarInterval(99 * 12 + 
10, 0, 0L))
 assert(fromYearMonthString("-8-10") === new CalendarInterval(-8 * 12 - 10, 
0, 0L))
-
-try {
-  fromYearMonthString("99-15")
-  fail("Expected to throw an exception for the invalid input")
-} catch {
-  case e: IllegalArgumentException =>
-assert(e.getMessage.contains("month 15 outside range"))
-}
-
-try {
-  fromYearMonthString("9a9-15")
-  fail("Expected to throw an exception for the invalid input")
-} catch {
-  case e: IllegalArgumentException =>
-assert(e.getMessage.contains("Interval string does not match 
year-month format"))
-}
+failFuncWithInvalidInput("99-15", "month 15 outside range", 
fromYearMonthString)
+failFuncWithInvalidInput("9a9-15", "Interval string does not match 
year-month format",
+  fromYearMonthString)
   }
 
   tes

[spark] branch master updated (22dfd15 -> 2d2706c)

2020-02-27 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 22dfd15  [SPARK-30937][DOC] Group Hive upgrade guides together
 add 2d2706c  [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch 
to assert failures in IntervalUtilsSuite

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/IntervalUtilsSuite.scala | 119 +
 1 file changed, 26 insertions(+), 93 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30844][SQL] Static partition should also follow StoreAssignmentPolicy when insert into table

2020-02-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f30f50a  [SPARK-30844][SQL] Static partition should also follow 
StoreAssignmentPolicy when insert into table
f30f50a is described below

commit f30f50a76f4b9fb5e652620563fb9055c5f30521
Author: yi.wu 
AuthorDate: Sun Feb 23 17:46:19 2020 +0900

[SPARK-30844][SQL] Static partition should also follow 
StoreAssignmentPolicy when insert into table

### What changes were proposed in this pull request?

Make static partitions also follow `StoreAssignmentPolicy` when inserting into a
table:

if `StoreAssignmentPolicy=LEGACY`, use `Cast`;
if `StoreAssignmentPolicy=ANSI | STRICT`, use `AnsiCast`.

E.g., for the table `t` created by:

```
create table t(a int, b string) using parquet partitioned by (a)
```
and insert values with `StoreAssignmentPolicy=ANSI` using:
```
insert into t partition(a='ansi') values('ansi')
```

Before this PR:

```
+----+----+
|   b|   a|
+----+----+
|ansi|null|
+----+----+
```

After this PR, insert will fail by:
```
java.lang.NumberFormatException: invalid input syntax for type numeric: ansi
```

(It would be better if we could use `TableOutputResolver.checkField` to
fully follow `StoreAssignmentPolicy`. But since we lose the data type of the static
partition's value in the first place, it's hard to use
`TableOutputResolver.checkField`.)
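
For reference, a spark-shell sketch of the example above; the policy is selected with
the `spark.sql.storeAssignmentPolicy` conf (ANSI is the 3.0 default):

```
spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
sql("create table t(a int, b string) using parquet partitioned by (a)")
sql("insert into t partition(a='ansi') values('ansi')")
// with this patch: java.lang.NumberFormatException: invalid input syntax for type numeric: ansi
```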

### Why are the changes needed?

I think we should follow `StoreAssignmentPolicy` when inserting into a table for
all columns, including static partitions.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added new test.

Closes #27597 from Ngone51/fix-static-partition.

Authored-by: yi.wu 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 9c2eadc7268844d49ec41da818002c99bb56addf)
Signed-off-by: Takeshi Yamamuro 
---
 .../execution/datasources/DataSourceStrategy.scala  | 13 -
 .../spark/sql/sources/DataSourceAnalysisSuite.scala | 10 --
 .../org/apache/spark/sql/sources/InsertSuite.scala  | 21 +
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index e3a0a0a..2d902b5 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -39,6 +39,7 @@ import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution.{RowDataSourceScanExec, SparkPlan}
 import org.apache.spark.sql.execution.command._
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy
 import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
@@ -104,7 +105,17 @@ case class DataSourceAnalysis(conf: SQLConf) extends 
Rule[LogicalPlan] with Cast
 None
   } else if (potentialSpecs.size == 1) {
 val partValue = potentialSpecs.head._2
-Some(Alias(cast(Literal(partValue), field.dataType), field.name)())
+conf.storeAssignmentPolicy match {
+  // SPARK-30844: try our best to follow StoreAssignmentPolicy for 
static partition
+  // values but not completely follow because we can't do static type 
checking due to
+  // the reason that the parser has erased the type info of static 
partition values
+  // and converted them to string.
+  case StoreAssignmentPolicy.ANSI | StoreAssignmentPolicy.STRICT =>
+Some(Alias(AnsiCast(Literal(partValue), field.dataType,
+  Option(conf.sessionLocalTimeZone)), field.name)())
+  case _ =>
+Some(Alias(cast(Literal(partValue), field.dataType), field.name)())
+}
   } else {
 throw new AnalysisException(
   s"Partition column ${field.name} have multiple values specified, " +
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala
index e1022e3..a6c5090 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceAnalysisSuite.scala
@@ -22,9 +22,10 @@ import org.scalatest.BeforeAndAfterAll
 import org.apache.spark.Spark

[spark] branch master updated (25f5bfa -> 9c2eadc)

2020-02-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 25f5bfa  [SPARK-30903][SQL] Fail fast on duplicate columns when 
analyze columns
 add 9c2eadc  [SPARK-30844][SQL] Static partition should also follow 
StoreAssignmentPolicy when insert into table

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/DataSourceStrategy.scala  | 13 -
 .../spark/sql/sources/DataSourceAnalysisSuite.scala | 10 --
 .../org/apache/spark/sql/sources/InsertSuite.scala  | 21 +
 3 files changed, 41 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns

2020-02-22 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4a82ead  [SPARK-30903][SQL] Fail fast on duplicate columns when 
analyze columns
4a82ead is described below

commit 4a82ead147c944c8d4828bbbd4a7e3ec3d3e1135
Author: yi.wu 
AuthorDate: Sun Feb 23 09:52:54 2020 +0900

[SPARK-30903][SQL] Fail fast on duplicate columns when analyze columns



### What changes were proposed in this pull request?


Add a new `CommandCheck` rule that fails fast when duplicate columns are detected in
`AnalyzeColumnCommand`.
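
A hedged spark-shell sketch of the fail-fast behaviour (table `t` is hypothetical and
the exact error text is an assumption):

```
sql("CREATE TABLE t (a INT, b INT) USING parquet")
sql("ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS a, a")
// expected: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in analyze columns.
```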

### Why are the changes needed?


To avoid duplicate statistics computation for the same column in 
`AnalyzeColumnCommand`.

### Does this PR introduce any user-facing change?


Yes. Users now get an exception when they specify duplicate columns.

### How was this patch tested?


Added new test.

Closes #27651 from Ngone51/fail_on_dup_cols.

Authored-by: yi.wu 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 25f5bfaa6e624da7f491e770a2383038fc6009e1)
Signed-off-by: Takeshi Yamamuro 
---
 .../spark/sql/execution/command/CommandCheck.scala | 38 ++
 .../sql/internal/BaseSessionStateBuilder.scala |  2 ++
 .../spark/sql/StatisticsCollectionSuite.scala  | 17 ++
 .../spark/sql/hive/HiveSessionStateBuilder.scala   |  2 ++
 4 files changed, 59 insertions(+)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala
new file mode 100644
index 000..dedace4
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandCheck.scala
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.util.SchemaUtils
+
+/**
+ * Checks legitimization of various execution commands.
+ */
+case class CommandCheck(conf: SQLConf) extends (LogicalPlan => Unit) {
+
+  override def apply(plan: LogicalPlan): Unit = {
+plan.foreach {
+  case AnalyzeColumnCommand(_, colsOpt, allColumns) if !allColumns =>
+colsOpt.foreach(SchemaUtils.checkColumnNameDuplication(
+  _, "in analyze columns.", conf.caseSensitiveAnalysis))
+
+  case _ =>
+}
+  }
+}
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
index 2137fe2..20e1b56 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
@@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.connector.catalog.CatalogManager
 import org.apache.spark.sql.execution.{ColumnarRule, QueryExecution, 
SparkOptimizer, SparkPlanner, SparkSqlParser}
 import org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin
+import org.apache.spark.sql.execution.command.CommandCheck
 import org.apache.spark.sql.execution.datasources._
 import org.apache.spark.sql.execution.datasources.v2.{TableCapabilityCheck, 
V2SessionCatalog}
 import org.apache.spark.sql.streaming.StreamingQueryManager
@@ -190,6 +191,7 @@ abstract class BaseSessionStateBuilder(
 PreReadCheck +:
 HiveOnlyCheck +:
 TableCapabilityCheck +:
+CommandCheck(conf) +:
 customCheckRules
   }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
index e9ceab6..30b15a8 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
+++ 
b/sql/c

[spark] branch master updated (bcce1b1 -> 25f5bfa)

2020-02-22 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bcce1b1  [SPARK-30904][SQL] Thrift RowBasedSet serialization throws 
NullPointerException on NULL BigDecimal
 add 25f5bfa  [SPARK-30903][SQL] Fail fast on duplicate columns when 
analyze columns

No new revisions were added by this update.

Summary of changes:
 .../CommandCheck.scala}| 23 ++
 .../sql/internal/BaseSessionStateBuilder.scala |  2 ++
 .../spark/sql/StatisticsCollectionSuite.scala  | 17 
 .../spark/sql/hive/HiveSessionStateBuilder.scala   |  2 ++
 4 files changed, 36 insertions(+), 8 deletions(-)
 copy 
sql/core/src/main/scala/org/apache/spark/sql/execution/{streaming/continuous/WriteToContinuousDataSource.scala
 => command/CommandCheck.scala} (60%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [MINOR][SQL] Fix error position of NOSCAN

2020-02-20 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a415d07  [MINOR][SQL] Fix error position of NOSCAN
a415d07 is described below

commit a415d07c90ad46c9d88d78e956cb5680b213ce71
Author: yi.wu 
AuthorDate: Fri Feb 21 15:21:53 2020 +0900

[MINOR][SQL] Fix error position of NOSCAN

### What changes were proposed in this pull request?

Point to the correct position when a miswritten `NOSCAN` is detected.

### Why are the changes needed?

Before:

```
[info]   org.apache.spark.sql.catalyst.parser.ParseException: Expected 
`NOSCAN` instead of `SCAN`(line 1, pos 0)
[info]
[info] == SQL ==
[info] ANALYZE TABLE analyze_partition_with_null PARTITION (name) COMPUTE 
STATISTICS SCAN
[info] ^^^
```

After:

```
[info]   org.apache.spark.sql.catalyst.parser.ParseException: Expected 
`NOSCAN` instead of `SCAN`(line 1, pos 78)
[info]
[info] == SQL ==
[info] ANALYZE TABLE analyze_partition_with_null PARTITION (name) COMPUTE 
STATISTICS SCAN
[info] 
--^^^
```
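
For contrast, a sketch of a well-formed statement, which uses `NOSCAN` (table `t` is
hypothetical):

```
sql("ANALYZE TABLE t COMPUTE STATISTICS NOSCAN")
```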

### Does this PR introduce any user-facing change?

Yes, users will see a better error message.

### How was this patch tested?

Manually test.

Closes #27662 from Ngone51/fix_noscan_reference.

Authored-by: yi.wu 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 4d356554a61024c7d3dc450accec1b3639c37e19)
Signed-off-by: Takeshi Yamamuro 
---
 .../main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 62e5685..36c1647 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -3165,7 +3165,8 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 }
 if (ctx.identifier != null &&
 ctx.identifier.getText.toLowerCase(Locale.ROOT) != "noscan") {
-  throw new ParseException(s"Expected `NOSCAN` instead of 
`${ctx.identifier.getText}`", ctx)
+  throw new ParseException(s"Expected `NOSCAN` instead of 
`${ctx.identifier.getText}`",
+ctx.identifier())
 }
 
 val tableName = visitMultipartIdentifier(ctx.multipartIdentifier())


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4d5166f -> 4d35655)

2020-02-20 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4d5166f  [SPARK-30880][DOCS] Delete Sphinx Makefile cruft
 add 4d35655  [MINOR][SQL] Fix error position of NOSCAN

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8629597 -> d5b92b2)

2020-01-25 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8629597  [SPARK-30639][BUILD] Upgrade Jersey to 2.30
 add d5b92b2  [SPARK-30579][DOC] Document ORDER BY Clause of SELECT 
statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-orderby.md | 123 +-
 1 file changed, 122 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3228d72 -> 4847f73)

2020-01-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3228d72  [SPARK-30603][SQL] Move RESERVED_PROPERTIES from 
SupportsNamespaces and TableCatalog to CatalogV2Util
 add 4847f73  [SPARK-30298][SQL] Respect aliases in output partitioning of 
projects and aggregates

No new revisions were added by this update.

Summary of changes:
 .../execution/AliasAwareOutputPartitioning.scala   | 55 ++
 .../execution/aggregate/HashAggregateExec.scala|  4 +-
 .../aggregate/ObjectHashAggregateExec.scala|  4 +-
 .../execution/aggregate/SortAggregateExec.scala|  6 +-
 .../sql/execution/basicPhysicalOperators.scala |  5 +-
 .../apache/spark/sql/execution/PlannerSuite.scala  | 88 ++
 .../spark/sql/sources/BucketedReadSuite.scala  | 14 
 7 files changed, 166 insertions(+), 10 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioning.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f35f352 -> d0bf447)

2020-01-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f35f352  [SPARK-30543][ML][PYSPARK][R] RandomForest add Param 
bootstrap to control sampling method
 add d0bf447  [SPARK-30575][DOCS][FOLLOWUP] Fix typos in documents

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-groupby.md | 2 +-
 docs/sql-ref-syntax-qry-select-having.md  | 4 ++--
 docs/sql-ref-syntax-qry-select-where.md   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a3a42b3 -> 5a55a5a)

2020-01-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a3a42b3  [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString 
of AggregateExpression
 add 5a55a5a  [SPARK-30518][SQL] Precision and scale should be same for 
values between -1.0 and 1.0 in Decimal

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/types/Decimal.scala|  8 +---
 .../catalyst/expressions/ArithmeticExpressionSuite.scala   |  8 
 .../scala/org/apache/spark/sql/types/DecimalSuite.scala| 14 +++---
 3 files changed, 16 insertions(+), 14 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (883ae33 -> a3a42b3)

2020-01-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 883ae33  [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
 add a3a42b3  [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString 
of AggregateExpression

No new revisions were added by this update.

Summary of changes:
 .../expressions/aggregate/interfaces.scala | 19 +--
 .../spark/sql/execution/aggregate/AggUtils.scala   | 23 ++--
 .../sql-tests/results/group-by-filter.sql.out  | 62 +++---
 .../results/postgreSQL/aggregates_part3.sql.out|  4 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 33 +++-
 5 files changed, 86 insertions(+), 55 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d42cf45 -> 8a926e4)

2020-01-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d42cf45  [SPARK-30246][CORE] OneForOneStreamManager might leak memory 
in connectionTerminated
 add 8a926e4  [SPARK-26736][SQL] Partition pruning through nondeterministic 
expressions in Hive tables

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala  | 2 +-
 ...stic condition - query test-0-56a1c59bd13c2a83a91eb0ec658fcecc} | 0
 .../scala/org/apache/spark/sql/hive/execution/PruningSuite.scala   | 7 +++
 3 files changed, 8 insertions(+), 1 deletion(-)
 copy sql/hive/src/test/resources/golden/{Partition pruning - left only 1 
partition - query test-0-3adc3a7f76b2abd059904ba81a595db3 => Partition pruning 
- with filter containing non-deterministic condition - query 
test-0-56a1c59bd13c2a83a91eb0ec658fcecc} (100%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (240840f -> 5f6cd61)

2020-01-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 240840f  [SPARK-30515][SQL] Refactor SimplifyBinaryComparison to 
reduce the time complexity
 add 5f6cd61  [SPARK-29708][SQL] Correct aggregated values when grouping 
sets are duplicated

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  2 +-
 .../plans/logical/basicLogicalOperators.scala  | 30 ++
 .../resources/sql-tests/inputs/grouping_set.sql|  6 +++
 .../sql-tests/inputs/postgreSQL/groupingsets.sql   |  1 -
 .../sql-tests/results/grouping_set.sql.out | 47 --
 .../results/postgreSQL/groupingsets.sql.out|  6 ++-
 6 files changed, 78 insertions(+), 14 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (51d2917 -> 240840f)

2020-01-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 51d2917  [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` 
in sql-data-sources-avro.md
 add 240840f  [SPARK-30515][SQL] Refactor SimplifyBinaryComparison to 
reduce the time complexity

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 49 +++---
 1 file changed, 25 insertions(+), 24 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1846b02 -> 88fc8db)

2020-01-13 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1846b02  [SPARK-30500][SPARK-30501][SQL] Remove SQL configs deprecated 
in Spark 2.1 and 2.3
 add 88fc8db  [SPARK-30482][SQL][CORE][TESTS] Add sub-class of 
`AppenderSkeleton` reusable in tests

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/SparkFunSuite.scala | 21 +--
 .../sql/catalyst/analysis/ResolveHintsSuite.scala  | 15 +---
 .../catalyst/expressions/CodeGenerationSuite.scala | 21 ++-
 .../catalyst/optimizer/OptimizerLoggingSuite.scala | 43 +-
 .../scala/org/apache/spark/sql/JoinHintSuite.scala | 15 +---
 .../sql/execution/datasources/csv/CSVSuite.scala   | 17 ++---
 .../apache/spark/sql/internal/SQLConfSuite.scala   | 12 +-
 7 files changed, 41 insertions(+), 103 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b389b8c -> 81e1a21)

2020-01-13 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b389b8c  [SPARK-30188][SQL] Resolve the failed unit tests when enable 
AQE
 add 81e1a21  [SPARK-30234][SQL][DOCS][FOLOWUP] Update Documentation for 
ADD FILE and LIST FILE

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-resource-mgmt-add-file.md  | 9 +
 docs/sql-ref-syntax-aux-resource-mgmt-list-file.md | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d6532c7 -> b942832)

2020-01-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d6532c7  [SPARK-30448][CORE] accelerator aware scheduling enforce 
cores as limiting resource
 add b942832  [SPARK-30343][SQL] Skip unnecessary checks in 
RewriteDistinctAggregates

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/RewriteDistinctAggregates.scala   | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)
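
RewriteDistinctAggregates rewrites queries containing more than one DISTINCT aggregate group into an Expand plus a two-phase aggregation; the commit above skips that work when it is unnecessary. A small query that exercises the rule, assuming spark-shell with `spark` predefined:

    // Two different DISTINCT aggregates force the rewrite; look for the Expand
    // node in the plans printed by explain(true).
    spark.sql(
      "SELECT count(DISTINCT a), count(DISTINCT b) FROM VALUES (1, 2), (1, 3) AS t(a, b)"
    ).explain(true)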


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bcf07cb -> 418f7dc)

2020-01-10 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bcf07cb  [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
 add 418f7dc  [SPARK-30447][SQL] Constant propagation nullability issue

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 41 --
 .../optimizer/ConstantPropagationSuite.scala   | 25 -
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  9 +
 3 files changed, 63 insertions(+), 12 deletions(-)
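
ConstantPropagation substitutes literal equalities into sibling predicates of the same conjunction; per its title, the fix above addresses a case where that substitution could change results for nullable columns. An illustrative query showing the basic rewrite, assuming spark-shell with `spark` predefined (the view and column names are hypothetical):

    // With `a = 1` in the conjunction, the optimizer can rewrite `b = a + 2`
    // to `b = 3`; explain(true) shows the propagated constant in the plan.
    spark.range(1).selectExpr("1 AS a", "3 AS b").createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE a = 1 AND b = a + 2").explain(true)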


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (e52ae4e -> 6ac3659)

2020-01-08 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e52ae4e  [SPARK-30450][INFRA][FOLLOWUP][2.4] Fix git folder regex for 
windows file separator
 add 6ac3659  [SPARK-30410][SQL][2.4] Calculating size of table with large 
number of partitions causes flooding logs

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/command/CommandUtils.scala  | 10 +++---
 .../spark/sql/execution/datasources/InMemoryFileIndex.scala|  6 +-
 2 files changed, 12 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d7c7e37 -> 9535776)

2020-01-07 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d7c7e37  [SPARK-30381][ML] Refactor GBT to reuse treePoints for all 
trees
 add 9535776  [SPARK-30302][SQL] Complete info for show create table for 
views

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/catalog/interface.scala |   9 +-
 .../spark/sql/execution/command/tables.scala   |  43 +++--
 .../sql-tests/inputs/show-create-table.sql |  31 +++
 .../sql-tests/results/show-create-table.sql.out| 103 -
 4 files changed, 174 insertions(+), 12 deletions(-)
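
The change above makes SHOW CREATE TABLE return complete information when the target is a view. A quick illustration in spark-shell (`spark` predefined; the view name is arbitrary):

    spark.sql("CREATE OR REPLACE VIEW demo_view AS SELECT 1 AS a, 'x' AS b")
    // With this change the output should include the full view definition,
    // not just a partial DDL statement.
    spark.sql("SHOW CREATE TABLE demo_view").show(truncate = false)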


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ed8a260 -> ed73ed8)

2020-01-07 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ed8a260  [SPARK-30450][INFRA] Exclude .git folder for python linter
 add ed73ed8  [SPARK-28825][SQL][DOC] Documentation for Explain Command

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-explain.md | 119 -
 1 file changed, 118 insertions(+), 1 deletion(-)
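
The new reference page documents the EXPLAIN command and its modes. A short example of the command it describes, runnable in spark-shell (`spark` predefined):

    spark.range(100).createOrReplaceTempView("nums")
    // Plain EXPLAIN prints the physical plan; EXTENDED adds the parsed,
    // analyzed and optimized logical plans as well.
    spark.sql("EXPLAIN EXTENDED SELECT id % 2 AS k, count(*) FROM nums GROUP BY id % 2")
      .show(truncate = false)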


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


