[spark] branch master updated: [SPARK-39095][PYTHON] Adjust `GroupBy.std` to match pandas 1.4

2022-05-05 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2d175986906 [SPARK-39095][PYTHON] Adjust `GroupBy.std` to match pandas 1.4
2d175986906 is described below

commit 2d175986906b9ddf4b10b2b50d635b8bc07908fd
Author: Xinrong Meng 
AuthorDate: Fri May 6 10:40:08 2022 +0900

[SPARK-39095][PYTHON] Adjust `GroupBy.std` to match pandas 1.4

### What changes were proposed in this pull request?
Adjust `GroupBy.std` to match pandas 1.4.

Specifically, raise a TypeError when all aggregation columns are of unaccepted data types.

### Why are the changes needed?
Improve API compatibility with pandas.

### Does this PR introduce _any_ user-facing change?
Yes.
```py
>>> psdf = ps.DataFrame(
...     {
...         "A": [1, 2, 1, 2],
...         "B": [3.1, 4.1, 4.1, 3.1],
...         "C": ["a", "b", "b", "a"],
...         "D": [True, False, False, True],
...     }
... )
>>> psdf
   A    B  C      D
0  1  3.1  a   True
1  2  4.1  b  False
2  1  4.1  b  False
3  2  3.1  a   True

### Before
>>> psdf.groupby('A')[['C']].std()
Empty DataFrame
Columns: []
Index: [1, 2]

### After
>>> psdf.groupby('A')[['C']].std()
...
TypeError: Unaccepted data types of aggregation columns; numeric or bool expected.
```

### How was this patch tested?
Unit tests.
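
For readers skimming the diff below, the new guard boils down to "at least one aggregation column must be numeric or boolean". A minimal standalone sketch of that check (simplified from the patch; `agg_columns` stands in for the internal `self._agg_columns` list of pandas-on-Spark Series):

```python
from pyspark.sql.types import BooleanType, NumericType

def check_aggregation_columns(agg_columns):
    # Same rule the patch adds to GroupBy.std: if no column is numeric or
    # boolean, the aggregation cannot produce anything meaningful, so fail fast.
    if not any(
        isinstance(col.spark.data_type, (NumericType, BooleanType)) for col in agg_columns
    ):
        raise TypeError(
            "Unaccepted data types of aggregation columns; numeric or bool expected."
        )
```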

Closes #36444 from xinrong-databricks/groupby.std.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/groupby.py            | 15 +++++++++++++--
 python/pyspark/pandas/tests/test_groupby.py | 19 +++++++++++++++----
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index 386b24c1916..20f7ec55660 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -640,6 +640,17 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
 """
 assert ddof in (0, 1)
 
+# Raise the TypeError when all aggregation columns are of unaccepted 
data types
+all_unaccepted = True
+for _agg_col in self._agg_columns:
+if isinstance(_agg_col.spark.data_type, (NumericType, 
BooleanType)):
+all_unaccepted = False
+break
+if all_unaccepted:
+raise TypeError(
+"Unaccepted data types of aggregation columns; numeric or bool 
expected."
+)
+
 return self._reduce_for_stat_function(
 F.stddev_pop if ddof == 0 else F.stddev_samp,
 accepted_spark_types=(NumericType,),
@@ -2756,9 +2767,9 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
 
 Parameters
 --
-sfun : The aggregate function to apply per column
+sfun : The aggregate function to apply per column.
 accepted_spark_types: Accepted spark types of columns to be aggregated;
-  default None means all spark types are accepted
+  default None means all spark types are accepted.
 bool_to_numeric: If True, boolean columns are converted to numeric 
columns, which
  are accepted for all statistical functions regardless 
of
  `accepted_spark_types`.
diff --git a/python/pyspark/pandas/tests/test_groupby.py b/python/pyspark/pandas/tests/test_groupby.py
index f645373eb3c..33f24a5e2be 100644
--- a/python/pyspark/pandas/tests/test_groupby.py
+++ b/python/pyspark/pandas/tests/test_groupby.py
@@ -1286,13 +1286,24 @@ class GroupByTest(PandasOnSparkTestCase, TestUtils):
 ps.DataFrame({"B": [3.1, 3.1], "D": [0, 0]}, index=pd.Index([1, 
2], name="A")),
 )
 
-        # TODO: fix bug of `std` and re-enable the test below
-        # self._test_stat_func(lambda groupby_obj: groupby_obj.std(), check_exact=False)
-        self.assert_eq(psdf.groupby("A").std(), pdf.groupby("A").std(), check_exact=False)
+        with self.assertRaisesRegex(
+            TypeError, "Unaccepted data types of aggregation columns; numeric or bool expected."
+        ):
+            psdf.groupby("A")[["C"]].std()
+
+        self.assert_eq(
+            psdf.groupby("A").std().sort_index(),
+            pdf.groupby("A").std().sort_index(),
+            check_exact=False,
+        )
 
         # TODO: fix bug of `sum` and re-enable the test below
         # self._test_stat_func(lambda groupby_obj: groupby_obj.sum(), check_exact=False)
-        self.assert_eq(psdf.groupby("A").sum(), pdf.groupby("A").sum(), check_exact=False)
+        self.assert_eq(
+            psdf.groupby("A").sum().

[spark] branch master updated: [SPARK-39108][SQL] Show hints for try_add/try_subtract/try_multiply in int/long overflow errors

2022-05-05 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c274812284a [SPARK-39108][SQL] Show hints for try_add/try_subtract/try_multiply in int/long overflow errors
c274812284a is described below

commit c274812284a3b7ec725e6b8afc2e7ab0f91b923e
Author: Gengliang Wang 
AuthorDate: Thu May 5 23:03:44 2022 +0300

[SPARK-39108][SQL] Show hints for try_add/try_subtract/try_multiply in int/long overflow errors

### What changes were proposed in this pull request?

Show hints for try_add/try_subtract/try_multiply in int/long overflow errors.

### Why are the changes needed?

Better error message for resolving the overflow errors under ANSI mode.

### Does this PR introduce _any_ user-facing change?

No, minor error message improvement

### How was this patch tested?

UT
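
For context (not part of this patch): the functions named in the new hints are existing SQL functions that return NULL on overflow instead of failing, so the suggested fix looks roughly like this from PySpark (ANSI mode assumed on so that the plain operator actually errors):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

# 2147483647 is Int.MaxValue; the plain `+` raises ARITHMETIC_OVERFLOW (now with
# the 'try_add' hint), while try_add evaluates to NULL instead of failing.
spark.sql("SELECT try_add(2147483647, 1) AS safe_sum").show()
```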

Closes #36456 from gengliangwang/tryHint.

Authored-by: Gengliang Wang 
Signed-off-by: Max Gekk 
---
 .../scala/org/apache/spark/sql/catalyst/util/MathUtils.scala | 12 ++--
 .../test/resources/sql-tests/results/postgreSQL/int4.sql.out | 12 ++--
 .../test/resources/sql-tests/results/postgreSQL/int8.sql.out |  8 
 .../sql-tests/results/postgreSQL/window_part2.sql.out|  4 ++--
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala
index f96c9fba5a3..e5c87a41ea8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala
@@ -27,32 +27,32 @@ object MathUtils {
   def addExact(a: Int, b: Int): Int = withOverflow(Math.addExact(a, b))
 
   def addExact(a: Int, b: Int, errorContext: String): Int =
-    withOverflow(Math.addExact(a, b), errorContext = errorContext)
+    withOverflow(Math.addExact(a, b), hint = "try_add", errorContext = errorContext)
 
   def addExact(a: Long, b: Long): Long = withOverflow(Math.addExact(a, b))
 
   def addExact(a: Long, b: Long, errorContext: String): Long =
-    withOverflow(Math.addExact(a, b), errorContext = errorContext)
+    withOverflow(Math.addExact(a, b), hint = "try_add", errorContext = errorContext)
 
   def subtractExact(a: Int, b: Int): Int = withOverflow(Math.subtractExact(a, b))
 
   def subtractExact(a: Int, b: Int, errorContext: String): Int =
-    withOverflow(Math.subtractExact(a, b), errorContext = errorContext)
+    withOverflow(Math.subtractExact(a, b), hint = "try_subtract", errorContext = errorContext)
 
   def subtractExact(a: Long, b: Long): Long = withOverflow(Math.subtractExact(a, b))
 
   def subtractExact(a: Long, b: Long, errorContext: String): Long =
-    withOverflow(Math.subtractExact(a, b), errorContext = errorContext)
+    withOverflow(Math.subtractExact(a, b), hint = "try_subtract", errorContext = errorContext)
 
   def multiplyExact(a: Int, b: Int): Int = withOverflow(Math.multiplyExact(a, b))
 
   def multiplyExact(a: Int, b: Int, errorContext: String): Int =
-    withOverflow(Math.multiplyExact(a, b), errorContext = errorContext)
+    withOverflow(Math.multiplyExact(a, b), hint = "try_multiply", errorContext = errorContext)
 
   def multiplyExact(a: Long, b: Long): Long = withOverflow(Math.multiplyExact(a, b))
 
   def multiplyExact(a: Long, b: Long, errorContext: String): Long =
-    withOverflow(Math.multiplyExact(a, b), errorContext = errorContext)
+    withOverflow(Math.multiplyExact(a, b), hint = "try_multiply", errorContext = errorContext)
 
   def negateExact(a: Int): Int = withOverflow(Math.negateExact(a))
 
diff --git a/sql/core/src/test/resources/sql-tests/results/postgreSQL/int4.sql.out b/sql/core/src/test/resources/sql-tests/results/postgreSQL/int4.sql.out
index 6b42e31340f..a39cdbc340c 100755
--- a/sql/core/src/test/resources/sql-tests/results/postgreSQL/int4.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/postgreSQL/int4.sql.out
@@ -200,7 +200,7 @@ SELECT '' AS five, i.f1, i.f1 * smallint('2') AS x FROM INT4_TBL i
 struct<>
 -- !query output
 org.apache.spark.SparkArithmeticException
-[ARITHMETIC_OVERFLOW] integer overflow. If necessary set spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this error.
+[ARITHMETIC_OVERFLOW] integer overflow. To return NULL instead, use 'try_multiply'. If necessary set spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this error.
 == SQL(line 1, position 25) ==
 SELECT '' AS five, i.f1, i.f1 * smallint('2') AS x FROM INT4_TBL i
  
@@ -223,7 +223,7 @@ SELECT '' AS five, i.f1, i.f1 * int('2') AS x FROM INT4_TBL i
 struct<

[spark] branch master updated: [SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases

2022-05-05 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b1c2fb7a27 [SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases
4b1c2fb7a27 is described below

commit 4b1c2fb7a27757ebf470416c8ec02bb5c1f7fa49
Author: Max Gekk 
AuthorDate: Thu May 5 20:10:06 2022 +0300

[SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases

### What changes were proposed in this pull request?
Add missed dependencies to `dev/create-release/spark-rm/Dockerfile`.

### Why are the changes needed?
To be able to build Spark releases.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By building the Spark 3.3 release via:
```
$ dev/create-release/do-release-docker.sh -d /home/ubuntu/max/spark-3.3-rc1
```
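
As a rough, hypothetical sanity check (not part of this commit), the newly pinned doc-build dependencies can be verified inside the release container before running a docs build:

```python
# Hypothetical check that the packages pinned in the Dockerfile
# (markupsafe==2.0.1, docutils<0.17) are importable at the expected versions.
import docutils
import markupsafe

print("markupsafe", markupsafe.__version__)  # expect 2.0.1
print("docutils", docutils.__version__)      # expect a version below 0.17
```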

Closes #36449 from MaxGekk/deps-Dockerfile.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 dev/create-release/spark-rm/Dockerfile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index ffd60c07af0..c6555e0463d 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 #   We should use the latest Sphinx version once this is fixed.
 # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx.
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.19.4 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.1.5 pyarrow==3.0.0 plotly==5.4.0"
+ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.19.4 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.1.5 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17"
 ARG GEM_PKGS="bundler:2.2.9"
 
 # Install extra needed repos and refresh.
@@ -79,9 +79,9 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   # Note that PySpark doc generation also needs pandoc due to nbsphinx
   $APT_INSTALL r-base r-base-dev && \
   $APT_INSTALL libcurl4-openssl-dev libgit2-dev libssl-dev libxml2-dev && \
-  $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf && \
+  $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf texlive-latex-extra && \
   $APT_INSTALL libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev && \
-  Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
+  Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
   Rscript -e "devtools::install_github('jimhester/lintr')" && \
   Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" && \
   Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" && \





[spark] branch branch-3.3 updated: [SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases

2022-05-05 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 6a61f95a359 [SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases
6a61f95a359 is described below

commit 6a61f95a359e6aa9d09f8044019074dc7effcf30
Author: Max Gekk 
AuthorDate: Thu May 5 20:10:06 2022 +0300

[SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases

### What changes were proposed in this pull request?
Add missed dependencies to `dev/create-release/spark-rm/Dockerfile`.

### Why are the changes needed?
To be able to build Spark releases.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By building the Spark 3.3 release via:
```
$ dev/create-release/do-release-docker.sh -d /home/ubuntu/max/spark-3.3-rc1
```

Closes #36449 from MaxGekk/deps-Dockerfile.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
(cherry picked from commit 4b1c2fb7a27757ebf470416c8ec02bb5c1f7fa49)
Signed-off-by: Max Gekk 
---
 dev/create-release/spark-rm/Dockerfile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index ffd60c07af0..c6555e0463d 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 #   We should use the latest Sphinx version once this is fixed.
 # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx.
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.19.4 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.1.5 pyarrow==3.0.0 plotly==5.4.0"
+ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.19.4 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.1.5 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17"
 ARG GEM_PKGS="bundler:2.2.9"
 
 # Install extra needed repos and refresh.
@@ -79,9 +79,9 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   # Note that PySpark doc generation also needs pandoc due to nbsphinx
   $APT_INSTALL r-base r-base-dev && \
   $APT_INSTALL libcurl4-openssl-dev libgit2-dev libssl-dev libxml2-dev && \
-  $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf && \
+  $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra texinfo qpdf texlive-latex-extra && \
   $APT_INSTALL libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev && \
-  Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
+  Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')" && \
   Rscript -e "devtools::install_github('jimhester/lintr')" && \
   Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" && \
   Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')" && \





[spark] branch master updated: [MINOR] Remove unused import

2022-05-05 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bf447046327 [MINOR] Remove unused import
bf447046327 is described below

commit bf447046327b80f176fd638db418d0513b9c2516
Author: panbingkun 
AuthorDate: Thu May 5 19:25:32 2022 +0300

[MINOR] Remove unused import

### What changes were proposed in this pull request?
Remove unused import in `numerics`.

### Why are the changes needed?
Cleanup

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #36454 from panbingkun/minor.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala
index fea792f08d0..c3d893d82fc 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/numerics.scala
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.types
 
 import scala.math.Numeric._
-import scala.math.Ordering
 
 import org.apache.spark.sql.catalyst.util.{MathUtils, SQLOrderingUtil}
 import org.apache.spark.sql.errors.QueryExecutionErrors





[spark] branch master updated: [SPARK-37938][SQL][TESTS] Use error classes in the parsing errors of partitions

2022-05-05 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 29ff671933e [SPARK-37938][SQL][TESTS] Use error classes in the parsing errors of partitions
29ff671933e is described below

commit 29ff671933e3b432e69a26761bc79856f21b82c7
Author: panbingkun 
AuthorDate: Thu May 5 19:22:28 2022 +0300

[SPARK-37938][SQL][TESTS] Use error classes in the parsing errors of partitions

### What changes were proposed in this pull request?
Migrate the following errors in QueryParsingErrors onto the use of error classes:

- emptyPartitionKeyError => INVALID_SQL_SYNTAX
- partitionTransformNotExpectedError => INVALID_SQL_SYNTAX
- descColumnForPartitionUnsupportedError => UNSUPPORTED_FEATURE.DESC_TABLE_COLUMN_PARTITION
- incompletePartitionSpecificationError => INVALID_SQL_SYNTAX

### Why are the changes needed?
Port parsing errors of partitions to the new error framework, improve test coverage, and document the expected error messages in tests.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
By running new test:
```
$ build/sbt "sql/testOnly *QueryParsingErrorsSuite*"
```
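
For a rough sense of what the migration buys (an illustrative sketch only; the real implementation is in Scala and the JSON layout is approximated from the hunk below): messages are assembled from templates keyed by an error class, so tests can assert on the class and its parameters instead of exact free-form wording.

```python
import json

# Approximated fragment of error-classes.json; the real file lives under
# core/src/main/resources/error/ and is read by the Scala error framework.
ERROR_CLASSES = json.loads("""
{
  "INVALID_SQL_SYNTAX": {"message": ["Invalid SQL syntax: "]},
  "DESC_TABLE_COLUMN_PARTITION": {"message": ["DESC TABLE COLUMN for a specific partition."]}
}
""")

def render_message(error_class, parameters):
    # Join the template lines for the class and append the caller-supplied detail strings.
    template = " ".join(ERROR_CLASSES[error_class]["message"])
    return (template + " ".join(parameters)).strip()

# e.g. the migrated emptyPartitionKeyError now surfaces as INVALID_SQL_SYNTAX plus a detail.
print(render_message("INVALID_SQL_SYNTAX", ["Partition key `p` must set value (can't be empty)."]))
```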

Closes #36416 from panbingkun/SPARK-37938.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   |  3 ++
 .../spark/sql/errors/QueryParsingErrors.scala  | 22 ++--
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  2 +-
 .../resources/sql-tests/results/describe.sql.out   |  2 +-
 .../spark/sql/errors/QueryErrorsSuiteBase.scala| 16 --
 .../spark/sql/errors/QueryParsingErrorsSuite.scala | 60 ++
 .../command/ShowPartitionsParserSuite.scala| 22 +---
 .../command/TruncateTableParserSuite.scala | 21 +---
 8 files changed, 125 insertions(+), 23 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 24b50c4209a..3a7bc757f73 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -206,6 +206,9 @@
   "AES_MODE" : {
 "message" : [ "AES- with the padding  by the 
 function." ]
   },
+  "DESC_TABLE_COLUMN_PARTITION" : {
+"message" : [ "DESC TABLE COLUMN for a specific partition." ]
+  },
   "DISTRIBUTE_BY" : {
 "message" : [ "DISTRIBUTE BY clause." ]
   },
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index ed5773f4f82..1d15557c9d0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -77,7 +77,11 @@ object QueryParsingErrors extends QueryErrorsBase {
   }
 
   def emptyPartitionKeyError(key: String, ctx: PartitionSpecContext): Throwable = {
-    new ParseException(s"Found an empty partition key '$key'.", ctx)
+    new ParseException(
+      errorClass = "INVALID_SQL_SYNTAX",
+      messageParameters =
+        Array(s"Partition key ${toSQLId(key)} must set value (can't be empty)."),
+      ctx)
   }
 
   def combinationQueryResultClausesUnsupportedError(ctx: QueryOrganizationContext): Throwable = {
@@ -243,7 +247,11 @@ object QueryParsingErrors extends QueryErrorsBase {
 
   def partitionTransformNotExpectedError(
       name: String, describe: String, ctx: ApplyTransformContext): Throwable = {
-    new ParseException(s"Expected a column reference for transform $name: $describe", ctx)
+    new ParseException(
+      errorClass = "INVALID_SQL_SYNTAX",
+      messageParameters =
+        Array(s"Expected a column reference for transform ${toSQLId(name)}: $describe"),
+      ctx)
   }
 
   def tooManyArgumentsForTransformError(name: String, ctx: ApplyTransformContext): Throwable = {
@@ -298,12 +306,18 @@ object QueryParsingErrors extends QueryErrorsBase {
   }
 
   def descColumnForPartitionUnsupportedError(ctx: DescribeRelationContext): Throwable = {
-    new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+    new ParseException(
+      errorClass = "UNSUPPORTED_FEATURE",
+      messageParameters = Array("DESC_TABLE_COLUMN_PARTITION"),
+      ctx)
   }
 
   def incompletePartitionSpecificationError(
       key: String, ctx: DescribeRelationContext): Throwable = {
-    new ParseException(s"PARTITION specification is incomplete: `$key`", ctx)
+    new ParseException(
+      errorClass = "INVALID_SQL_SYNTAX",
+      messageParameters = Array(s"PARTITION specification is incomplete: ${toSQLId(key)}"),
+      ctx)
   }
 
   def computeStatisticsNot

[spark] branch master updated (ba499b1dcc1 -> 215b1b9e518)

2022-05-05 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


 from ba499b1dcc1 [SPARK-39068][SQL] Make thriftserver and sparksql-cli support in-memory catalog
 add 215b1b9e518 [SPARK-30661][ML][PYTHON] KMeans blockify input vectors

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/linalg/BLAS.scala|  83 +++-
 .../org/apache/spark/ml/linalg/Matrices.scala  |  72 ++--
 .../scala/org/apache/spark/ml/linalg/Vectors.scala |   7 +
 .../org/apache/spark/ml/clustering/KMeans.scala| 428 +++--
 .../org/apache/spark/mllib/clustering/KMeans.scala |  16 +
 .../apache/spark/ml/clustering/KMeansSuite.scala   | 373 +-
 python/pyspark/ml/clustering.py|  48 ++-
 7 files changed, 787 insertions(+), 240 deletions(-)





[spark] branch master updated (6689b97ec76 -> ba499b1dcc1)

2022-05-05 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


 from 6689b97ec76 [SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)
 add ba499b1dcc1 [SPARK-39068][SQL] Make thriftserver and sparksql-cli support in-memory catalog

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/hive/thriftserver/SparkSQLEnv.scala  | 29 +++---
 .../spark/sql/hive/thriftserver/CliSuite.scala | 20 +++
 2 files changed, 40 insertions(+), 9 deletions(-)
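
For context (an illustrative snippet, not taken from this commit): which catalog implementation is used is controlled by the long-standing `spark.sql.catalogImplementation` setting, and this change lets the Thrift server and the `spark-sql` CLI run against the in-memory catalog rather than always requiring Hive.

```python
from pyspark.sql import SparkSession

# Hypothetical example: build a session on the in-memory catalog. The same
# setting can be passed on the command line, e.g.
#   --conf spark.sql.catalogImplementation=in-memory
spark = (
    SparkSession.builder
    .config("spark.sql.catalogImplementation", "in-memory")
    .getOrCreate()
)
print(spark.conf.get("spark.sql.catalogImplementation"))
```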





svn commit: r54275 - in /dev/spark/v3.3.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.1.0/ _site/api/R/deps/jquery-3.6.0/ _site/api

2022-05-05 Thread maxgekk
Author: maxgekk
Date: Thu May  5 08:51:39 2022
New Revision: 54275

Log:
Apache Spark v3.3.0-rc1 docs


[This commit notification would consist of 2649 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r54273 - /dev/spark/v3.3.0-rc1-bin/

2022-05-05 Thread maxgekk
Author: maxgekk
Date: Thu May  5 08:17:05 2022
New Revision: 54273

Log:
Apache Spark v3.3.0-rc1

Added:
dev/spark/v3.3.0-rc1-bin/
dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz   (with props)
dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.asc
dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.sha512
dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz   (with props)
dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz.asc
dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz.sha512
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop2.tgz   (with props)
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop2.tgz.asc
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop2.tgz.sha512
    dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop3-scala2.13.tgz   (with props)
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop3.tgz.asc
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-hadoop3.tgz.sha512
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-without-hadoop.tgz.asc
dev/spark/v3.3.0-rc1-bin/spark-3.3.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.3.0-rc1-bin/spark-3.3.0.tgz   (with props)
dev/spark/v3.3.0-rc1-bin/spark-3.3.0.tgz.asc
dev/spark/v3.3.0-rc1-bin/spark-3.3.0.tgz.sha512

Added: dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.asc
==
--- dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.asc (added)
+++ dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.asc Thu May  5 08:17:05 2022
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEgPuOvo66aFBJiXA0kbXcgV2/ENMFAmJzh6QTHG1heGdla2tA
+YXBhY2hlLm9yZwAKCRCRtdyBXb8Q07HcEACkCSXRG7LXd0+/jBU49syIUIpOsUrN
+bgbq90ifbo6eCidbhj4wJl5OZO7tKCsV2IrbQYRHVP0Lq7GTCw1Fg4/mY4QiLkhi
+RWDizZrKrr9CbHXVFo7ZTlIiaxjnTOcIxauKRtu6rbIJdfIzZyRZwhAYerdK6WOx
+atrcWfrY/MhKW/v6/25b8R4SWpLssNXaGj5RRqhs/cn/Kjwus8WkBDzQIibcE2ac
+TJA+agMH2fkyC1sUaZOVEo1E68nUBV/vv5GyEtctjnESGDsh90/d+6X8L2cmME9H
+YGUO91cT1byN3LCR0FDqMSTea8yh3HsdTQ4Ly+s1Ia7h5UCwnDlpFXTyHsHX9sv7
+osXKz4b1ejogjxHlCiPpFgZ+P3gNa31mpJWmOwMLE49Cgxcn7DdZUXTZaAwZmwhH
+YURgYtpqrG+4oKpAOLGR+wx+2ZGv0a0QeLd4iTUEhxhiPFRw9QkNG5VUmHgz237b
+ZJzz9Ef0wLbaS5F6ZySk0FBqHTPgCsPZS3ZtmdU76zg37mNPej2xotLrLon2TXhN
+TJkcLI8azbRoqcrNSOWKjBWYbLJ3nG4bDNqEkqdi/QApiisnneuXX89w152SI8vF
+/GoyJK0xs6rjCsUURXWUZ/kzeVQHxtXfBNLk967+TSOHVDaKFehhS0hJbRNUP0jp
+O+gTjMZQfQh+Uw==
+=saiU
+-END PGP SIGNATURE-

Added: dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.sha512
==
--- dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.sha512 (added)
+++ dev/spark/v3.3.0-rc1-bin/SparkR_3.3.0.tar.gz.sha512 Thu May  5 08:17:05 2022
@@ -0,0 +1,3 @@
+SparkR_3.3.0.tar.gz: 98A2665A 04513C1A BE26952E 7396E3B7 AF63715B B6CCFAF3
+ CD8C04EC A9F2374F F9E159D3 635CA631 22E4DCEE 1F6B6FE9
+ F91F2E18 C9518AAF 713DC95A 3D39D496

Added: dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz.asc
==
--- dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz.asc (added)
+++ dev/spark/v3.3.0-rc1-bin/pyspark-3.3.0.tar.gz.asc Thu May  5 08:17:05 2022
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEgPuOvo66aFBJiXA0kbXcgV2/ENMFAmJzh6YTHG1heGdla2tA
+YXBhY2hlLm9yZwAKCRCRtdyBXb8Q0+4LD/wMGUzSXVcBCbUsVYtEtmoWjqBDZks7
+wN0SrnaI4UNXKlV0/rRbSMGRnVuqdwAlwJsb2RYNS56wswgTz9bhUB9cUUiSWftp
+Pf5XE9LqarekEF48kSYv6XOGCoXIA4wa9BdfzBF8Q43kCI4WTRibv9xaMv+F60or
+0xwgLl+8666M0L+Jg2tzrdI+cnkf42j07pL1HfqCsoZJSjxFmgSexXigZj+oSw+p
+4bTTofAWUfj+jILpPw8s7Vnf0Gvi7YEGpfchUv9oB8N1LzKLyS1HYNLGSAqbE1vm
+CvG9X8IzWQr4wIVqWSMWnsfImJL7EcA+G1SrUZP//d5UitvbF3ZZ5tMUvPYqgfKz
+S7kwyxuI1/uQ6CpJ5vxdrQQfRauYA4oWws4jWf2O6xOF5VIB1F0aF0//SLdauR+r
+GX4aYzQF+2DG6pIGJWYfrE9I4U4/LQLbdVVawItNnMKjphxD3Vi1kn9ITzJAtpLE
+75T9wPvlqSY7bLQlpBLd2+mModF2K+Gonr8Z06Xe0kr/R+tyrjrP5Oa++egLcaFo
+ZCr+L6WvkW8XnCfzU7T7d7wNKlskw7sh9BqOluMr+YW9rL+CKEYiM4JZrlUZCT3R
+rcLnVX47qigSw+WETHtMLA/TWYS6FQpKqs49cYbWAAT2K6mvmPiM1MupZSo6HgS+
+/KROoSIKLGVTRA==

[spark] branch branch-3.3 updated: [SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)

2022-05-05 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 0f2e3ecb994 [SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)
0f2e3ecb994 is described below

commit 0f2e3ecb9943aec91204c168b6402f3e5de53ca2
Author: Hyukjin Kwon 
AuthorDate: Thu May 5 16:23:28 2022 +0900

[SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/33436 that adds a legacy configuration. It was found that the change can break a valid use case (https://github.com/apache/spark/pull/33436/files#r863271189):

```scala
import org.apache.spark.sql.types._
val ds = Seq("a,", "a,b").toDS
spark.read.schema(
  StructType(
StructField("f1", StringType, nullable = false) ::
StructField("f2", StringType, nullable = false) :: Nil)
  ).option("mode", "DROPMALFORMED").csv(ds).show()
```

**Before:**

```
+---+---+
| f1| f2|
+---+---+
|  a|  b|
+---+---+
```

**After:**

```
+---++
| f1|  f2|
+---++
|  a|null|
|  a|   b|
+---++
```

This PR adds a configuration to restore **Before** behaviour.

### Why are the changes needed?

To avoid breakage of valid usecases.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new configuration `spark.sql.legacy.respectNullabilityInTextDatasetConversion` (`false` by default) to respect the nullability in `DataFrameReader.schema(schema).csv(dataset)` and `DataFrameReader.schema(schema).json(dataset)` when the user-specified schema is provided.

### How was this patch tested?

Unittests were added.

Closes #36435 from HyukjinKwon/SPARK-35912.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 6689b97ec76abe5bab27f02869f8f16b32530d1a)
Signed-off-by: Hyukjin Kwon 
---
 docs/sql-migration-guide.md|  2 +-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala | 11 +++
 .../main/scala/org/apache/spark/sql/DataFrameReader.scala  | 13 +++--
 .../spark/sql/execution/datasources/csv/CSVSuite.scala | 10 ++
 .../spark/sql/execution/datasources/json/JsonSuite.scala   | 14 +-
 5 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index b6bfb0ed2be..a7757d6c9a0 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -30,7 +30,7 @@ license: |
 
   - Since Spark 3.3, the functions `lpad` and `rpad` have been overloaded to support byte sequences. When the first argument is a byte sequence, the optional padding pattern must also be a byte sequence and the result is a BINARY value. The default padding pattern in this case is the zero byte. To restore the legacy behavior of always returning string types, set `spark.sql.legacy.lpadRpadAlwaysReturnString` to `true`.
 
-  - Since Spark 3.3, Spark turns a non-nullable schema into nullable for API `DataFrameReader.schema(schema: StructType).json(jsonDataset: Dataset[String])` and `DataFrameReader.schema(schema: StructType).csv(csvDataset: Dataset[String])` when the schema is specified by the user and contains non-nullable fields.
+  - Since Spark 3.3, Spark turns a non-nullable schema into nullable for API `DataFrameReader.schema(schema: StructType).json(jsonDataset: Dataset[String])` and `DataFrameReader.schema(schema: StructType).csv(csvDataset: Dataset[String])` when the schema is specified by the user and contains non-nullable fields. To restore the legacy behavior of respecting the nullability, set `spark.sql.legacy.respectNullabilityInTextDatasetConversion` to `true`.
 
   - Since Spark 3.3, when the date or timestamp pattern is not specified, Spark converts an input string to a date/timestamp using the `CAST` expression approach. The changes affect CSV/JSON datasources and parsing of partition values. In Spark 3.2 or earlier, when the date or timestamp pattern is not set, Spark uses the default patterns: `yyyy-MM-dd` for dates and `yyyy-MM-dd HH:mm:ss` for timestamps. After the changes, Spark still recognizes the pattern together with
 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 76f3d1f5a84..b6230f71383 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/in

[spark] branch master updated: [SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)

2022-05-05 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6689b97ec76 [SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)
6689b97ec76 is described below

commit 6689b97ec76abe5bab27f02869f8f16b32530d1a
Author: Hyukjin Kwon 
AuthorDate: Thu May 5 16:23:28 2022 +0900

[SPARK-35912][SQL][FOLLOW-UP] Add a legacy configuration for respecting nullability in DataFrame.schema.csv/json(ds)

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/33436 that adds a legacy configuration. It was found that the change can break a valid use case (https://github.com/apache/spark/pull/33436/files#r863271189):

```scala
import org.apache.spark.sql.types._
val ds = Seq("a,", "a,b").toDS
spark.read.schema(
  StructType(
StructField("f1", StringType, nullable = false) ::
StructField("f2", StringType, nullable = false) :: Nil)
  ).option("mode", "DROPMALFORMED").csv(ds).show()
```

**Before:**

```
+---+---+
| f1| f2|
+---+---+
|  a|  b|
+---+---+
```

**After:**

```
+---++
| f1|  f2|
+---++
|  a|null|
|  a|   b|
+---++
```

This PR adds a configuration to restore **Before** behaviour.

### Why are the changes needed?

To avoid breakage of valid usecases.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new configuration `spark.sql.legacy.respectNullabilityInTextDatasetConversion` (`false` by default) to respect the nullability in `DataFrameReader.schema(schema).csv(dataset)` and `DataFrameReader.schema(schema).json(dataset)` when the user-specified schema is provided.

### How was this patch tested?

Unittests were added.
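
For PySpark users, the opt-in described above looks roughly like this (an illustrative sketch; the configuration name is taken from the description in this commit message):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
# Restore the pre-3.3 behaviour of honoring user-specified non-nullable fields
# when reading a dataset of strings through DataFrameReader.csv/json.
spark.conf.set("spark.sql.legacy.respectNullabilityInTextDatasetConversion", "true")

schema = StructType([
    StructField("f1", StringType(), nullable=False),
    StructField("f2", StringType(), nullable=False),
])
rows = spark.sparkContext.parallelize(["a,", "a,b"])
# With the legacy flag on, DROPMALFORMED drops the "a," row, as before Spark 3.3.
spark.read.schema(schema).option("mode", "DROPMALFORMED").csv(rows).show()
```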

Closes #36435 from HyukjinKwon/SPARK-35912.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 docs/sql-migration-guide.md|  2 +-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala | 11 +++
 .../main/scala/org/apache/spark/sql/DataFrameReader.scala  | 13 +++--
 .../spark/sql/execution/datasources/csv/CSVSuite.scala | 10 ++
 .../spark/sql/execution/datasources/json/JsonSuite.scala   | 14 +-
 5 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 32b90da1917..59b8d47d306 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -30,7 +30,7 @@ license: |
 
   - Since Spark 3.3, the functions `lpad` and `rpad` have been overloaded to support byte sequences. When the first argument is a byte sequence, the optional padding pattern must also be a byte sequence and the result is a BINARY value. The default padding pattern in this case is the zero byte. To restore the legacy behavior of always returning string types, set `spark.sql.legacy.lpadRpadAlwaysReturnString` to `true`.
 
-  - Since Spark 3.3, Spark turns a non-nullable schema into nullable for API `DataFrameReader.schema(schema: StructType).json(jsonDataset: Dataset[String])` and `DataFrameReader.schema(schema: StructType).csv(csvDataset: Dataset[String])` when the schema is specified by the user and contains non-nullable fields.
+  - Since Spark 3.3, Spark turns a non-nullable schema into nullable for API `DataFrameReader.schema(schema: StructType).json(jsonDataset: Dataset[String])` and `DataFrameReader.schema(schema: StructType).csv(csvDataset: Dataset[String])` when the schema is specified by the user and contains non-nullable fields. To restore the legacy behavior of respecting the nullability, set `spark.sql.legacy.respectNullabilityInTextDatasetConversion` to `true`.
 
   - Since Spark 3.3, when the date or timestamp pattern is not specified, Spark converts an input string to a date/timestamp using the `CAST` expression approach. The changes affect CSV/JSON datasources and parsing of partition values. In Spark 3.2 or earlier, when the date or timestamp pattern is not set, Spark uses the default patterns: `yyyy-MM-dd` for dates and `yyyy-MM-dd HH:mm:ss` for timestamps. After the changes, Spark still recognizes the pattern together with
 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 8876d780799..4c0eccbf35d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3025,6 +3025,17 @@ object SQLConf {
     .intConf
     .createOptional
 
+  val LEGACY_RE