[spark] branch master updated: [SPARK-40557][CONNECT] Update generated proto files for Spark Connect
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 072575c9e6f [SPARK-40557][CONNECT] Update generated proto files for Spark Connect 072575c9e6f is described below commit 072575c9e6fc304f09e01ad0ee180c8f309ede91 Author: Martin Grund AuthorDate: Tue Sep 27 10:55:47 2022 +0900 [SPARK-40557][CONNECT] Update generated proto files for Spark Connect ### What changes were proposed in this pull request? This patch cleans up the generated proto files from the initial Spark Connect import. The previous files had a Databricks specific go module path embedded in the generated Python descriptor. This is now removed. No new functionality added. ### Why are the changes needed? Cleanup. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The generated files are used during the regular testing for Spark Connect. Closes #37993 from grundprinzip/spark-connect-clean1. Authored-by: Martin Grund Signed-off-by: Hyukjin Kwon --- connect/src/main/buf.gen.yaml | 9 - python/pyspark/sql/connect/proto/base_pb2.py| 14 -- python/pyspark/sql/connect/proto/commands_pb2.py| 6 ++ python/pyspark/sql/connect/proto/expressions_pb2.py | 6 ++ python/pyspark/sql/connect/proto/relations_pb2.py | 10 +++--- python/pyspark/sql/connect/proto/types_pb2.py | 6 ++ 6 files changed, 13 insertions(+), 38 deletions(-) diff --git a/connect/src/main/buf.gen.yaml b/connect/src/main/buf.gen.yaml index 01e31d3c8b4..e3e15e549e8 100644 --- a/connect/src/main/buf.gen.yaml +++ b/connect/src/main/buf.gen.yaml @@ -26,15 +26,6 @@ plugins: out: gen/proto/python - remote: buf.build/grpc/plugins/python:v1.47.0-1 out: gen/proto/python - - remote: buf.build/protocolbuffers/plugins/go:v1.28.0-1 -out: gen/proto/go -opt: - - paths=source_relative - - remote: buf.build/grpc/plugins/go:v1.2.0-1 -out: gen/proto/go -opt: - - paths=source_relative - - require_unimplemented_servers=false - remote: buf.build/grpc/plugins/ruby:v1.47.0-1 out: gen/proto/ruby - remote: buf.build/protocolbuffers/plugins/ruby:v21.2.0-1 diff --git a/python/pyspark/sql/connect/proto/base_pb2.py b/python/pyspark/sql/connect/proto/base_pb2.py index 3adb77f77d6..910f9d644e3 100644 --- a/python/pyspark/sql/connect/proto/base_pb2.py +++ b/python/pyspark/sql/connect/proto/base_pb2.py @@ -28,16 +28,12 @@ from google.protobuf import symbol_database as _symbol_database _sym_db = _symbol_database.Default() -from pyspark.sql.connect.proto import ( -commands_pb2 as spark_dot_connect_dot_commands__pb2, -) -from pyspark.sql.connect.proto import ( -relations_pb2 as spark_dot_connect_dot_relations__pb2, -) +from pyspark.sql.connect.proto import commands_pb2 as spark_dot_connect_dot_commands__pb2 +from pyspark.sql.connect.proto import relations_pb2 as spark_dot_connect_dot_relations__pb2 DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile( - b'\n\x18spark/connect/base.proto\x12\rspark.connect\x1a\x1cspark/connect/commands.proto\x1a\x1dspark/connect/relations.proto"t\n\x04Plan\x12-\n\x04root\x18\x01 \x01(\x0b\x32\x17.spark.connect.RelationH\x00R\x04root\x12\x32\n\x07\x63ommand\x18\x02 \x01(\x0b\x32\x16.spark.connect.CommandH\x00R\x07\x63ommandB\t\n\x07op_type"\xdb\x01\n\x07Request\x12\x1b\n\tclient_id\x18\x01 \x01(\tR\x08\x63lientId\x12\x45\n\x0cuser_context\x18\x02 \x01(\x0b\x32".spark.connect.Request.UserContextR\x0buse [...] 
+ b'\n\x18spark/connect/base.proto\x12\rspark.connect\x1a\x1cspark/connect/commands.proto\x1a\x1dspark/connect/relations.proto"t\n\x04Plan\x12-\n\x04root\x18\x01 \x01(\x0b\x32\x17.spark.connect.RelationH\x00R\x04root\x12\x32\n\x07\x63ommand\x18\x02 \x01(\x0b\x32\x16.spark.connect.CommandH\x00R\x07\x63ommandB\t\n\x07op_type"\xdb\x01\n\x07Request\x12\x1b\n\tclient_id\x18\x01 \x01(\tR\x08\x63lientId\x12\x45\n\x0cuser_context\x18\x02 \x01(\x0b\x32".spark.connect.Request.UserContextR\x0buse [...] ) _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals()) @@ -45,9 +41,7 @@ _builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, "spark.connect.base_pb2", gl if _descriptor._USE_C_DESCRIPTORS == False: DESCRIPTOR._options = None -DESCRIPTOR._serialized_options = ( - b"\n\036org.apache.spark.connect.protoP\001Z)github.com/databricks/spark-connect/proto" -) +DESCRIPTOR._serialized_options = b"\n\036org.apache.spark.connect.protoP\001" _RESPONSE_METRICS_METRICOBJECT_EXECUTIONMETRICSENTRY._options = None _RESPONSE_METRICS_METRICOBJECT_EXECUTIONMETRICSENTRY._serialized_options = b"8\001" _PLAN._serialized_start = 104 diff --git
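For anyone verifying the cleanup locally, a minimal sketch along these lines (assuming a master build of pyspark with the Spark Connect proto modules and their protobuf/grpcio dependencies on the path) inspects the regenerated descriptor's file options; after this change the Go module path should be gone while the Java package remains:

```python
# Hedged sketch: check the file-level options of the regenerated Python descriptor.
# Assumes pyspark (master) with the Spark Connect proto modules and protobuf installed.
from pyspark.sql.connect.proto import base_pb2

opts = base_pb2.DESCRIPTOR.GetOptions()
print(opts.java_package)  # expected: org.apache.spark.connect.proto
print(opts.go_package)    # expected: empty string after this cleanup
```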
[spark] branch master updated: [SPARK-40561][PS] Implement `min_count` in `GroupBy.min`
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 211ce40888d [SPARK-40561][PS] Implement `min_count` in `GroupBy.min` 211ce40888d is described below commit 211ce40888dcaaa3c3ffbd316109e17d0caad4e3 Author: Ruifeng Zheng AuthorDate: Tue Sep 27 09:53:09 2022 +0800 [SPARK-40561][PS] Implement `min_count` in `GroupBy.min` ### What changes were proposed in this pull request? Implement `min_count` in `GroupBy.min` ### Why are the changes needed? for API coverage ### Does this PR introduce _any_ user-facing change? yes, new parameter `min_count` supported ``` >>> df.groupby("D").min(min_count=3).sort_index() A BC D a 1.0 False 3.0 b NaN None NaN ``` ### How was this patch tested? added UT and doctest Closes #37998 from zhengruifeng/ps_groupby_min. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/groupby.py| 41 ++--- python/pyspark/pandas/tests/test_groupby.py | 2 ++ 2 files changed, 40 insertions(+), 3 deletions(-) diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py index 6d36cfecce6..7085d2ec059 100644 --- a/python/pyspark/pandas/groupby.py +++ b/python/pyspark/pandas/groupby.py @@ -643,16 +643,23 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta): bool_to_numeric=True, ) -def min(self, numeric_only: Optional[bool] = False) -> FrameLike: +def min(self, numeric_only: Optional[bool] = False, min_count: int = -1) -> FrameLike: """ Compute min of group values. +.. versionadded:: 3.3.0 + Parameters -- numeric_only : bool, default False Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. +.. versionadded:: 3.4.0 +min_count : bool, default -1 +The required number of valid values to perform the operation. If fewer +than min_count non-NA values are present the result will be NA. + .. versionadded:: 3.4.0 See Also @@ -663,7 +670,7 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta): Examples >>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True], -..."C": [3, 4, 3, 4], "D": ["a", "b", "b", "a"]}) +..."C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]}) >>> df.groupby("A").min().sort_index() B C D A @@ -677,9 +684,37 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta): A 1 False 3 2 False 4 + +>>> df.groupby("D").min().sort_index() + A B C +D +a 1 False 3 +b 1 False 3 + + +>>> df.groupby("D").min(min_count=3).sort_index() + A BC +D +a 1.0 False 3.0 +b NaN None NaN """ +if not isinstance(min_count, int): +raise TypeError("min_count must be integer") + +if min_count > 0: + +def min(col: Column) -> Column: +return F.when( +F.count(F.when(~F.isnull(col), F.lit(0))) < min_count, F.lit(None) +).otherwise(F.min(col)) + +else: + +def min(col: Column) -> Column: +return F.min(col) + return self._reduce_for_stat_function( -F.min, accepted_spark_types=(NumericType, BooleanType) if numeric_only else None +min, accepted_spark_types=(NumericType, BooleanType) if numeric_only else None ) # TODO: sync the doc. 
diff --git a/python/pyspark/pandas/tests/test_groupby.py b/python/pyspark/pandas/tests/test_groupby.py index 4a57a3421df..f0b3a04be17 100644 --- a/python/pyspark/pandas/tests/test_groupby.py +++ b/python/pyspark/pandas/tests/test_groupby.py @@ -1401,8 +1401,10 @@ class GroupByTest(PandasOnSparkTestCase, TestUtils): def test_min(self): self._test_stat_func(lambda groupby_obj: groupby_obj.min()) +self._test_stat_func(lambda groupby_obj: groupby_obj.min(min_count=2)) self._test_stat_func(lambda groupby_obj: groupby_obj.min(numeric_only=None)) self._test_stat_func(lambda groupby_obj: groupby_obj.min(numeric_only=True)) +self._test_stat_func(lambda groupby_obj: groupby_obj.min(numeric_only=True, min_count=2)) def test_max(self): self._test_stat_func(lambda groupby_obj: groupby_obj.max())
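A minimal end-to-end sketch of the new parameter, using the same hypothetical data as the doctest above (requires pyspark with pandas-on-Spark available):

```python
# Hedged sketch of the new min_count parameter; data and column names are hypothetical.
import pyspark.pandas as ps

df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
                   "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})

# Plain min: every group has at least one valid value per column.
print(df.groupby("D").min().sort_index())

# With min_count=3, group "b" has fewer than 3 non-NA values, so its results become NA.
print(df.groupby("D").min(min_count=3).sort_index())
```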
svn commit: r57014 - /dev/spark/v3.3.1-rc2-bin/
Author: yumwang Date: Tue Sep 27 00:54:34 2022 New Revision: 57014 Log: Apache Spark v3.3.1-rc2 Added: dev/spark/v3.3.1-rc2-bin/ dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz (with props) dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.asc dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.sha512 dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz (with props) dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz.asc dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz.sha512 dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop2.tgz (with props) dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop2.tgz.asc dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop2.tgz.sha512 dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop3.tgz (with props) dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop3.tgz.asc dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-hadoop3.tgz.sha512 dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-without-hadoop.tgz (with props) dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-without-hadoop.tgz.asc dev/spark/v3.3.1-rc2-bin/spark-3.3.1-bin-without-hadoop.tgz.sha512 dev/spark/v3.3.1-rc2-bin/spark-3.3.1.tgz (with props) dev/spark/v3.3.1-rc2-bin/spark-3.3.1.tgz.asc dev/spark/v3.3.1-rc2-bin/spark-3.3.1.tgz.sha512 Added: dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.asc == --- dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.asc (added) +++ dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.asc Tue Sep 27 00:54:34 2022 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEhnJ9Q+c6QV9noLGhTmiz5s1HNlMFAmMxxpYTHHl1bXdhbmdA +YXBhY2hlLm9yZwAKCRBOaLPmzUc2Ux1eD/45yzmu90koq1UPSOU73/WYlSGRVPTd +afaGT5NEXqYyQbqkGIaDrd8ga/uKAXAmNiyZH4AsJn/Br6QDKR0sVgcG20z+QWn0 +xNfr/wRVKxGQlclNtIYjapTWPaa0WCLzyFseDN9NDwmLsoAcEaOulN+hnRlJ39ou +nTC8EjRnohu3qRnPpX10j2m5VNpp41+bSXjlwN+uID8iDF06YW08JqRixYsUBVgD +abv616xpkKgpFlg13iGathS/gybi0g5xRcHdCHfVx7IWnYDZlfGqnPURqoO3zTVO +pI43eJG+7T76zPuST1dm8E5g/4IZu7cUgP2AUEgNDu8X2WwqOEYFIqssfRkeAbEJ +onZ2TGr4hlwOGKLElIcDZuxR2iQlnzYdlVVujISCmDfv9KqKmGhSlRxrm4Zd8kQz +V11iHePNdBTuP2AcDqGq0wjnSFaBa3JMv0bSrYzZT1WJXz31foKo2yZgX0SZOVyE +tLvQTPH8NQWDQiMdcIaM3LmoQyMEjGGH7UGaa7/PfxxaCYm23KHr7ia+joGR8iMv +sR5CHHm/5erJYMvd0x9QqPuHGLgXpOlXEN/88sS+UI5r8d3/hFnrX6yGp5EhB9ND +IlTlA+3p7JHkEbWva5wrzuMZbDNUmjuq0RTxDY0/ocmZF2//BFwNSwIJXkM+mlFt +n7GlQtpMsSa2gQ== +=MLEK +-END PGP SIGNATURE- Added: dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.sha512 == --- dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.sha512 (added) +++ dev/spark/v3.3.1-rc2-bin/SparkR_3.3.1.tar.gz.sha512 Tue Sep 27 00:54:34 2022 @@ -0,0 +1 @@ +84f14789024788fa0cf688a2456a7f738da5f503a82d3d59c2042c725ee97d7d40f6be1b938e749aaad5c859784a440a579a21663767b79adee7cdc34db72f8a SparkR_3.3.1.tar.gz Added: dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz.asc == --- dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz.asc (added) +++ dev/spark/v3.3.1-rc2-bin/pyspark-3.3.1.tar.gz.asc Tue Sep 27 00:54:34 2022 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJHBAABCgAxFiEEhnJ9Q+c6QV9noLGhTmiz5s1HNlMFAmMxxpsTHHl1bXdhbmdA +YXBhY2hlLm9yZwAKCRBOaLPmzUc2U6X2EADOU45dZxwVH+8O5Fl96v3YI0wEfcdK +VEvqb+weYfucXwT6ecrbT8/gw/ZoK/jLiJDAJUiU4IfA1jdRRsTYAuYt22NFJZA9 +SwIkk5kUwbOcrlZ4J0bvHcKD7DTiMeHyfP0t3WnaTW1kVe9tTggbGSECu+tAU+5S +s8LkrNYHmm73Ujoc8uqjdXgs1KO4RmejRUa3YZ/BKLPx5kCucJ1YR1bGs+g2eLxJ +6gxBgM2HHewWB2fj644Vzotg+pbBLWfsFDmLyjK5IUvUBDuLx0mOZMtGkt0LwJi3 +/iN0+x53H0s46lc3/+3/i93QmxwPox+jrVFq7R9a9eclc/V6MV3JPIVneb0AqF9u +xJLTt9jiaOIYa4SlAoYdRVWYzE/9vhNtR+ARS5bFRpFn3jm69YrKhDLDjEDa8GLT +Hyc4U/1liBVl3eOjwyrChNxjR6kmvqDIiHM5SCFwOxRTjeozxqNcJNgtlNwSXTdv +QvCqZoUnzw8YqCb4D42deN3T5th4drpRu6tK+XizEYhDf2sf1stgowoCa/ZjksSp +IP3qn7y8T5N3h2GQ0xkHXDBvDdYhHMM81gsATgnNUUjgb+gOHVqW4WmYTn+f+dlG +GQogqkpwuOSrg7NIFBOP0PegOvZDZ9KyAGWKgtW1lwR4La5OsJLGlV5pSBpkqOR5 +o9kBjXpMJn8tDg== +=WNPp +-END PGP SIGNATURE- Added:
[spark] branch master updated: [SPARK-35242][SQL] Support changing session catalog's default database
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 576c4046742 [SPARK-35242][SQL] Support changing session catalog's default database 576c4046742 is described below commit 576c4046742e1fd5a0e93b87e9b29bac0df83788 Author: Gabor Roczei AuthorDate: Tue Sep 27 08:28:26 2022 +0800 [SPARK-35242][SQL] Support changing session catalog's default database ### What changes were proposed in this pull request? This PR is a follow-up PR for #32364. It has been closed by github-actions because it hasn't been updated in a while. The previous PR has added a new custom parameter (spark.sql.catalog.$SESSION_CATALOG_NAME.defaultDatabase) to configure the session catalog's default database which is required by some use cases where the user does not have access to the default database. Therefore I have created a new PR based on this and added these changes in addition: - Rebased / updated the previous PR to the latest master branch version - Deleted the DEFAULT_DATABASE static member from sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala and refactored the code regarding this ### Why are the changes needed? If our user does not have any permissions for the Hive default database in Ranger, it will fail with the following error: ``` 22/08/26 18:36:21 INFO metastore.RetryingMetaStoreClient: [main]: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hrt_10ROOT.HWX.SITE (auth:KERBEROS) retries=1 delay=1 lifetime=0 org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Permission denied: user [hrt_10] does not have [USE] privilege on [default]) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:223) at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:144) ``` The idea is that we introduce a new configuration parameter where we can set a different database name for the default database. Our user has enough permissions for this in Ranger. For example: ```spark-shell --conf spark.sql.catalog.spark_catalog.defaultDatabase=other_db``` ### Does this PR introduce _any_ user-facing change? There will be a new configuration parameter as I mentioned above but the default value is "default" as it was previously. ### How was this patch tested? 1) With github action (all tests passed) https://github.com/roczei/spark/actions/runs/2934863118 2) Manually tested with Ranger + Hive Scenario a) hrt_10 does not have access to the default database in Hive: ``` [hrt_10quasar-thbnqr-2 ~]$ spark-shell Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 22/08/26 18:14:18 WARN conf.HiveConf: [main]: HiveConf of name hive.masking.algo does not exist 22/08/26 18:14:30 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: [dispatcher-event-loop-17]: Attempted to request executors before the AM has registered! ... 
scala> spark.sql("use other") 22/08/26 18:18:47 INFO conf.HiveConf: [main]: Found configuration file file:/etc/hive/conf/hive-site.xml 22/08/26 18:18:48 WARN conf.HiveConf: [main]: HiveConf of name hive.masking.algo does not exist 22/08/26 18:18:48 WARN client.HiveClientImpl: [main]: Detected HiveConf hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless hive logic Hive Session ID = 2188764e-d0dc-41b3-b714-f89b03cb3d6d 22/08/26 18:18:48 INFO SessionState: [main]: Hive Session ID = 2188764e-d0dc-41b3-b714-f89b03cb3d6d 22/08/26 18:18:50 INFO metastore.HiveMetaStoreClient: [main]: HMS client filtering is enabled. 22/08/26 18:18:50 INFO metastore.HiveMetaStoreClient: [main]: Trying to connect to metastore with URI thrift://quasar-thbnqr-4.quasar-thbnqr.root.hwx.site:9083 22/08/26 18:18:50 INFO metastore.HiveMetaStoreClient: [main]: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection. 22/08/26 18:18:50 INFO metastore.HiveMetaStoreClient: [main]: Opened a connection to metastore, current connections: 1 22/08/26 18:18:50 INFO metastore.HiveMetaStoreClient: [main]: Connected to metastore. 22/08/26 18:18:50 INFO metastore.RetryingMetaStoreClient: [main]: RetryingMetaStoreClient proxy=class
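The same default-database override can also be set programmatically when building the session; a hedged sketch follows, where `other_db` stands in for whatever database the user actually has privileges on:

```python
# Hedged sketch: configure a non-"default" default database for the session catalog.
# "other_db" is a hypothetical database the user has access to in Ranger.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("custom-default-database")
    .config("spark.sql.catalog.spark_catalog.defaultDatabase", "other_db")
    .enableHiveSupport()
    .getOrCreate()
)

# Unqualified table names now resolve against other_db instead of the Hive default database.
spark.sql("SHOW TABLES").show()
```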
[spark] branch master updated (7e39d9bfef3 -> 7230f598688)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 7e39d9bfef3 [SPARK-40552][BUILD][INFRA] Upgrade `protobuf-python` to 4.21.6 add 7230f598688 [SPARK-40357][SQL] Migrate window type check failures onto error classes No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 50 + .../org/apache/spark/SparkThrowableHelper.scala| 3 +- .../sql/catalyst/analysis/TypeCheckResult.scala| 2 +- .../catalyst/expressions/windowExpressions.scala | 85 ++- .../results/postgreSQL/window_part3.sql.out| 15 ++- .../native/windowFrameCoercion.sql.out | 71 - .../sql-tests/results/udf/udf-window.sql.out | 99 +++-- .../resources/sql-tests/results/window.sql.out | 116 ++-- .../spark/sql/DataFrameWindowFramesSuite.scala | 118 +++-- 9 files changed, 484 insertions(+), 75 deletions(-)
[spark] branch master updated (778acd411e3 -> 7e39d9bfef3)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 778acd411e3 [SPARK-40478][DOCS] Add create datasource table options docs add 7e39d9bfef3 [SPARK-40552][BUILD][INFRA] Upgrade `protobuf-python` to 4.21.6 No new revisions were added by this update. Summary of changes: dev/create-release/spark-rm/Dockerfile | 2 +- dev/requirements.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (8fdaf548bcc -> 778acd411e3)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8fdaf548bcc [SPARK-40560][SQL] Rename `message` to `messageTemplate` in the `STANDARD` format of errors add 778acd411e3 [SPARK-40478][DOCS] Add create datasource table options docs No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-ddl-create-table-datasource.md | 13 + 1 file changed, 13 insertions(+)
[spark] branch master updated: [SPARK-40560][SQL] Rename `message` to `messageTemplate` in the `STANDARD` format of errors
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8fdaf548bcc [SPARK-40560][SQL] Rename `message` to `messageTemplate` in the `STANDARD` format of errors 8fdaf548bcc is described below commit 8fdaf548bcc51630f7cfae8a17930c987b29fbd3 Author: Max Gekk AuthorDate: Mon Sep 26 14:31:55 2022 +0300 [SPARK-40560][SQL] Rename `message` to `messageTemplate` in the `STANDARD` format of errors ### What changes were proposed in this pull request? In the `STANDARD` format of error messages, rename the `message` field to `messageTemplate`. ### Why are the changes needed? Because the field contains a template of an error message, actually. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? By running the affected test suites: ``` $ build/sbt "core/testOnly *SparkThrowableSuite" $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly org.apache.spark.sql.hive.thriftserver.CliSuite" $ build/sbt -Phive -Phive-thriftserver "test:testOnly *ThriftServerWithSparkContextInBinarySuite" ``` Closes #37997 from MaxGekk/messageTemplate. Authored-by: Max Gekk Signed-off-by: Max Gekk --- core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala | 8 core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala | 2 +- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala| 6 +++--- .../scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala | 2 +- .../sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala | 2 +- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala b/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala index 8d4ae3a877d..9d6dd9dde07 100644 --- a/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala +++ b/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala @@ -74,12 +74,12 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) { assert(errorInfo.subClass.isDefined == subErrorClass.isDefined) if (subErrorClass.isEmpty) { - errorInfo.messageFormat + errorInfo.messageTemplate } else { val errorSubInfo = errorInfo.subClass.get.getOrElse( subErrorClass.get, throw SparkException.internalError(s"Cannot find sub error class '$errorClass'")) - errorInfo.messageFormat + " " + errorSubInfo.messageFormat + errorInfo.messageTemplate + " " + errorSubInfo.messageTemplate } } @@ -102,7 +102,7 @@ private case class ErrorInfo( sqlState: Option[String]) { // For compatibility with multi-line error messages @JsonIgnore - val messageFormat: String = message.mkString("\n") + val messageTemplate: String = message.mkString("\n") } /** @@ -114,5 +114,5 @@ private case class ErrorInfo( private case class ErrorSubInfo(message: Seq[String]) { // For compatibility with multi-line error messages @JsonIgnore - val messageFormat: String = message.mkString("\n") + val messageTemplate: String = message.mkString("\n") } diff --git a/core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala b/core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala index d503f400d00..9073a73dec4 100644 --- a/core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala +++ b/core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala @@ -92,7 +92,7 @@ private[spark] object SparkThrowableHelper { if (errorSubClass != null) g.writeStringField("errorSubClass", 
errorSubClass) if (format == STANDARD) { val finalClass = errorClass + Option(errorSubClass).map("." + _).getOrElse("") -g.writeStringField("message", errorReader.getMessageTemplate(finalClass)) +g.writeStringField("messageTemplate", errorReader.getMessageTemplate(finalClass)) } val sqlState = e.getSqlState if (sqlState != null) g.writeStringField("sqlState", sqlState) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 266683b1eca..191304bc353 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -128,7 +128,7 @@ class SparkThrowableSuite extends SparkFunSuite { test("Message format invariants") { val messageFormats = errorReader.errorInfoMap .filterKeys(!_.startsWith("_LEGACY_ERROR_TEMP_")) - .values.toSeq.flatMap { i => Seq(i.messageFormat) } + .values.toSeq.flatMap { i => Seq(i.messageTemplate) } checkCondition(messageFormats, s => s != null) checkIfUnique(messageFormats)
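For illustration, the STANDARD-format payload now carries the template under the new key; a hedged sketch of the shape follows, with the field names drawn from the writer code above and purely hypothetical values (additional fields may be present):

```python
# Hedged illustration only: the renamed field in the STANDARD error format,
# shown as a Python dict. All values below are hypothetical placeholders.
standard_error_payload = {
    "errorClass": "SOME_ERROR_CLASS",
    "messageTemplate": "Cannot resolve <objectName>.",
    "sqlState": "42000",
}
```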
[spark] branch master updated: [SPARK-40501][SQL] Add PushProjectionThroughLimit for Optimizer
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c02aef01bbc [SPARK-40501][SQL] Add PushProjectionThroughLimit for Optimizer c02aef01bbc is described below commit c02aef01bbc1530884c23850224293336a5430a2 Author: panbingkun AuthorDate: Mon Sep 26 15:50:07 2022 +0800 [SPARK-40501][SQL] Add PushProjectionThroughLimit for Optimizer ### What changes were proposed in this pull request? The pr aim to add `PushProjectionThroughLimit` for `Optimizer`, improve query performance. ### Why are the changes needed? "normalize" the order of project and limit operator **{project(..., limit(...)) => limit(..., project(...))}**, so that we can have more chances to merge adjacent projects or limits **{eg: by SpecialLimits}**. --- When I query a big table(size / per day: 10T, column size: 1219) with limit 1 A.Scenario 1(run sql in spark-sql) - The results will be fetched soon - The optimization of CollectLimitExec has taken effect 1.SQL: select * from xxx where ..._day = '20220919' limit 1 https://user-images.githubusercontent.com/15246973/191184857-fa3d7f08-f0ea-4d70-a406-48828a3bb761.png;> 2.Spark UI: https://user-images.githubusercontent.com/15246973/191204519-86de05d3-40d4-4acd-9833-86bf318ecb72.png;> B.Scenario 2(run sql in spark-shell) - It took a long time to fetch out(still running after 20 minutes...) 1.Code: spark.sql("select * from xxx where ..._day = '20220919' limit 1").show() https://user-images.githubusercontent.com/15246973/191211875-c29c3bae-1339-414b-84bc-2195545b8c35.png;> 2.Spark UI: https://user-images.githubusercontent.com/15246973/191212244-22108810-dd66-46bd-bea7-a7dab70a1a06.png;> C.Scenario 3(run sql in spark-shell) - The results will be fetched soon 1.Code: spark.sql("select * from xxx where ..._day = '20220919'").show(1) https://user-images.githubusercontent.com/15246973/191215182-bf278f71-c3ee-4028-8372-c9ed69431b6f.png;> 2.Spark UI: https://user-images.githubusercontent.com/15246973/191215437-3a291b34-5257-485a-bc18-6c0e0865d7ce.png;> ## The diff between Scenario 2 and Scenario3 is focus on "Optimized Logical Plan" https://user-images.githubusercontent.com/15246973/191216863-367c764d-2aa0-4c79-b240-0ebb6735937f.png;> https://user-images.githubusercontent.com/15246973/191217175-02213b0c-5d09-4154-85f7-18751734afd0.png;> # After pr: Scenario 2(run sql in spark-shell) - The results will be fetched soon - The optimization of CollectLimitExec has taken effect 1.Code: spark.sql("select * from xxx where ..._day = '20220919' limit 1").show() https://user-images.githubusercontent.com/15246973/191880203-2f951644-b59b-4dc8-9e80-c02c458fc28b.png;> 2.Spark UI: https://user-images.githubusercontent.com/15246973/191219627-82655d01-1d5e-44ee-af49-e8da51c9ca72.png;> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add new UT & Pass GA. Closes #37941 from panbingkun/improve_shell_limit. 
Authored-by: panbingkun Signed-off-by: Wenchen Fan --- .../spark/sql/catalyst/optimizer/Optimizer.scala | 1 + .../optimizer/PushProjectionThroughLimit.scala | 39 ++ .../PushProjectionThroughLimitSuite.scala | 90 ++ 3 files changed, 130 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 3ac75554a2b..2664fd63806 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -81,6 +81,7 @@ abstract class Optimizer(catalogManager: CatalogManager) Seq( // Operator push down PushProjectionThroughUnion, +PushProjectionThroughLimit, ReorderJoin, EliminateOuterJoin, PushDownPredicates, diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushProjectionThroughLimit.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushProjectionThroughLimit.scala new file mode 100644 index 000..6280cc5e42c --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushProjectionThroughLimit.scala @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0
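To check whether the new rule fires for a given query, inspecting the optimized logical plan is enough; a hedged PySpark sketch follows (the table is a hypothetical stand-in created from a range):

```python
# Hedged sketch: inspect the optimized plan for a SELECT ... LIMIT query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(1000).createOrReplaceTempView("some_table")  # hypothetical stand-in table

df = spark.sql("SELECT * FROM some_table WHERE id > 10 LIMIT 1")

# After this change the optimized logical plan should show the Limit above the Project
# (limit(project(...))), which lets the planner use the CollectLimit execution path
# described in the commit message.
df.explain(True)
```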
[spark] branch master updated: [SPARK-40530][SQL] Add error-related developer APIs
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2bf26dfbd70 [SPARK-40530][SQL] Add error-related developer APIs 2bf26dfbd70 is described below commit 2bf26dfbd708cbbbe8a51dda6972296055661e07 Author: Wenchen Fan AuthorDate: Mon Sep 26 15:35:38 2022 +0800 [SPARK-40530][SQL] Add error-related developer APIs ### What changes were proposed in this pull request? Third-party Spark plugins may define their own errors using the same framework as Spark: put error definition in json files. This PR moves some error-related code to a new file and marks them as developer APIs, so that others can reuse them instead of writing its own json reader. ### Why are the changes needed? make it easier for Spark plugins to define errors. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #37969 from cloud-fan/error. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../org/apache/spark/ErrorClassesJSONReader.scala | 118 + .../org/apache/spark/SparkThrowableHelper.scala| 105 ++ .../org/apache/spark/SparkThrowableSuite.scala | 54 +++--- 3 files changed, 163 insertions(+), 114 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala b/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala new file mode 100644 index 000..8d4ae3a877d --- /dev/null +++ b/core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import java.net.URL + +import scala.collection.JavaConverters._ +import scala.collection.immutable.SortedMap + +import com.fasterxml.jackson.annotation.JsonIgnore +import com.fasterxml.jackson.core.`type`.TypeReference +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule +import org.apache.commons.text.StringSubstitutor + +import org.apache.spark.annotation.DeveloperApi + +/** + * A reader to load error information from one or more JSON files. Note that, if one error appears + * in more than one JSON files, the latter wins. Please read core/src/main/resources/error/README.md + * for more details. 
+ */ +@DeveloperApi +class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) { + assert(jsonFileURLs.nonEmpty) + + private def readAsMap(url: URL): SortedMap[String, ErrorInfo] = { +val mapper: JsonMapper = JsonMapper.builder() + .addModule(DefaultScalaModule) + .build() +mapper.readValue(url, new TypeReference[SortedMap[String, ErrorInfo]]() {}) + } + + // Exposed for testing + private[spark] val errorInfoMap = jsonFileURLs.map(readAsMap).reduce(_ ++ _) + + def getErrorMessage(errorClass: String, messageParameters: Map[String, String]): String = { +val messageTemplate = getMessageTemplate(errorClass) +val sub = new StringSubstitutor(messageParameters.asJava) +sub.setEnableUndefinedVariableException(true) +try { + sub.replace(messageTemplate.replaceAll("<([a-zA-Z0-9_-]+)>", "\\$\\{$1\\}")) +} catch { + case _: IllegalArgumentException => throw SparkException.internalError( +s"Undefined error message parameter for error class: '$errorClass'. " + + s"Parameters: $messageParameters") +} + } + + def getMessageTemplate(errorClass: String): String = { +val errorClasses = errorClass.split("\\.") +assert(errorClasses.length == 1 || errorClasses.length == 2) + +val mainErrorClass = errorClasses.head +val subErrorClass = errorClasses.tail.headOption +val errorInfo = errorInfoMap.getOrElse( + mainErrorClass, + throw SparkException.internalError(s"Cannot find main error class '$errorClass'")) +assert(errorInfo.subClass.isDefined == subErrorClass.isDefined) + +if (subErrorClass.isEmpty) { + errorInfo.messageFormat +} else { +
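The placeholder substitution performed by getErrorMessage above is compact enough to sketch outside Spark; the following hedged Python rendition mirrors the mechanism (rewrite `<param>` placeholders, fill them from the parameter map, and fail on undefined parameters) and is not a Spark API:

```python
# Hedged sketch: the placeholder-substitution mechanism used by getErrorMessage,
# re-expressed in Python for illustration only.
import re

def render_error(message_template: str, parameters: dict) -> str:
    # <paramName> placeholders in the template are replaced with values from the map,
    # mirroring the StringSubstitutor-based logic in ErrorClassesJsonReader.
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in parameters:
            raise ValueError(f"Undefined error message parameter: {name}")
        return parameters[name]
    return re.sub(r"<([a-zA-Z0-9_-]+)>", replace, message_template)

print(render_error("Cannot find main error class '<errorClass>'", {"errorClass": "FOO"}))
```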
[spark] branch master updated: [SPARK-40540][SQL] Migrate compilation errors onto error classes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 39f60578dc3 [SPARK-40540][SQL] Migrate compilation errors onto error classes 39f60578dc3 is described below commit 39f60578dc3c189e7f5cfb2a74f0502408aa123e Author: Max Gekk AuthorDate: Mon Sep 26 10:26:27 2022 +0300 [SPARK-40540][SQL] Migrate compilation errors onto error classes ### What changes were proposed in this pull request? In the PR, I propose to migrate 100 compilation errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_10xx`. The error message will not include the error classes, so, in this way we will preserve the existing behaviour. ### Why are the changes needed? The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes. After that we could prioritise the work on migration. The new error class name prefix `_LEGACY_ERROR_TEMP_` proposed here kind of marks the error as developer-facing, not user-facing. Developers can still get the error class programmatically via the `SparkThrowable` interface, so that they can build error infra with it. End users won't see the error class in the message. This allows us to do the error migration very quickly, and we can refine the error classes and mark them as user-facing later (naming them properly, adding tests, etc.). ### Does this PR introduce _any_ user-facing change? No. The error messages should be almost the same by default. ### How was this patch tested? By running the modified test suites: ``` $ build/sbt "core/testOnly *SparkThrowableSuite" $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" $ build/sbt "test:testOnly *AnalysisErrorSuite" $ build/sbt "test:testOnly *.HiveUDAFSuite" $ build/sbt "test:testOnly *.DataFrameSuite" $ build/sbt "test:testOnly *SQLQuerySuite" ``` Closes #37973 from MaxGekk/legacy-error-temp-compliation. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 502 +++ .../spark/sql/errors/QueryCompilationErrors.scala | 536 ++--- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 2 +- .../results/ansi/string-functions.sql.out | 32 +- .../results/ceil-floor-with-scale-param.sql.out| 32 +- .../sql-tests/results/change-column.sql.out| 34 +- .../results/columnresolution-negative.sql.out | 15 +- .../test/resources/sql-tests/results/count.sql.out | 7 +- .../sql-tests/results/csv-functions.sql.out| 24 +- .../sql-tests/results/group-by-ordinal.sql.out | 4 +- .../sql-tests/results/join-lateral.sql.out | 15 +- .../sql-tests/results/json-functions.sql.out | 72 ++- .../sql-tests/results/order-by-ordinal.sql.out | 45 +- .../test/resources/sql-tests/results/pivot.sql.out | 14 +- .../results/postgreSQL/window_part3.sql.out| 21 +- .../sql-tests/results/show_columns.sql.out | 8 +- .../results/sql-compatibility-functions.sql.out| 14 +- .../sql-tests/results/string-functions.sql.out | 32 +- .../sql-tests/results/table-aliases.sql.out| 30 +- .../results/table-valued-functions.sql.out | 2 +- .../sql-tests/results/timestamp-ntz.sql.out| 16 +- .../sql-tests/results/udaf/udaf-group-by.sql.out | 15 +- .../resources/sql-tests/results/udaf/udaf.sql.out | 31 +- .../sql-tests/results/udf/udf-pivot.sql.out| 14 +- .../sql-tests/results/udf/udf-udaf.sql.out | 31 +- .../sql-tests/results/udf/udf-window.sql.out | 7 +- .../resources/sql-tests/results/window.sql.out | 25 +- .../org/apache/spark/sql/DataFrameSuite.scala | 20 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 41 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 13 +- .../execution/command/DescribeTableSuiteBase.scala | 2 +- .../execution/command/v1/DescribeTableSuite.scala | 4 +- .../spark/sql/hive/execution/HiveUDAFSuite.scala | 18 +- 33 files changed, 1428 insertions(+), 250 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index 5694b2c9d0f..aff6576af73 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -1163,5 +1163,507 @@ "message" : [ "." ] + }, + "_LEGACY_ERROR_TEMP_1000" : { +"message" : [ + "LEGACY store assignment policy is disallowed in Spark data source V2. Please set the configuration to other values." +] + }, + "_LEGACY_ERROR_TEMP_1001" : { +"message" : [ + "USING column `` cannot be resolved on the side of the join.
[spark] branch master updated (b4014eb13a1 -> 0deba9fdd1a)
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b4014eb13a1 [SPARK-40330][PS] Implement `Series.searchsorted` add 0deba9fdd1a [SPARK-40554][PS] Make `ddof` in `DataFrame.sem` and `Series.sem` accept arbitrary integers No new revisions were added by this update. Summary of changes: python/pyspark/pandas/generic.py | 18 +- python/pyspark/pandas/tests/test_stats.py | 6 ++ 2 files changed, 19 insertions(+), 5 deletions(-)
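A minimal usage sketch of the relaxed `ddof` parameter in pandas-on-Spark (data hypothetical; requires a SparkSession with pandas-on-Spark available):

```python
# Hedged sketch: pandas-on-Spark sem() with an arbitrary integer ddof.
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Standard error of the mean with a non-default delta degrees of freedom.
print(psdf.sem(ddof=2))
print(psdf.a.sem(ddof=2))
```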