(spark) branch master updated (5765c0150659 -> 3d2b7fea7fe0)

2024-04-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5765c0150659 [SPARK-47820][INFRA] Run `ANSI` SQL CI twice per day
 add 3d2b7fea7fe0 [SPARK-4][PYTHON][SS][TESTS] Add spark connect test 
for python streaming data source

No new revisions were added by this update.

Summary of changes:
 dev/sparktestsupport/modules.py  |  1 +
 ...das_map.py => test_parity_python_streaming_datasource.py} | 12 ++--
 python/pyspark/sql/tests/test_python_streaming_datasource.py |  6 +-
 3 files changed, 8 insertions(+), 11 deletions(-)
 copy python/pyspark/sql/tests/connect/{test_parity_pandas_map.py => 
test_parity_python_streaming_datasource.py} (78%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47820][INFRA] Run `ANSI` SQL CI twice per day

2024-04-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5765c0150659 [SPARK-47820][INFRA] Run `ANSI` SQL CI twice per day
5765c0150659 is described below

commit 5765c0150659deb43674bb167b632f4bf045ce06
Author: Dongjoon Hyun 
AuthorDate: Thu Apr 11 10:03:55 2024 -0700

[SPARK-47820][INFRA] Run `ANSI` SQL CI twice per day

### What changes were proposed in this pull request?

This PR aims to run `ANSI` SQL CI twice per day for Apache Spark 4.0.0.
- https://github.com/apache/spark/actions/workflows/build_ansi.yml

### Why are the changes needed?

To detect ANSI failures more easily.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Since this is a daily CI, this should be tested after merging.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46010 from dongjoon-hyun/SPARK-47820.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_ansi.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build_ansi.yml b/.github/workflows/build_ansi.yml
index b39c1ec20e22..d9f587ae203b 100644
--- a/.github/workflows/build_ansi.yml
+++ b/.github/workflows/build_ansi.yml
@@ -21,7 +21,7 @@ name: "Build / ANSI (master, Hadoop 3, JDK 17, Scala 2.13)"
 
 on:
   schedule:
-- cron: '0 1 * * *'
+- cron: '0 1,13 * * *'
 
 jobs:
   run-build:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47799][BUILD] Add `-g` to javac compile parameters when using SBT package jar

2024-04-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f98d4e644e1b [SPARK-47799][BUILD] Add `-g` to javac compile parameters 
when using SBT package jar
f98d4e644e1b is described below

commit f98d4e644e1b5d4a6d515c69cd22a5c0bcf4990b
Author: sychen 
AuthorDate: Thu Apr 11 09:08:41 2024 -0700

[SPARK-47799][BUILD] Add `-g` to javac compile parameters when using SBT 
package jar

### What changes were proposed in this pull request?
This PR aims to add `-g` to javac compile parameters when using SBT package 
jar.

> -g
Generates all debugging information, including local variables. By default, 
only line number and source file information is generated.

https://docs.oracle.com/en/java/javase/17/docs/specs/man/javac.html
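For reference, a minimal standalone sbt sketch of the same idea (illustration only; the actual Spark change lands in `project/SparkBuild.scala`, shown in the diff below):

```scala
// build.sbt sketch (not the Spark patch): pass "-g" to javac so the class
// files carry full debugging information (including the LocalVariableTable),
// matching the maven-compiler-plugin default of debug=true.
Compile / javacOptions ++= Seq(
  "-encoding", "UTF-8",
  "-g"
)
```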

### Why are the changes needed?
`maven-compiler-plugin` defaults to debug=true, `plexus-compiler-javac` 
will add the parameter `-g`.

SBT does not have this behavior by default, which leads to some differences
between the jars produced by Maven and SBT builds, even though the code logic is the same.


https://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#debug


https://github.com/apache/maven-compiler-plugin/blob/736da68adf543cf56cd82a68e5ad28d397ace2f4/src/main/java/org/apache/maven/plugin/compiler/AbstractCompilerMojo.java#L734


https://github.com/codehaus-plexus/plexus-compiler/blob/6ae79d7f2feca3a02e75f7661468abac6a9a0a11/plexus-compilers/plexus-compiler-javac/src/main/java/org/codehaus/plexus/compiler/javac/JavacCompiler.java#L279-L285

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

```bash
./build/sbt common-utils/package
```

local test

### Current

https://github.com/apache/spark/assets/3898450/6233068c-b15b-46e4-a070-d3d36db540a4

### PR
https://github.com/apache/spark/assets/3898450/a5c156b1-d1a2-4847-9265-0c608a129091

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45983 from cxzl25/SPARK-47799.

Authored-by: sychen 
Signed-off-by: Dongjoon Hyun 
---
 project/SparkBuild.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 951d5970c845..bcaa51ec30ff 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -311,6 +311,7 @@ object SparkBuild extends PomBuild {
 
 (Compile / javacOptions) ++= Seq(
   "-encoding", UTF_8.name(),
+  "-g",
   "--release", javaVersion.value
 ),
 // This -target and Xlint:unchecked options cannot be set in the Compile 
configuration scope since


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (7c639a1579de -> d0e7645a5a34)

2024-04-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7c639a1579de [SPARK-47817][PYTHON][PS][BUILD] Update `pandas` to 2.2.2
 add d0e7645a5a34 [SPARK-47797][K8S] Skip deleting pod from k8s if the pod 
does not exists

No new revisions were added by this update.

Summary of changes:
 .../cluster/k8s/ExecutorPodsLifecycleManager.scala |  8 +--
 .../k8s/ExecutorPodsLifecycleManagerSuite.scala| 26 ++
 2 files changed, 32 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47817][PYTHON][PS][BUILD] Update `pandas` to 2.2.2

2024-04-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c639a1579de [SPARK-47817][PYTHON][PS][BUILD] Update `pandas` to 2.2.2
7c639a1579de is described below

commit 7c639a1579de9ca052965a1133fb1f159dd72701
Author: Bjørn Jørgensen 
AuthorDate: Thu Apr 11 07:47:39 2024 -0700

[SPARK-47817][PYTHON][PS][BUILD] Update `pandas` to 2.2.2

### What changes were proposed in this pull request?
Update `pandas` from 2.2.1 to 2.2.2

### Why are the changes needed?
[Release notes](https://pandas.pydata.org/docs/whatsnew/v2.2.2.html)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46009 from bjornjorgensen/pandas2.2.2.

Authored-by: Bjørn Jørgensen 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile   | 4 ++--
 python/pyspark/pandas/supported_api_gen.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 378264b7afa3..0b0a478b4bf4 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -86,10 +86,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.1' scipy coverage 
matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.2' scipy coverage 
matplotlib lxml
 
 
-ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.1 scipy 
plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 
scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.2 scipy 
plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 
scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 
googleapis-common-protos==1.56.4"
 
diff --git a/python/pyspark/pandas/supported_api_gen.py 
b/python/pyspark/pandas/supported_api_gen.py
index b24223c581c6..bbf0b3cbc3d6 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -38,7 +38,7 @@ from pyspark.pandas.exceptions import 
PandasNotImplementedError
 MAX_MISSING_PARAMS_SIZE = 5
 COMMON_PARAMETER_SET = {"kwargs", "args", "cls"}
 MODULE_GROUP_MATCH = [(pd, ps), (pdw, psw), (pdg, psg)]
-PANDAS_LATEST_VERSION = "2.2.1"
+PANDAS_LATEST_VERSION = "2.2.2"
 
 RST_HEADER = """
 =


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46812][CONNECT][FOLLOW-UP] Make `handleCreateResourceProfileCommand` private

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a54e53bc0c3c [SPARK-46812][CONNECT][FOLLOW-UP] Make 
`handleCreateResourceProfileCommand` private
a54e53bc0c3c is described below

commit a54e53bc0c3c0f4c593a7ad74c3742ff9d770236
Author: Ruifeng Zheng 
AuthorDate: Wed Apr 10 23:12:12 2024 -0700

[SPARK-46812][CONNECT][FOLLOW-UP] Make `handleCreateResourceProfileCommand` 
private

### What changes were proposed in this pull request?
Make `handleCreateResourceProfileCommand` private

### Why are the changes needed?
it should not be exposed to users

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45998 from zhengruifeng/private_handleCreateResourceProfileCommand.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 690f2bfded3b..96db45c5c63e 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -3421,7 +3421,7 @@ class SparkConnectPlanner(
 .build())
   }
 
-  def handleCreateResourceProfileCommand(
+  private def handleCreateResourceProfileCommand(
   createResourceProfileCommand: CreateResourceProfileCommand,
   responseObserver: StreamObserver[proto.ExecutePlanResponse]): Unit = {
 val rp = createResourceProfileCommand.getProfile


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [MINOR][DOCS] Make the link of spark properties with YARN more accurate

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new b1ea200f6078 [MINOR][DOCS] Make the link of spark properties with YARN 
more accurate
b1ea200f6078 is described below

commit b1ea200f6078158586a4bc13b39511adb71e7a57
Author: beliefer 
AuthorDate: Wed Apr 10 20:33:43 2024 -0700

[MINOR][DOCS] Make the link of spark properties with YARN more accurate

### What changes were proposed in this pull request?
This PR proposes to make the link of Spark properties with YARN more accurate.

### Why are the changes needed?
Currently, the link of `YARN Spark Properties` points only to the page
`running-on-yarn.html`.
We should add the anchor point.

### Does this PR introduce _any_ user-facing change?
'Yes'.
More convenient for readers.

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #45994 from beliefer/accurate-yarn-link.

Authored-by: beliefer 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit aca3d1025e2d85c02737456bfb01163c87ca3394)
Signed-off-by: Dongjoon Hyun 
---
 docs/job-scheduling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 8694ee82e1b8..9639054f6129 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -57,7 +57,7 @@ Resource allocation can be configured as follows, based on 
the cluster type:
   on the cluster (`spark.executor.instances` as configuration property), while 
`--executor-memory`
   (`spark.executor.memory` configuration property) and `--executor-cores` 
(`spark.executor.cores` configuration
   property) control the resources per executor. For more information, see the
-  [YARN Spark Properties](running-on-yarn.html).
+  [YARN Spark Properties](running-on-yarn.html#spark-properties).
 
 A second option available on Mesos is _dynamic sharing_ of CPU cores. In this 
mode, each Spark application
 still has a fixed and independent memory allocation (set by 
`spark.executor.memory`), but when the


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [MINOR][DOCS] Make the link of spark properties with YARN more accurate

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7d1e77c3072e [MINOR][DOCS] Make the link of spark properties with YARN 
more accurate
7d1e77c3072e is described below

commit 7d1e77c3072e278d2552a57746bf3ab7abc58c41
Author: beliefer 
AuthorDate: Wed Apr 10 20:33:43 2024 -0700

[MINOR][DOCS] Make the link of spark properties with YARN more accurate

### What changes were proposed in this pull request?
This PR proposes to make the link of Spark properties with YARN more accurate.

### Why are the changes needed?
Currently, the link of `YARN Spark Properties` points only to the page
`running-on-yarn.html`.
We should add the anchor point.

### Does this PR introduce _any_ user-facing change?
'Yes'.
More convenient for readers.

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #45994 from beliefer/accurate-yarn-link.

Authored-by: beliefer 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit aca3d1025e2d85c02737456bfb01163c87ca3394)
Signed-off-by: Dongjoon Hyun 
---
 docs/job-scheduling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 0875bd5558e5..8f10d0788e63 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -57,7 +57,7 @@ Resource allocation can be configured as follows, based on 
the cluster type:
   on the cluster (`spark.executor.instances` as configuration property), while 
`--executor-memory`
   (`spark.executor.memory` configuration property) and `--executor-cores` 
(`spark.executor.cores` configuration
   property) control the resources per executor. For more information, see the
-  [YARN Spark Properties](running-on-yarn.html).
+  [YARN Spark Properties](running-on-yarn.html#spark-properties).
 
 A second option available on Mesos is _dynamic sharing_ of CPU cores. In this 
mode, each Spark application
 still has a fixed and independent memory allocation (set by 
`spark.executor.memory`), but when the


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [MINOR][DOCS] Make the link of spark properties with YARN more accurate

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aca3d1025e2d [MINOR][DOCS] Make the link of spark properties with YARN 
more accurate
aca3d1025e2d is described below

commit aca3d1025e2d85c02737456bfb01163c87ca3394
Author: beliefer 
AuthorDate: Wed Apr 10 20:33:43 2024 -0700

[MINOR][DOCS] Make the link of spark properties with YARN more accurate

### What changes were proposed in this pull request?
This PR proposes to make the link of Spark properties with YARN more accurate.

### Why are the changes needed?
Currently, the link of `YARN Spark Properties` points only to the page
`running-on-yarn.html`.
We should add the anchor point.

### Does this PR introduce _any_ user-facing change?
'Yes'.
More convenient for readers.

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #45994 from beliefer/accurate-yarn-link.

Authored-by: beliefer 
Signed-off-by: Dongjoon Hyun 
---
 docs/job-scheduling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index da007608adf3..21f02a1c4d00 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -53,7 +53,7 @@ Resource allocation can be configured as follows, based on 
the cluster type:
   on the cluster (`spark.executor.instances` as configuration property), while 
`--executor-memory`
   (`spark.executor.memory` configuration property) and `--executor-cores` 
(`spark.executor.cores` configuration
   property) control the resources per executor. For more information, see the
-  [YARN Spark Properties](running-on-yarn.html).
+  [YARN Spark Properties](running-on-yarn.html#spark-properties).
 
 Note that none of the modes currently provide memory sharing across 
applications. If you would like to share
 data this way, we recommend running a single server application that can serve 
multiple requests by querying


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] Update the decimal mapping remarks with JDBC data sources

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 10a308774b71 [SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] 
Update the decimal mapping remarks with JDBC data sources
10a308774b71 is described below

commit 10a308774b71cf774f1e661c5f55829b61c00521
Author: Kent Yao 
AuthorDate: Wed Apr 10 14:05:49 2024 -0700

[SPARK-47781][SPARK-47791][SPARK-47798][DOCS][FOLLOWUP] Update the decimal 
mapping remarks with JDBC data sources

### What changes were proposed in this pull request?

Followup of SPARK-47781, SPARK-47791 and SPARK-47798, to update the decimal 
mapping remarks with JDBC data sources

### Why are the changes needed?

doc revision

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

doc build

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45984 from yaooqinn/SPARK-47781-F.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 docs/sql-data-sources-jdbc.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index 84e2d7f06381..d99e7bc9e519 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -544,7 +544,7 @@ are also available to connect MySQL, may have different 
mapping rules.
 
   DECIMAL(p,s) [UNSIGNED]
   DecimalType(min(38, p),(min(18,s)))
-  The column type is bounded to DecimalType(38, 18), thus if any value 
of this column have a actual presion greater 38 will fail with 
DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION 
+  The column type is bounded to DecimalType(38, 18), if 'p>38', the 
fraction part will be truncated if exceeded. And if any value of this column 
have an actual precision greater 38 will fail with 
NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION error
 
 
   DATE
@@ -845,7 +845,7 @@ as the activated JDBC Driver. Note that, different JDBC 
drivers, or different ve
 
   numeric, decimal
   DecimalType
-  
+  Since PostgreSQL 15, 's' can be negative. If 's<0' it'll be adjusted 
to DecimalType(min(p-s, 38), 0); Otherwise, DecimalType(p, s), and if 'p>38', 
the fraction part will be truncated if exceeded. And if any value of this 
column have an actual precision greater 38 will fail with 
NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION error
 
 
   character varying(n), varchar(n)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 2c43d92511b2 [SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1
2c43d92511b2 is described below

commit 2c43d92511b27fc0c106e97c1e9e8f5253e4b894
Author: Dongjoon Hyun 
AuthorDate: Wed Apr 10 01:37:00 2024 -0700

[SPARK-47790][BUILD][3.5] Upgrade `commons-io` to 2.16.1

### What changes were proposed in this pull request?

This PR aims to upgrade `commons-io` to 2.16.1.

### Why are the changes needed?

To bring the latest bug fixes
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.16.1 
(2024-04-04)
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.16.0 
(2024-03-25)
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.1 
(2023-11-24)
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0 
(2023-10-21)
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.14.0 
(2023-09-24)

### Does this PR introduce _any_ user-facing change?

Yes, this is a dependency change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45974 from dongjoon-hyun/SPARK-47790-3.5.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 8d2a54189edd..378cdb121150 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -43,7 +43,7 @@ commons-compiler/3.1.9//commons-compiler-3.1.9.jar
 commons-compress/1.23.0//commons-compress-1.23.0.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
-commons-io/2.13.0//commons-io-2.13.0.jar
+commons-io/2.16.1//commons-io-2.16.1.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.12.0//commons-lang3-3.12.0.jar
 commons-logging/1.1.3//commons-logging-1.1.3.jar
diff --git a/pom.xml b/pom.xml
index 2954fc0d97cb..34cbefbeb3f7 100644
--- a/pom.xml
+++ b/pom.xml
@@ -190,7 +190,7 @@
 3.0.3
 1.16.1
 1.23.0
-2.13.0
+2.16.1
 
 2.6
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47790][BUILD] Upgrade `commons-io` to 2.16.1

2024-04-10 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0d7c07047a62 [SPARK-47790][BUILD] Upgrade `commons-io` to 2.16.1
0d7c07047a62 is described below

commit 0d7c07047a628bd42eb53eb49935f5e3f81ea1a1
Author: Dongjoon Hyun 
AuthorDate: Wed Apr 10 00:22:40 2024 -0700

[SPARK-47790][BUILD] Upgrade `commons-io` to 2.16.1

### What changes were proposed in this pull request?

This PR aims to upgrade `commons-io` to `2.16.1`.

### Why are the changes needed?

`2.16.1` is a maintenance release after `2.16.0`.
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.16.1 
(2024-04-04)

### Does this PR introduce _any_ user-facing change?

Yes, this is a dependency change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45973 from dongjoon-hyun/SPARK-47790.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index e51943fb0afe..f0454184a5f1 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -43,7 +43,7 @@ commons-compiler/3.1.9//commons-compiler-3.1.9.jar
 commons-compress/1.26.1//commons-compress-1.26.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
-commons-io/2.16.0//commons-io-2.16.0.jar
+commons-io/2.16.1//commons-io-2.16.1.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.14.0//commons-lang3-3.14.0.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
diff --git a/pom.xml b/pom.xml
index 7b588e80e7ce..95ed02d412a9 100644
--- a/pom.xml
+++ b/pom.xml
@@ -191,7 +191,7 @@
 3.0.3
 1.16.1
 1.26.1
-2.16.0
+2.16.1
 
 2.6
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47706][BUILD] Bump json4s 4.0.7

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7996b031f7f2 [SPARK-47706][BUILD] Bump json4s 4.0.7
7996b031f7f2 is described below

commit 7996b031f7f227d0f4112104c6fc5ab160fc84c9
Author: Cheng Pan 
AuthorDate: Tue Apr 9 23:03:04 2024 -0700

[SPARK-47706][BUILD] Bump json4s 4.0.7

### What changes were proposed in this pull request?

Bump json4s from 3.7.0-M11 to 4.0.7

### Why are the changes needed?

4.0.7 is the latest stable version of json4s.

https://mvnrepository.com/artifact/org.json4s/json4s-jackson

### Does this PR introduce _any_ user-facing change?

No, all Mima complaints are private API.

### How was this patch tested?

Pass GHA.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45838 from pan3793/SPARK-47706.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala   | 3 +--
 dev/deps/spark-deps-hadoop-3-hive-2.3| 9 +
 pom.xml  | 2 +-
 project/MimaExcludes.scala   | 6 +-
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
 
b/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
index fb8a68f1812b..5233e0688349 100644
--- 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
+++ 
b/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
@@ -22,7 +22,6 @@ import java.time.Duration
 import scala.jdk.CollectionConverters._
 
 import com.google.protobuf.{Any => AnyProto, BoolValue, ByteString, 
BytesValue, DoubleValue, DynamicMessage, FloatValue, Int32Value, Int64Value, 
StringValue, UInt32Value, UInt64Value}
-import org.json4s.StringInput
 import org.json4s.jackson.JsonMethods
 
 import org.apache.spark.sql.{AnalysisException, Column, DataFrame, QueryTest, 
Row}
@@ -1339,7 +1338,7 @@ class ProtobufFunctionsSuite extends QueryTest with 
SharedSparkSession with Prot
 
 // Takes json string and return a json with all the extra whitespace 
removed.
 def compactJson(json: String): String = {
-  val jsonValue = JsonMethods.parse(StringInput(json))
+  val jsonValue = JsonMethods.parse(json)
   JsonMethods.compact(jsonValue)
 }
 
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 0decaaea6fd6..e51943fb0afe 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -147,10 +147,11 @@ joda-time/2.12.7//joda-time-2.12.7.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
-json4s-ast_2.13/3.7.0-M11//json4s-ast_2.13-3.7.0-M11.jar
-json4s-core_2.13/3.7.0-M11//json4s-core_2.13-3.7.0-M11.jar
-json4s-jackson_2.13/3.7.0-M11//json4s-jackson_2.13-3.7.0-M11.jar
-json4s-scalap_2.13/3.7.0-M11//json4s-scalap_2.13-3.7.0-M11.jar
+json4s-ast_2.13/4.0.7//json4s-ast_2.13-4.0.7.jar
+json4s-core_2.13/4.0.7//json4s-core_2.13-4.0.7.jar
+json4s-jackson-core_2.13/4.0.7//json4s-jackson-core_2.13-4.0.7.jar
+json4s-jackson_2.13/4.0.7//json4s-jackson_2.13-4.0.7.jar
+json4s-scalap_2.13/4.0.7//json4s-scalap_2.13-4.0.7.jar
 jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/2.0.12//jul-to-slf4j-2.0.12.jar
diff --git a/pom.xml b/pom.xml
index 9202823b38c5..7b588e80e7ce 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1119,7 +1119,7 @@
       <dependency>
         <groupId>org.json4s</groupId>
         <artifactId>json4s-jackson_${scala.binary.version}</artifactId>
-        <version>3.7.0-M11</version>
+        <version>4.0.7</version>
         <exclusions>
           <exclusion>
             <groupId>com.fasterxml.jackson.core</groupId>
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 4016c5f8b3e5..a907adea2c50 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -84,7 +84,11 @@ object MimaExcludes {
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.jdbc.MySQLDialect#MySQLSQLBuilder.this"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.jdbc.MySQLDialect#MySQLSQLQueryBuilder.this"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.jdbc.OracleDialect#OracleSQLBuilder.this"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.jdbc.OracleDialect#OracleSQLQueryBuilder.this")
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.jdbc.OracleDialect#OracleSQLQueryBuilder.this"),
+// SPARK-47706: Bump json4s

(spark) branch branch-3.5 updated: [SPARK-47083][BUILD] Upgrade `commons-codec` to 1.16.1

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7bec38751dac [SPARK-47083][BUILD] Upgrade `commons-codec` to 1.16.1
7bec38751dac is described below

commit 7bec38751dacd58d48e21f3f913a6fa8e21bd7d9
Author: panbingkun 
AuthorDate: Sun Feb 18 18:56:46 2024 -0800

[SPARK-47083][BUILD] Upgrade `commons-codec` to 1.16.1

This PR aims to upgrade `commons-codec` from `1.16.0` to `1.16.1`.

1. The new version brings some bug fixes, e.g.:
- Fix possible IndexOutOfBoundException in PhoneticEngine.encode method 
#223. Fixes [CODEC-315](https://issues.apache.org/jira/browse/CODEC-315)
- Fix possible IndexOutOfBoundsException in 
PercentCodec.insertAlwaysEncodeChars() method #222. Fixes 
[CODEC-314](https://issues.apache.org/jira/browse/CODEC-314).

2. The full release notes:

https://commons.apache.org/proper/commons-codec/changes-report.html#a1.16.1

No.

Pass GA.

No.

Closes #45152 from panbingkun/SPARK-47083.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index a070dcccd009..8d2a54189edd 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -36,7 +36,7 @@ cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
 chill_2.12/0.10.0//chill_2.12-0.10.0.jar
 commons-cli/1.5.0//commons-cli-1.5.0.jar
-commons-codec/1.16.0//commons-codec-1.16.0.jar
+commons-codec/1.16.1//commons-codec-1.16.1.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-collections4/4.4//commons-collections4-4.4.jar
 commons-compiler/3.1.9//commons-compiler-3.1.9.jar
diff --git a/pom.xml b/pom.xml
index 5c1cd8d7f792..2954fc0d97cb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -188,7 +188,7 @@
 
2.15.2
 1.1.10.5
 3.0.3
-1.16.0
+1.16.1
 1.23.0
 2.13.0
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro*`

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 73e10b41b23d [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` 
transitive dependencies from `commons-compress` and `avro*`
73e10b41b23d is described below

commit 73e10b41b23d08ac1abe5e0e25ba5167183feb15
Author: Dongjoon Hyun 
AuthorDate: Mon Feb 26 22:57:01 2024 -0800

[SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies 
from `commons-compress` and `avro*`

### Why are the changes needed?

This PR aims to exclude `commons-(io|lang3)` transitive dependencies from 
`commons-compress`, `avro`, and `avro-mapred` dependencies.

### Does this PR introduce _any_ user-facing change?

Apache Spark defines and uses its own versions; excluding the transitive dependencies makes that explicit.


https://github.com/apache/spark/blob/1a408033daf458f1ceebbe14a560355a1a2c0a70/pom.xml#L198


https://github.com/apache/spark/blob/1a408033daf458f1ceebbe14a560355a1a2c0a70/pom.xml#L194

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45278 from dongjoon-hyun/SPARK-47182.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 28 
 1 file changed, 28 insertions(+)

diff --git a/pom.xml b/pom.xml
index 965f88ee14d5..5c1cd8d7f792 100644
--- a/pom.xml
+++ b/pom.xml
@@ -625,6 +625,16 @@
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-compress</artifactId>
         <version>${commons-compress.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>commons-io</groupId>
+            <artifactId>commons-io</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-lang3</artifactId>
+          </exclusion>
+        </exclusions>
       </dependency>
       <dependency>
         <groupId>org.apache.commons</groupId>
@@ -1458,6 +1468,16 @@
         <groupId>org.apache.avro</groupId>
         <artifactId>avro</artifactId>
         <version>${avro.version}</version>
+        <exclusions>
+          <exclusion>
+            <groupId>commons-io</groupId>
+            <artifactId>commons-io</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-lang3</artifactId>
+          </exclusion>
+        </exclusions>
       </dependency>
       <dependency>
         <groupId>org.apache.avro</groupId>
@@ -1497,6 +1517,14 @@
             <groupId>com.github.luben</groupId>
             <artifactId>zstd-jni</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>commons-io</groupId>
+            <artifactId>commons-io</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-lang3</artifactId>
+          </exclusion>
         </exclusions>
       </dependency>
       <dependency>

(spark) branch master updated (0eb58a06a7d4 -> d1e1a1907412)

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0eb58a06a7d4 [SPARK-47785][BUILD][TESTS] Upgrade `bouncycastle` to 1.78
 add d1e1a1907412 [SPARK-47787][BUILD] Upgrade `commons-compress` to 1.26.1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47785][BUILD][TESTS] Upgrade `bouncycastle` to 1.78

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0eb58a06a7d4 [SPARK-47785][BUILD][TESTS] Upgrade `bouncycastle` to 1.78
0eb58a06a7d4 is described below

commit 0eb58a06a7d4c7ed95af6cbfd6be32ad49d48a7b
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 12:52:57 2024 -0700

[SPARK-47785][BUILD][TESTS] Upgrade `bouncycastle` to 1.78

### What changes were proposed in this pull request?

This PR aims to upgrade `bouncycastle` to 1.78.

### Why are the changes needed?

To use the latest bug fixed version.
- https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html

### Does this PR introduce _any_ user-facing change?

No, this is a test dependency.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45962 from dongjoon-hyun/SPARK-47785.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index f264b47d701b..b92ac576443d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -214,7 +214,7 @@
 3.1.0
 1.1.0
 1.6.0
-1.77
+1.78
 1.13.0
 5.0.1
 4.1.108.Final


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (1586652b6421 -> 2793397140af)

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1586652b6421 [MINOR][PS][TESTS] Make `test_compare` deterministic
 add 2793397140af [SPARK-47782][BUILD] Remove redundant json4s-jackson 
definition in sql/api POM

No new revisions were added by this update.

Summary of changes:
 sql/api/pom.xml | 7 ---
 1 file changed, 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [MINOR][PS][TESTS] Make `test_compare` deterministic

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1586652b6421 [MINOR][PS][TESTS] Make `test_compare` deterministic
1586652b6421 is described below

commit 1586652b64212c1e9f4bb5346a2d5f83a271883e
Author: Ruifeng Zheng 
AuthorDate: Tue Apr 9 07:45:05 2024 -0700

[MINOR][PS][TESTS] Make `test_compare` deterministic

### What changes were proposed in this pull request?
Make `test_compare` deterministic

### Why are the changes needed?
it fails in some env:
```
AssertionError: DataFrame.index are different
DataFrame.index values are different (80.0 %)
[left]:  Int64Index([3, 4, 5, 6, 7], dtype='int64')
[right]: Int64Index([4, 3, 7, 6, 5], dtype='int64')
```

### Does this PR introduce _any_ user-facing change?
no, test only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45959 from zhengruifeng/fix_test_compare_series.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/pandas/tests/diff_frames_ops/test_compare_series.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/tests/diff_frames_ops/test_compare_series.py 
b/python/pyspark/pandas/tests/diff_frames_ops/test_compare_series.py
index c548f8a2d32c..2befaa6ed950 100644
--- a/python/pyspark/pandas/tests/diff_frames_ops/test_compare_series.py
+++ b/python/pyspark/pandas/tests/diff_frames_ops/test_compare_series.py
@@ -131,7 +131,7 @@ class CompareSeriesMixin:
 )
 
 with ps.option_context("compute.eager_check", False):
-self.assert_eq(expected, psser1.compare(psser2))
+self.assert_eq(expected, psser1.compare(psser2).sort_index())
 
 
 class CompareSeriesTests(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47774][INFRA][3.4] Remove redundant rules from `MimaExcludes`

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 05f72fec83cf [SPARK-47774][INFRA][3.4] Remove redundant rules from 
`MimaExcludes`
05f72fec83cf is described below

commit 05f72fec83cf3201ca76f0a6410b0ec7d87aa204
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 00:52:58 2024 -0700

[SPARK-47774][INFRA][3.4] Remove redundant rules from `MimaExcludes`

### What changes were proposed in this pull request?

This PR aims to remove redundant rules from `MimaExcludes` for Apache Spark 
3.4.x.

Previously, these rules were required due to the `dev/mima` limitation 
which is fixed at
- https://github.com/apache/spark/pull/45938

### Why are the changes needed?

To minimize the exclusion rules for Apache Spark 3.4.x by removing the 
rules related to the following `private class`.

- `DeployMessages`

https://github.com/apache/spark/blob/d3c75540788cf4ce86558feb38c197fdc1c8300e/core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala#L34

- `ShuffleBlockFetcherIterator`

https://github.com/apache/spark/blob/d3c75540788cf4ce86558feb38c197fdc1c8300e/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L85-L86

- `BlockManagerMessages`

https://github.com/apache/spark/blob/d3c75540788cf4ce86558feb38c197fdc1c8300e/core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala#L25

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45949 from dongjoon-hyun/SPARK-47774-3.4.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 project/MimaExcludes.scala | 21 -
 1 file changed, 21 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 0b0fdefd6b68..5e97a8d9551c 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -54,14 +54,6 @@ object MimaExcludes {
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.OneVsRest.extractInstances"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.OneVsRestModel.extractInstances"),
 
-// [SPARK-39703][SPARK-39062] Mima complains with Scala 2.13 for the 
changes in DeployMessages
-
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.deploy.DeployMessages$LaunchExecutor$"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.requestedTotal"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.copy"),
-
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.copy$default$2"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.this"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.apply"),
-
 // [SPARK-38679][CORE] Expose the number of partitions in a stage to 
TaskContext
 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.numPartitions"),
 
@@ -115,25 +107,12 @@ object MimaExcludes {
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages#Shutdown.productElementName"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages#Shutdown.productElementNames"),
 
-// [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on 
scala 2.13
-
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.storage.ShuffleBlockFetcherIterator#FetchRequest.blocks"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.storage.ShuffleBlockFetcherIterator#FetchRequest.copy"),
-
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.storage.ShuffleBlockFetcherIterator#FetchRequest.copy$default$2"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.storage.ShuffleBlockFetcherIterator#FetchRequest.this"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.storage.ShuffleBlockFetcherIterator#FetchRequest.apply"),
-
 // [SPARK-41072][SS] Add the error class STREAM_FAILED to 
StreamingQueryException
 
ProblemFilters.

(spark) branch branch-3.5 updated: [SPARK-47774][INFRA][3.5] Remove redundant rules from `MimaExcludes`

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d424a4bcf2a1 [SPARK-47774][INFRA][3.5] Remove redundant rules from 
`MimaExcludes`
d424a4bcf2a1 is described below

commit d424a4bcf2a1c79005e8d0489db2ba844de6fe06
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 00:23:00 2024 -0700

[SPARK-47774][INFRA][3.5] Remove redundant rules from `MimaExcludes`

### What changes were proposed in this pull request?

This PR aims to remove redundant rules from `MimaExcludes` for Apache Spark 
3.5.x.

Previously, these rules were required due to the `dev/mima` limitation 
which is fixed at
- https://github.com/apache/spark/pull/45938

### Why are the changes needed?

To minimize the exclusion rules for Apache Spark 3.5.x by removing the 
rules related to the following `private class`.

- `HadoopFSUtils`

https://github.com/apache/spark/blob/f0752f2701b1b8d5fbc38912edd9cd9325693bef/core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala#L36

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45948 from dongjoon-hyun/SPARK-47774-3.5.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 project/MimaExcludes.scala | 5 -
 1 file changed, 5 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 376ddfde1b93..ae026165addc 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -41,11 +41,6 @@ object MimaExcludes {
 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.types.SQLUserDefinedType"),
 // [SPARK-43165][SQL] Move canWrite to DataTypeUtils
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.types.DataType.canWrite"),
-// [SPARK-43195][CORE] Remove unnecessary serializable wrapper in 
HadoopFSUtils
-
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation"),
-
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation$"),
-
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.HadoopFSUtils$SerializableFileStatus"),
-
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.HadoopFSUtils$SerializableFileStatus$"),
 // [SPARK-43792][SQL][PYTHON][CONNECT] Add optional pattern for 
Catalog.listCatalogs
 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.listCatalogs"),
 // [SPARK-43881][SQL][PYTHON][CONNECT] Add optional pattern for 
Catalog.listDatabases


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47774][INFRA] Remove redundant rules from `MimaExcludes`

2024-04-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 992b69dbc327 [SPARK-47774][INFRA] Remove redundant rules from 
`MimaExcludes`
992b69dbc327 is described below

commit 992b69dbc3279824d7cc3b330a70a1bd5a7ab2b9
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 00:15:43 2024 -0700

[SPARK-47774][INFRA] Remove redundant rules from `MimaExcludes`

### What changes were proposed in this pull request?

This PR aims to remove redundant rules from `MimaExcludes` for Apache Spark 
4.0.0.

Previously, these rules were required due to the `dev/mima` limitation 
which is fixed at
- https://github.com/apache/spark/pull/45938

### Why are the changes needed?

To minimize the exclusion rules for Apache Spark 4.0.0 by removing the 
following `private class` rules.

- `BasePythonRunner`

https://github.com/apache/spark/blob/319edfdc5cd6731d1d630a8beeea5b23a2326f07/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L102

- `CoarseGrainedClusterMessage`

https://github.com/apache/spark/blob/319edfdc5cd6731d1d630a8beeea5b23a2326f07/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala#L30

- `MethodIdentifier`

https://github.com/apache/spark/blob/319edfdc5cd6731d1d630a8beeea5b23a2326f07/common/utils/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L1002

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45944 from dongjoon-hyun/SPARK-47774.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 project/MimaExcludes.scala | 6 --
 1 file changed, 6 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 630dd1d77cc7..4016c5f8b3e5 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -38,17 +38,11 @@ object MimaExcludes {
 // [SPARK-44863][UI] Add a button to download thread dump as a txt in 
Spark UI
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ThreadStackTrace.*"),
 
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.status.api.v1.ThreadStackTrace$"),
-// [SPARK-44705][PYTHON] Make PythonRunner single-threaded
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.api.python.BasePythonRunner#ReaderIterator.this"),
-// [SPARK-44198][CORE] Support propagation of the log level to the 
executors
-
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$SparkAppConfig$"),
 //[SPARK-46399][Core] Add exit status to the Application End event for the 
use of Spark Listener
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerApplicationEnd.*"),
 
ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.scheduler.SparkListenerApplicationEnd$"),
 // [SPARK-45427][CORE] Add RPC SSL settings to SSLOptions and 
SparkTransportConf
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.network.netty.SparkTransportConf.fromSparkConf"),
-// [SPARK-45136][CONNECT] Enhance ClosureCleaner with Ammonite support
-
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.util.MethodIdentifier$"),
 // [SPARK-45022][SQL] Provide context for dataset API errors
 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.QueryContext.contextType"),
 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.QueryContext.code"),


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47754][SQL] Postgres: Support reading multidimensional arrays

2024-04-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 702180614900 [SPARK-47754][SQL] Postgres: Support reading 
multidimensional arrays
702180614900 is described below

commit 702180614900bdaf245a194da0043b8b51de3b4b
Author: Kent Yao 
AuthorDate: Mon Apr 8 22:45:21 2024 -0700

[SPARK-47754][SQL] Postgres: Support reading multidimensional arrays

### What changes were proposed in this pull request?

Because ResultSetMetaData cannot distinguish a single-dimensional array
from a multidimensional one, we always read multidimensional arrays as
single-dimensional ones. For example, `text[][]` maps to
`ArrayType(StringType)` and `int[][][]` to `ArrayType(IntegerType)`, which results in errors when converting a ResultSet with multidimensional arrays to
InternalRows.

This PR supports reading multidimensional arrays from PostgreSQL data 
sources. To achieve this, the simplest way is to add a new developer API to 
retrieve it from the information schema of Postgres.


https://www.postgresql.org/docs/16/catalog-pg-attribute.html#CATALOG-PG-ATTRIBUTE

It is possible to use functions like `array_dims` to retrieve the dimension
of an array column, but such a call is hard to inject without breaking
changes, and determining the dimension from the actual data is not reliable.
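
For illustration only (this is not the code added by the PR), a rough JDBC sketch of how the declared dimensionality could be read from `pg_attribute.attndims`; the connection URL, table, and column names are placeholders:

```scala
import java.sql.DriverManager

// Placeholder connection URL and identifiers; pg_attribute.attndims is 0 for
// non-array columns and otherwise holds the declared number of dimensions,
// e.g. 3 for an int[][][] column.
val jdbcUrl = "jdbc:postgresql://localhost:5432/postgres?user=postgres"
val conn = DriverManager.getConnection(jdbcUrl)
try {
  val stmt = conn.prepareStatement(
    """SELECT a.attndims
      |FROM pg_attribute a
      |JOIN pg_class c ON a.attrelid = c.oid
      |WHERE c.relname = ? AND a.attname = ?""".stripMargin)
  stmt.setString(1, "triple_dim_array")
  stmt.setString(2, "col0")
  val rs = stmt.executeQuery()
  if (rs.next()) println(s"declared dimensions: ${rs.getInt(1)}")
} finally {
  conn.close()
}
```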
### Why are the changes needed?

We already support writing multidimensional arrays to Postgres, so we should improve Postgres reading capabilities too.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45917 from yaooqinn/SPARK-47754.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  | 21 ++---
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   |  2 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 18 ---
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 16 ++
 .../apache/spark/sql/jdbc/PostgresDialect.scala| 36 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  |  3 +-
 6 files changed, 76 insertions(+), 20 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index 69573e9bddb1..1cd8a77e8442 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -23,7 +23,6 @@ import java.text.SimpleDateFormat
 import java.time.LocalDateTime
 import java.util.Properties
 
-import org.apache.spark.SparkException
 import org.apache.spark.sql.{Column, Row}
 import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.types._
@@ -514,19 +513,17 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 
 sql("select array(array(1, 2), array(3, 4)) as col0").write
   .jdbc(jdbcUrl, "double_dim_array", new Properties)
+
+checkAnswer(
+  spark.read.jdbc(jdbcUrl, "double_dim_array", new Properties),
+  Row(Seq(Seq(1, 2), Seq(3, 4
+
 sql("select array(array(array(1, 2), array(3, 4)), array(array(5, 6), 
array(7, 8))) as col0")
   .write.jdbc(jdbcUrl, "triple_dim_array", new Properties)
-// Reading multi-dimensional array is not supported yet.
-checkError(
-  exception = intercept[SparkException] {
-spark.read.jdbc(jdbcUrl, "double_dim_array", new Properties).collect()
-  },
-  errorClass = null)
-checkError(
-  exception = intercept[SparkException] {
-spark.read.jdbc(jdbcUrl, "triple_dim_array", new Properties).collect()
-  },
-  errorClass = null)
+
+checkAnswer(
+  spark.read.jdbc(jdbcUrl, "triple_dim_array", new Properties),
+  Row(Seq(Seq(Seq(1, 2), Seq(3, 4)), Seq(Seq(5, 6), Seq(7, 8)))))
   }
 
   test("SPARK-47701: Reading complex type") {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
index 7eff4bd376bc..8c430e231e39 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
@@ -67,7 +67,7 @@ object

(spark) branch branch-3.4 updated: [SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing

2024-04-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d3c75540788c [SPARK-47770][INFRA] Fix 
`GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing
d3c75540788c is described below

commit d3c75540788cf4ce86558feb38c197fdc1c8300e
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 11:14:49 2024 +0800

[SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to 
return `false` instead of failing

### What changes were proposed in this pull request?

This PR aims to fix `GenerateMIMAIgnore.isPackagePrivateModule` to work 
correctly.

For example, `Metadata` is a case class inside the package-private
`DefaultParamsReader` class. Currently, MIMA fails while analyzing this class.


https://github.com/apache/spark/blob/f8e652e88320528a70e605a6a3cf986725e153a5/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L474-L485

The root cause is that `isPackagePrivateModule` fails with a
`scala.ScalaReflectionException`. We can simply make `isPackagePrivateModule`
return `false` instead of failing.
```
Error instrumenting 
class:org.apache.spark.ml.util.DefaultParamsReader$Metadata
Exception in thread "main" scala.ScalaReflectionException: type 
Serializable is not a class
at scala.reflect.api.Symbols$SymbolApi.asClass(Symbols.scala:284)
at scala.reflect.api.Symbols$SymbolApi.asClass$(Symbols.scala:284)
at 
scala.reflect.internal.Symbols$SymbolContextApiImpl.asClass(Symbols.scala:99)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala1(JavaMirrors.scala:1085)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$classToScala$1(JavaMirrors.scala:1040)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$toScala$1(JavaMirrors.scala:150)
at 
scala.reflect.runtime.TwoWayCaches$TwoWayCache.toScala(TwoWayCaches.scala:50)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.toScala(JavaMirrors.scala:148)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala(JavaMirrors.scala:1040)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.typeToScala(JavaMirrors.scala:1148)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$2(JavaMirrors.scala:816)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$1(JavaMirrors.scala:816)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.completeRest(JavaMirrors.scala:810)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.complete(JavaMirrors.scala:806)
at 
scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1575)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1538)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:221)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:158)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.info(SynchronizedSymbols.scala:221)
at 
scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1733)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin(SynchronizedSymbols.scala:109)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin$(SynchronizedSymbols.scala:107)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.isPackagePrivateModule(GenerateMIMAIgnore.scala:48)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.$anonfun$privateWithin$1(GenerateMIMAIgnore.scala:67)
at scala.collection.immutable.List.foreach(List.scala:334)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:61)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:125)
at 
org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)
```
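
For illustration only (not part of the patch), a minimal Scala sketch of the
defensive pattern described above: swallow the reflection failure and report
`false` instead of crashing. The helper below and its signature are
hypothetical; the real method lives in `GenerateMIMAIgnore.scala` and works on
the tool's runtime mirror.

```
import scala.reflect.runtime.{universe => ru}
import scala.util.{Failure, Success, Try}

// Sketch: treat any reflection failure (e.g. ScalaReflectionException for a
// case class nested inside a package-private class) as "not package private"
// instead of letting it abort the whole MIMA ignore generation.
def isPackagePrivateModule(mirror: ru.Mirror, className: String): Boolean =
  Try {
    val moduleSymbol = mirror.staticModule(className)
    moduleSymbol.privateWithin != ru.NoSymbol
  } match {
    case Success(isPrivate) => isPrivate
    case Failure(_)         => false // previously this exception was fatal
  }
```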

### Why are the changes needed?

**BEFORE**
```
$ dev/mima | grep org.apache.spark.ml.util.DefaultParamsReader
Using SPARK_LOCAL_IP=localhost
Using SPAR

(spark) branch branch-3.5 updated: [SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing

2024-04-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new f0752f2701b1 [SPARK-47770][INFRA] Fix 
`GenerateMIMAIgnore.isPackagePrivateModule` to return `false` instead of failing
f0752f2701b1 is described below

commit f0752f2701b1b8d5fbc38912edd9cd9325693bef
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 9 11:14:49 2024 +0800

[SPARK-47770][INFRA] Fix `GenerateMIMAIgnore.isPackagePrivateModule` to 
return `false` instead of failing

### What changes were proposed in this pull request?

This PR aims to fix `GenerateMIMAIgnore.isPackagePrivateModule` to work 
correctly.

For example, `Metadata` is a case class inside the package-private
`DefaultParamsReader` class. Currently, MIMA fails while analyzing this class.


https://github.com/apache/spark/blob/f8e652e88320528a70e605a6a3cf986725e153a5/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L474-L485

The root cause is that `isPackagePrivateModule` fails with a
`scala.ScalaReflectionException`. We can simply make `isPackagePrivateModule`
return `false` instead of failing.
```
Error instrumenting 
class:org.apache.spark.ml.util.DefaultParamsReader$Metadata
Exception in thread "main" scala.ScalaReflectionException: type 
Serializable is not a class
at scala.reflect.api.Symbols$SymbolApi.asClass(Symbols.scala:284)
at scala.reflect.api.Symbols$SymbolApi.asClass$(Symbols.scala:284)
at 
scala.reflect.internal.Symbols$SymbolContextApiImpl.asClass(Symbols.scala:99)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala1(JavaMirrors.scala:1085)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$classToScala$1(JavaMirrors.scala:1040)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.$anonfun$toScala$1(JavaMirrors.scala:150)
at 
scala.reflect.runtime.TwoWayCaches$TwoWayCache.toScala(TwoWayCaches.scala:50)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.toScala(JavaMirrors.scala:148)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.classToScala(JavaMirrors.scala:1040)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror.typeToScala(JavaMirrors.scala:1148)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$2(JavaMirrors.scala:816)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.$anonfun$completeRest$1(JavaMirrors.scala:816)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.completeRest(JavaMirrors.scala:810)
at 
scala.reflect.runtime.JavaMirrors$JavaMirror$FromJavaClassCompleter.complete(JavaMirrors.scala:806)
at 
scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1575)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1538)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:221)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:158)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.info(SynchronizedSymbols.scala:221)
at 
scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1733)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin(SynchronizedSymbols.scala:109)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.privateWithin$(SynchronizedSymbols.scala:107)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221)
at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$13.privateWithin(SynchronizedSymbols.scala:221)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.isPackagePrivateModule(GenerateMIMAIgnore.scala:48)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.$anonfun$privateWithin$1(GenerateMIMAIgnore.scala:67)
at scala.collection.immutable.List.foreach(List.scala:334)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.privateWithin(GenerateMIMAIgnore.scala:61)
at 
org.apache.spark.tools.GenerateMIMAIgnore$.main(GenerateMIMAIgnore.scala:125)
at 
org.apache.spark.tools.GenerateMIMAIgnore.main(GenerateMIMAIgnore.scala)
```

### Why are the changes needed?

**BEFORE**
```
$ dev/mima | grep org.apache.spark.ml.util.DefaultParamsReader
Using SPARK_LOCAL_IP=localhost
Using SPAR

(spark) branch master updated: [SPARK-47737][PYTHON] Bump PyArrow to 10.0.0

2024-04-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 04bf981781ba [SPARK-47737][PYTHON] Bump PyArrow to 10.0.0
04bf981781ba is described below

commit 04bf981781ba79d4b2d5a493ea32935eaa177709
Author: Haejoon Lee 
AuthorDate: Mon Apr 8 09:44:49 2024 -0700

[SPARK-47737][PYTHON] Bump PyArrow to 10.0.0

### What changes were proposed in this pull request?

This PR proposes to bump the minimum supported PyArrow version to 10.0.0.

### Why are the changes needed?

To leverage the new features from the latest version.

### Does this PR introduce _any_ user-facing change?

No API changes, but the minimum PyArrow version shown in the user-facing
documentation will change.

### How was this patch tested?

The existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45892 from itholic/bump_arrow_10.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/spark-rm/Dockerfile | 2 +-
 python/docs/source/getting_started/install.rst | 2 +-
 python/docs/source/migration_guide/pyspark_upgrade.rst | 1 +
 python/docs/source/user_guide/sql/arrow_pandas.rst | 2 +-
 python/packaging/classic/setup.py  | 2 +-
 python/packaging/connect/setup.py  | 2 +-
 python/pyspark/sql/pandas/utils.py | 2 +-
 7 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index 2cd50999c4cc..f51b24d58394 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -37,7 +37,7 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true
 # These arguments are just for reuse and not really meant to be customized.
 ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 
-ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 
pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 
jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 
sphinx-copybutton==0.5.2 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 
markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 
grpcio-status==1.62.0 googleapis-common-protos==1.56.4"
+ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 
pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 
jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 
sphinx-copybutton==0.5.2 pandas==1.5.3 pyarrow==10.0.1 plotly==5.4.0 
markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 
grpcio-status==1.62.0 googleapis-common-protos==1.56.4"
 ARG GEM_PKGS="bundler:2.3.8"
 
 # Install extra needed repos and refresh.
diff --git a/python/docs/source/getting_started/install.rst 
b/python/docs/source/getting_started/install.rst
index 6aa89a689480..4c0551433d5a 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -157,7 +157,7 @@ PackageSupported version Note
 == = 
==
 `py4j` >=0.10.9.7Required
 `pandas`   >=1.4.4   Required for pandas API 
on Spark and Spark Connect; Optional for Spark SQL
-`pyarrow`  >=4.0.0   Required for pandas API 
on Spark and Spark Connect; Optional for Spark SQL
+`pyarrow`  >=10.0.0  Required for pandas API 
on Spark and Spark Connect; Optional for Spark SQL
 `numpy`>=1.21Required for pandas API 
on Spark and MLLib DataFrame-based API; Optional for Spark SQL
 `grpcio`   >=1.62.0  Required for Spark Connect
 `grpcio-status`>=1.62.0  Required for Spark Connect
diff --git a/python/docs/source/migration_guide/pyspark_upgrade.rst 
b/python/docs/source/migration_guide/pyspark_upgrade.rst
index 1ca5d7aad5d1..36c1eacaf2c7 100644
--- a/python/docs/source/migration_guide/pyspark_upgrade.rst
+++ b/python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -25,6 +25,7 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, it is recommended to use Pandas version 2.0.0 or above with 
PySpark for optimal compatibility.
 * In Spark 4.0, the minimum supported version for Pandas has been raised from 
1.0.5 to 1.4.4 in PySpark.
 * In Spark 4.0, the minimum supported version for Numpy has been raised from 
1.15 to 1.21 in PySpark.
+* In Spark 4.0, the minimum supported version for PyArrow has been raised from 
4.0.0 to 10

(spark) branch master updated: [SPARK-47725][INFRA] Set up the CI for pyspark-connect package

2024-04-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b1998bd4145d [SPARK-47725][INFRA] Set up the CI for pyspark-connect 
package
b1998bd4145d is described below

commit b1998bd4145d60d9ea3b569b64604a0881335b17
Author: Hyukjin Kwon 
AuthorDate: Mon Apr 8 09:39:06 2024 -0700

[SPARK-47725][INFRA] Set up the CI for pyspark-connect package

### What changes were proposed in this pull request?

This PR proposes to set up a scheduled job for the `pyspark-connect` package.
The CI:
1. Build Spark
2. Package `pyspark-connect` with test cases
3. Remove `python/lib/pyspark.zip` and `python/lib/py4j.zip` to make sure we
don't use the JVM
4. Run the test cases packaged together within `pyspark-connect`.

### Why are the changes needed?

To make sure of the feature coverage in `pyspark-connect`.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested in my fork, 
https://github.com/HyukjinKwon/spark/actions/runs/8598881063

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45870 from HyukjinKwon/do-not-merge-ci.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_python_connect.yml | 96 ++
 python/packaging/connect/setup.py  | 16 +++-
 .../pyspark/pandas/data_type_ops/datetime_ops.py   |  3 +-
 python/pyspark/sql/connect/avro/functions.py   |  8 +-
 python/pyspark/sql/connect/catalog.py  |  5 +-
 python/pyspark/sql/connect/column.py   |  5 +-
 python/pyspark/sql/connect/conf.py |  5 +-
 python/pyspark/sql/connect/dataframe.py|  2 +-
 python/pyspark/sql/connect/functions/builtin.py|  3 +-
 .../pyspark/sql/connect/functions/partitioning.py  |  3 +-
 python/pyspark/sql/connect/group.py|  5 +-
 python/pyspark/sql/connect/observation.py  |  3 +-
 python/pyspark/sql/connect/protobuf/functions.py   |  8 +-
 python/pyspark/sql/connect/readwriter.py   |  3 +-
 python/pyspark/sql/connect/session.py  |  5 +-
 python/pyspark/sql/connect/streaming/query.py  |  2 +-
 python/pyspark/sql/connect/streaming/readwriter.py |  3 +-
 python/pyspark/sql/connect/window.py   |  5 +-
 python/pyspark/sql/session.py  |  2 +-
 .../sql/tests/connect/client/test_artifact.py  |  6 +-
 .../sql/tests/connect/client/test_reattach.py  |  2 +
 .../sql/tests/connect/test_connect_basic.py|  2 +
 .../sql/tests/connect/test_connect_function.py |  2 +
 .../sql/tests/connect/test_connect_session.py  |  6 +-
 .../sql/tests/connect/test_parity_udf_profiler.py  |  3 +
 .../pyspark/sql/tests/connect/test_parity_udtf.py  | 11 ++-
 python/pyspark/sql/tests/connect/test_resources.py | 14 ++--
 python/pyspark/sql/tests/test_arrow.py |  4 +-
 python/pyspark/sql/tests/test_udf.py   | 11 ++-
 python/pyspark/sql/tests/test_udtf.py  |  9 +-
 python/pyspark/tests/test_memory_profiler.py   |  4 +-
 31 files changed, 223 insertions(+), 33 deletions(-)

diff --git a/.github/workflows/build_python_connect.yml 
b/.github/workflows/build_python_connect.yml
new file mode 100644
index ..2f80eac9624f
--- /dev/null
+++ b/.github/workflows/build_python_connect.yml
@@ -0,0 +1,96 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: Build / Spark Connect Python-only (master, Python 3.11)
+
+on:
+  schedule:
+- cron: '0 19 * * *'
+
+jobs:
+  # Build: build Spark and run the tests for specified modules using SBT
+  build:
+name: "Build modules: pyspark-connect"
+runs-on: ubuntu-latest
+timeout-minutes: 300
+steps:
+  - name: Checkout Spark repository
+uses: actions/checkout@v4
+  - name: Cache Scala, SBT and Maven
+uses: actions/cache@v4
+  

(spark) branch master updated: [SPARK-47709][BUILD] Upgrade tink to 1.13.0

2024-04-06 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 644687b66e1a [SPARK-47709][BUILD] Upgrade tink to 1.13.0
644687b66e1a is described below

commit 644687b66e1a62459f76db19b9a43f0b871a4291
Author: yangjie01 
AuthorDate: Sat Apr 6 10:59:06 2024 -0700

[SPARK-47709][BUILD] Upgrade tink to 1.13.0

### What changes were proposed in this pull request?
This pr aims to upgrade tink from 1.12.0 to 1.13.0.

### Why are the changes needed?
According to the release notes, the new version is 20% faster than the 
previous one in terms of `AES-GCM`
- AES-GCM is now about 20% faster.

The full release notes as follows:
- https://github.com/tink-crypto/tink-java/releases/tag/v1.13.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45843 from LuciferYang/SPARK-47709.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 6ca93894bd81..1cfda563a1d2 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -266,7 +266,7 @@ stax-api/1.0.1//stax-api-1.0.1.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
 threeten-extra/1.7.1//threeten-extra-1.7.1.jar
-tink/1.12.0//tink-1.12.0.jar
+tink/1.13.0//tink-1.13.0.jar
 transaction-api/1.1//transaction-api-1.1.jar
 txw2/3.0.2//txw2-3.0.2.jar
 univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
diff --git a/pom.xml b/pom.xml
index 9b51548e1c0f..5bea02f22246 100644
--- a/pom.xml
+++ b/pom.xml
@@ -215,7 +215,7 @@
 1.1.0
 1.6.0
 1.77
-1.12.0
+1.13.0
 5.0.1
 4.1.108.Final
 2.0.65.Final


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47727][PYTHON] Make SparkConf to root level to for both SparkSession and SparkContext

2024-04-06 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 60a3fbcab4e5 [SPARK-47727][PYTHON] Make SparkConf to root level to for 
both SparkSession and SparkContext
60a3fbcab4e5 is described below

commit 60a3fbcab4e53f89b7128b7561d85a9d2aa76840
Author: Hyukjin Kwon 
AuthorDate: Sat Apr 6 10:54:27 2024 -0700

[SPARK-47727][PYTHON] Make SparkConf to root level to for both SparkSession 
and SparkContext

### What changes were proposed in this pull request?

This PR proposes to move `SparkConf` to the root level (`pyspark.conf`) for
both `SparkSession` and `SparkContext`.

### Why are the changes needed?

`SparkConf` is special. `SparkSession.builder.options` can take it as an
option, and the instance can be created without JVM access, so it can be
shared with the pure Python `pyspark-connect` package.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI in this PR should verify them.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45873 from HyukjinKwon/SPARK-47727.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 dev/sparktestsupport/modules.py  |  2 +-
 python/pyspark/__init__.py   |  5 ++---
 python/pyspark/{core => }/conf.py| 37 ++--
 python/pyspark/core/context.py   |  3 +--
 python/pyspark/sql/session.py| 10 -
 python/pyspark/sql/tests/test_session.py |  3 ---
 python/pyspark/testing/utils.py  |  6 +-
 7 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index d3ffa79ebe68..701203414702 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -430,9 +430,9 @@ pyspark_core = Module(
 source_file_regexes=["python/(?!pyspark/(ml|mllib|sql|streaming))"],
 python_test_goals=[
 # doctests
+"pyspark.conf",
 "pyspark.core.rdd",
 "pyspark.core.context",
-"pyspark.core.conf",
 "pyspark.core.broadcast",
 "pyspark.accumulators",
 "pyspark.core.files",
diff --git a/python/pyspark/__init__.py b/python/pyspark/__init__.py
index 032da1857a87..15c21df0c6bf 100644
--- a/python/pyspark/__init__.py
+++ b/python/pyspark/__init__.py
@@ -53,20 +53,19 @@ from typing import cast, Any, Callable, TypeVar, Union
 from pyspark.util import is_remote_only
 
 if not is_remote_only():
-from pyspark.core.conf import SparkConf
 from pyspark.core.rdd import RDD, RDDBarrier
 from pyspark.core.files import SparkFiles
 from pyspark.core.status import StatusTracker, SparkJobInfo, SparkStageInfo
 from pyspark.core.broadcast import Broadcast
-from pyspark.core import conf, rdd, files, status, broadcast
+from pyspark.core import rdd, files, status, broadcast
 
 # for backward compatibility references.
-sys.modules["pyspark.conf"] = conf
 sys.modules["pyspark.rdd"] = rdd
 sys.modules["pyspark.files"] = files
 sys.modules["pyspark.status"] = status
 sys.modules["pyspark.broadcast"] = broadcast
 
+from pyspark.conf import SparkConf
 from pyspark.util import InheritableThread, inheritable_thread_target
 from pyspark.storagelevel import StorageLevel
 from pyspark.accumulators import Accumulator, AccumulatorParam
diff --git a/python/pyspark/core/conf.py b/python/pyspark/conf.py
similarity index 90%
rename from python/pyspark/core/conf.py
rename to python/pyspark/conf.py
index fe7879c3501b..ca03266a11c6 100644
--- a/python/pyspark/core/conf.py
+++ b/python/pyspark/conf.py
@@ -18,12 +18,14 @@
 __all__ = ["SparkConf"]
 
 import sys
-from typing import Dict, List, Optional, Tuple, cast, overload
-
-from py4j.java_gateway import JVMView, JavaObject
+from typing import Dict, List, Optional, Tuple, cast, overload, TYPE_CHECKING
 
+from pyspark.util import is_remote_only
 from pyspark.errors import PySparkRuntimeError
 
+if TYPE_CHECKING:
+from py4j.java_gateway import JVMView, JavaObject
+
 
 class SparkConf:
 """
@@ -60,11 +62,10 @@ class SparkConf:
 
 Examples
 
->>> from pyspark.core.conf import SparkConf
->>> from pyspark.core.context import SparkContext
+>>> from pyspark import SparkConf, SparkContext
 >>> conf = SparkConf()
 >>> conf.setMaster("local").setAppName("My app")
-
+
 >>> conf.get("spark.master")
 'local&#x

(spark) branch master updated: [SPARK-47738][BUILD] Upgrade Kafka to 3.7.0

2024-04-06 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d69df596e512 [SPARK-47738][BUILD] Upgrade Kafka to 3.7.0
d69df596e512 is described below

commit d69df596e5124ef9ea744549b21c28c9d1d00704
Author: panbingkun 
AuthorDate: Sat Apr 6 10:48:21 2024 -0700

[SPARK-47738][BUILD] Upgrade Kafka to 3.7.0

### What changes were proposed in this pull request?
The pr aims to upgrade `Kafka` from `3.6.1` to `3.7.0`.

### Why are the changes needed?
https://downloads.apache.org/kafka/3.7.0/RELEASE_NOTES.html

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45893 from panbingkun/SPARK-47738.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/sql/kafka010/ConsumerStrategy.scala| 3 +--
 .../scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala   | 3 +--
 pom.xml| 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git 
a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/ConsumerStrategy.scala
 
b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/ConsumerStrategy.scala
index 10d5062848b5..ab41e53d8ffb 100644
--- 
a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/ConsumerStrategy.scala
+++ 
b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/ConsumerStrategy.scala
@@ -24,7 +24,6 @@ import scala.jdk.CollectionConverters._
 
 import org.apache.kafka.clients.admin.Admin
 import org.apache.kafka.clients.consumer.{Consumer, KafkaConsumer}
-import 
org.apache.kafka.clients.consumer.internals.NoOpConsumerRebalanceListener
 import org.apache.kafka.common.TopicPartition
 
 import org.apache.spark.internal.Logging
@@ -127,7 +126,7 @@ private[kafka010] case class 
SubscribePatternStrategy(topicPattern: String)
   kafkaParams: ju.Map[String, Object]): Consumer[Array[Byte], Array[Byte]] 
= {
 val updatedKafkaParams = setAuthenticationConfigIfNeeded(kafkaParams)
 val consumer = new KafkaConsumer[Array[Byte], 
Array[Byte]](updatedKafkaParams)
-consumer.subscribe(ju.regex.Pattern.compile(topicPattern), new 
NoOpConsumerRebalanceListener())
+consumer.subscribe(ju.regex.Pattern.compile(topicPattern))
 consumer
   }
 
diff --git 
a/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
 
b/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
index a0b0e92666eb..693ddd31d9a8 100644
--- 
a/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
+++ 
b/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
@@ -23,7 +23,6 @@ import java.util.Locale
 import scala.jdk.CollectionConverters._
 
 import org.apache.kafka.clients.consumer._
-import 
org.apache.kafka.clients.consumer.internals.NoOpConsumerRebalanceListener
 import org.apache.kafka.common.TopicPartition
 
 import org.apache.spark.internal.Logging
@@ -147,7 +146,7 @@ private case class SubscribePattern[K, V](
   def onStart(currentOffsets: ju.Map[TopicPartition, jl.Long]): Consumer[K, V] 
= {
 val updatedKafkaParams = setAuthenticationConfigIfNeeded(kafkaParams)
 val consumer = new KafkaConsumer[K, V](updatedKafkaParams)
-consumer.subscribe(pattern, new NoOpConsumerRebalanceListener())
+consumer.subscribe(pattern)
 val toSeek = if (currentOffsets.isEmpty) {
   offsets
 } else {
diff --git a/pom.xml b/pom.xml
index 5fe86dd80b2a..9b51548e1c0f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -137,7 +137,7 @@
 
 2.3
 
-3.6.1
+3.7.0
 
 10.16.1.1
 1.13.1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-45445][BUILD][3.4] Upgrade snappy to 1.1.10.5

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 6ab31d45a730 [SPARK-45445][BUILD][3.4] Upgrade snappy to 1.1.10.5
6ab31d45a730 is described below

commit 6ab31d45a730e6d4ef5dfbaaefdd8c75f6c1f637
Author: panbingkun 
AuthorDate: Fri Apr 5 15:38:25 2024 -0700

[SPARK-45445][BUILD][3.4] Upgrade snappy to 1.1.10.5

### What changes were proposed in this pull request?

This is a backport of #43254.

The pr aims to upgrade snappy to 1.1.10.5.

### Why are the changes needed?
- Although the upgrade to `1.1.10.4` happened only about 2-3 weeks ago, the
new version includes some bug fixes, e.g.:
https://github.com/apache/spark/assets/15246973/6c7f05f7-382f-4e82-bb68-22fc50895b94
- Full release notes: https://github.com/xerial/snappy-java/releases

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45902 from dongjoon-hyun/SPARK-45445-3.4.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index a94fbcd0ca77..1db890613dc4 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -248,7 +248,7 @@ scala-xml_2.12/2.1.0//scala-xml_2.12-2.1.0.jar
 shims/0.9.38//shims-0.9.38.jar
 slf4j-api/2.0.6//slf4j-api-2.0.6.jar
 snakeyaml/1.33//snakeyaml-1.33.jar
-snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar
+snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar
 spire-macros_2.12/0.17.0//spire-macros_2.12-0.17.0.jar
 spire-platform_2.12/0.17.0//spire-platform_2.12-0.17.0.jar
 spire-util_2.12/0.17.0//spire-util_2.12-0.17.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 99665da7d16a..1dd10ccfc218 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -235,7 +235,7 @@ scala-xml_2.12/2.1.0//scala-xml_2.12-2.1.0.jar
 shims/0.9.38//shims-0.9.38.jar
 slf4j-api/2.0.6//slf4j-api-2.0.6.jar
 snakeyaml/1.33//snakeyaml-1.33.jar
-snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar
+snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar
 spire-macros_2.12/0.17.0//spire-macros_2.12-0.17.0.jar
 spire-platform_2.12/0.17.0//spire-platform_2.12-0.17.0.jar
 spire-util_2.12/0.17.0//spire-util_2.12-0.17.0.jar
diff --git a/pom.xml b/pom.xml
index 7e07de2f0efc..2f080dbc24b6 100644
--- a/pom.xml
+++ b/pom.xml
@@ -185,7 +185,7 @@
 1.9.13
 2.14.2
 
2.14.2
-1.1.10.3
+1.1.10.5
 3.0.3
 1.15
 1.22


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-45445][BUILD][3.5] Upgrade snappy to 1.1.10.5

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new b2bfbc79f982 [SPARK-45445][BUILD][3.5] Upgrade snappy to 1.1.10.5
b2bfbc79f982 is described below

commit b2bfbc79f982ad2d2e694c43c79ab98256c507e0
Author: panbingkun 
AuthorDate: Fri Apr 5 14:51:18 2024 -0700

[SPARK-45445][BUILD][3.5] Upgrade snappy to 1.1.10.5

### What changes were proposed in this pull request?

This is a backport of #43254.

The pr aims to upgrade snappy to 1.1.10.5.

### Why are the changes needed?
- Although the upgrade to `1.1.10.4` happened only about 2-3 weeks ago, the
new version includes some bug fixes, e.g.:
https://github.com/apache/spark/assets/15246973/6c7f05f7-382f-4e82-bb68-22fc50895b94
- Full release notes: https://github.com/xerial/snappy-java/releases

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45901 from dongjoon-hyun/SPARK-45445.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 1cd7d5a8f2d7..a070dcccd009 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -238,7 +238,7 @@ shims/0.9.45//shims-0.9.45.jar
 slf4j-api/2.0.7//slf4j-api-2.0.7.jar
 snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar
 snakeyaml/2.0//snakeyaml-2.0.jar
-snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar
+snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar
 spire-macros_2.12/0.17.0//spire-macros_2.12-0.17.0.jar
 spire-platform_2.12/0.17.0//spire-platform_2.12-0.17.0.jar
 spire-util_2.12/0.17.0//spire-util_2.12-0.17.0.jar
diff --git a/pom.xml b/pom.xml
index 3dcaa2dd8ac7..965f88ee14d5 100644
--- a/pom.xml
+++ b/pom.xml
@@ -186,7 +186,7 @@
 1.9.13
 2.15.2
 
2.15.2
-1.1.10.3
+1.1.10.5
 3.0.3
 1.16.0
 1.23.0


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47111][SQL][TESTS][3.4] Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 2a453b1c469b [SPARK-47111][SQL][TESTS][3.4] Upgrade `PostgreSQL` JDBC 
driver to 42.7.2 and docker image to 16.2
2a453b1c469b is described below

commit 2a453b1c469bf469bacdfe854213319671231b50
Author: Dongjoon Hyun 
AuthorDate: Fri Apr 5 12:57:33 2024 -0700

[SPARK-47111][SQL][TESTS][3.4] Upgrade `PostgreSQL` JDBC driver to 42.7.2 
and docker image to 16.2

### What changes were proposed in this pull request?

This is a backport of #45191 .

This PR aims to upgrade `PostgreSQL` JDBC driver and docker images.
- JDBC Driver: `org.postgresql:postgresql` to 42.7.2
- Docker Image: `postgres` from `15.1-alpine` to `16.2-alpine`

### Why are the changes needed?

To use the latest PostgreSQL combination in the following integration tests.

- PostgresIntegrationSuite
- PostgresKrbIntegrationSuite
- v2/PostgresIntegrationSuite
- v2/PostgresNamespaceSuite

### Does this PR introduce _any_ user-facing change?

No. This is a pure test-environment update.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45900 from dongjoon-hyun/SPARK-47111-3.4.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala  | 6 +++---
 .../org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala | 6 +++---
 .../org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 +++---
 .../scala/org/apache/spark/sql/jdbc/v2/PostgresNamespaceSuite.scala | 6 +++---
 pom.xml | 2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index d3229ba50eca..de81c32a45ea 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -29,9 +29,9 @@ import org.apache.spark.sql.types.{ArrayType, DecimalType, 
FloatType, ShortType}
 import org.apache.spark.tags.DockerTest
 
 /**
- * To run this test suite for a specific version (e.g., postgres:15.1):
+ * To run this test suite for a specific version (e.g., postgres:16.2):
  * {{{
- *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:16.2
  * ./build/sbt -Pdocker-integration-tests
  * "testOnly org.apache.spark.sql.jdbc.PostgresIntegrationSuite"
  * }}}
@@ -39,7 +39,7 @@ import org.apache.spark.tags.DockerTest
 @DockerTest
 class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
   override val db = new DatabaseOnDocker {
-override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"postgres:15.1-alpine")
+override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"postgres:16.2-alpine")
 override val env = Map(
   "POSTGRES_PASSWORD" -> "rootpass"
 )
diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
index 4debe24754de..667d8c778618 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
@@ -25,9 +25,9 @@ import 
org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnecti
 import org.apache.spark.tags.DockerTest
 
 /**
- * To run this test suite for a specific version (e.g., postgres:15.1):
+ * To run this test suite for a specific version (e.g., postgres:16.2):
  * {{{
- *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:16.2
  * ./build/sbt -Pdocker-integration-tests "testOnly 
*PostgresKrbIntegrationSuite"
  * }}}
  */
@@ -37,7 +37,7 @@ class PostgresKrbIntegrationSuite extends 
DockerKrbJDBCIntegrationSuite {
   override protected val keytabFileName = "postgres.keytab"
 
   override val db = new DatabaseOnDocker {
-override val im

(spark) branch branch-3.4 updated: [SPARK-46411][BUILD][3.4] Change to use `bcprov/bcpkix-jdk18on` for UT

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 5f8a00b2c15d [SPARK-46411][BUILD][3.4] Change to use 
`bcprov/bcpkix-jdk18on` for UT
5f8a00b2c15d is described below

commit 5f8a00b2c15de5aa792492d418b616105be31cb5
Author: yangjie01 
AuthorDate: Fri Apr 5 11:55:07 2024 -0700

[SPARK-46411][BUILD][3.4] Change to use `bcprov/bcpkix-jdk18on` for UT

### What changes were proposed in this pull request?
This is a backport of https://github.com/apache/spark/pull/44359 .

This PR migrates the test dependency `bcprov/bcpkix` from `jdk15on` to 
`jdk18on`, and upgrades the version from 1.70 to 1.77, the `jdk18on` jars are 
compiled to work with anything from Java 1.8 up.

### Why are the changes needed?
The full release notes as follows:
- https://www.bouncycastle.org/releasenotes.html#r1rv77

### Does this PR introduce _any_ user-facing change?
No, just for test.

### How was this patch tested?
Pass GitHub Actions.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45898 from dongjoon-hyun/SPARK-46411-3.4.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml| 6 +++---
 resource-managers/yarn/pom.xml | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/pom.xml b/pom.xml
index e347a592cb17..4a220a32685c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -213,7 +213,7 @@
 1.8
 1.1.0
 1.5.0
-1.70
+1.77
 1.7.0
 4.1.87.Final
 
 
   org.bouncycastle
-  bcprov-jdk15on
+  bcprov-jdk18on
   test
 
 
   org.bouncycastle
-  bcpkix-jdk15on
+  bcpkix-jdk18on
   test
 
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47111][SQL][TESTS][3.5] Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 44cc67626969 [SPARK-47111][SQL][TESTS][3.5] Upgrade `PostgreSQL` JDBC 
driver to 42.7.2 and docker image to 16.2
44cc67626969 is described below

commit 44cc67626969f6ce4e4616d7bfd1dba5a3a53473
Author: Dongjoon Hyun 
AuthorDate: Fri Apr 5 11:56:34 2024 -0700

[SPARK-47111][SQL][TESTS][3.5] Upgrade `PostgreSQL` JDBC driver to 42.7.2 
and docker image to 16.2

### What changes were proposed in this pull request?

This PR aims to upgrade `PostgreSQL` JDBC driver and docker images.
- JDBC Driver: `org.postgresql:postgresql` from 42.7.0 to 42.7.2
- Docker Image: `postgres` from `15.1-alpine` to `16.2-alpine`

### Why are the changes needed?

To use the latest PostgreSQL combination in the following integration tests.

- PostgresIntegrationSuite
- PostgresKrbIntegrationSuite
- v2/PostgresIntegrationSuite
- v2/PostgresNamespaceSuite

### Does this PR introduce _any_ user-facing change?

No. This is a pure test-environment update.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45899 from dongjoon-hyun/SPARK-47111.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala  | 6 +++---
 .../org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala | 6 +++---
 .../org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 +++---
 .../scala/org/apache/spark/sql/jdbc/v2/PostgresNamespaceSuite.scala | 6 +++---
 pom.xml | 2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index e910402e05e7..23fbf39db3be 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -30,9 +30,9 @@ import org.apache.spark.sql.types._
 import org.apache.spark.tags.DockerTest
 
 /**
- * To run this test suite for a specific version (e.g., postgres:15.1):
+ * To run this test suite for a specific version (e.g., postgres:16.2):
  * {{{
- *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:16.2
  * ./build/sbt -Pdocker-integration-tests
  * "testOnly org.apache.spark.sql.jdbc.PostgresIntegrationSuite"
  * }}}
@@ -40,7 +40,7 @@ import org.apache.spark.tags.DockerTest
 @DockerTest
 class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
   override val db = new DatabaseOnDocker {
-override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"postgres:15.1-alpine")
+override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"postgres:16.2-alpine")
 override val env = Map(
   "POSTGRES_PASSWORD" -> "rootpass"
 )
diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
index 4debe24754de..667d8c778618 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
@@ -25,9 +25,9 @@ import 
org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnecti
 import org.apache.spark.tags.DockerTest
 
 /**
- * To run this test suite for a specific version (e.g., postgres:15.1):
+ * To run this test suite for a specific version (e.g., postgres:16.2):
  * {{{
- *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:16.2
  * ./build/sbt -Pdocker-integration-tests "testOnly 
*PostgresKrbIntegrationSuite"
  * }}}
  */
@@ -37,7 +37,7 @@ class PostgresKrbIntegrationSuite extends 
DockerKrbJDBCIntegrationSuite {
   override protected val keytabFileName = "postgres.keytab"
 
   override val db = new DatabaseOnDocker {
-override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"p

(spark) branch branch-3.5 updated: [SPARK-46411][BUILD][3.5][FOLLOWUP] Fix pom.xml file in common/network-common module

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d2cee4d81b2f [SPARK-46411][BUILD][3.5][FOLLOWUP] Fix pom.xml file in 
common/network-common module
d2cee4d81b2f is described below

commit d2cee4d81b2f75de008817409ca716364356c45c
Author: Dongjoon Hyun 
AuthorDate: Fri Apr 5 10:46:41 2024 -0700

[SPARK-46411][BUILD][3.5][FOLLOWUP] Fix pom.xml file in 
common/network-common module

### What changes were proposed in this pull request?

This PR aims to fix `common/network-common/pom.xml`.

### Why are the changes needed?

To fix the cherry-pick mistake.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45897 from dongjoon-hyun/SPARK-46411.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 common/network-common/pom.xml | 13 -
 1 file changed, 13 deletions(-)

diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 3916cf16f1de..27a53b0f9f3b 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -147,19 +147,6 @@
   log4j-slf4j2-impl
   test
 
-<<<<<<< HEAD
-===
-
-  org.bouncycastle
-  bcprov-jdk18on
-  test
-
-
-  org.bouncycastle
-  bcpkix-jdk18on
-  test
-
->>>>>>> 9a75a1d69aa ([SPARK-46411][BUILD] Change to use 
`bcprov/bcpkix-jdk18on` for UT)
 
 
   org.apache.spark


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-46411][BUILD] Change to use `bcprov/bcpkix-jdk18on` for UT

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new dec1e74c24fc [SPARK-46411][BUILD] Change to use 
`bcprov/bcpkix-jdk18on` for UT
dec1e74c24fc is described below

commit dec1e74c24fc07b171c0878815c9e97820889f40
Author: yangjie01 
AuthorDate: Mon Dec 18 12:19:20 2023 -0800

[SPARK-46411][BUILD] Change to use `bcprov/bcpkix-jdk18on` for UT

This PR migrates the test dependency `bcprov/bcpkix` from `jdk15on` to 
`jdk18on`, and upgrades the version from 1.70 to 1.77, the `jdk18on` jars are 
compiled to work with anything from Java 1.8 up.

The full release notes as follows:
- https://www.bouncycastle.org/releasenotes.html#r1rv77

No, just for test.

Pass GitHub Actions.

No

Closes #44359 from LuciferYang/bouncycastle-177.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 common/network-common/pom.xml  | 13 +
 pom.xml|  6 +++---
 resource-managers/yarn/pom.xml |  4 ++--
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 27a53b0f9f3b..3916cf16f1de 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -147,6 +147,19 @@
   log4j-slf4j2-impl
   test
 
+<<<<<<< HEAD
+===
+
+  org.bouncycastle
+  bcprov-jdk18on
+  test
+
+
+  org.bouncycastle
+  bcpkix-jdk18on
+  test
+
+>>>>>>> 9a75a1d69aa ([SPARK-46411][BUILD] Change to use 
`bcprov/bcpkix-jdk18on` for UT)
 
 
   org.apache.spark
diff --git a/pom.xml b/pom.xml
index 269a42d41f17..b579ed31612e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -218,7 +218,7 @@
 3.1.0
 1.1.0
 1.5.0
-1.70
+1.77
 1.9.0
 4.1.96.Final
 
 
   org.bouncycastle
-  bcprov-jdk15on
+  bcprov-jdk18on
   test
 
 
   org.bouncycastle
-  bcpkix-jdk15on
+  bcpkix-jdk18on
   test
 
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d9a6e5d45607 [SPARK-1][BUILD] Upgrade `bcprov-jdk15on` and 
`bcpkix-jdk15on` to 1.70
d9a6e5d45607 is described below

commit d9a6e5d456078868a9794b27ff971f63dd502ef1
Author: yangjie01 
AuthorDate: Sat Jul 15 12:17:07 2023 -0500

[SPARK-1][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70

This PR aims to upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` from 1.60 to
1.70.

The new version fixed 
[CVE-2020-15522](https://github.com/bcgit/bc-java/wiki/CVE-2020-15522).

The release notes as follows:
- https://www.bouncycastle.org/releasenotes.html#r1rv70

No, just upgrade test dependency

Pass Git Hub Actions

Closes #42015 from LuciferYang/SPARK-1.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 77218d162c41..e347a592cb17 100644
--- a/pom.xml
+++ b/pom.xml
@@ -213,7 +213,7 @@
 1.8
 1.1.0
 1.5.0
-1.60
+1.70
 1.7.0
 4.1.87.Final
 

(spark) branch branch-3.4 updated: [SPARK-44393][BUILD] Upgrade `H2` from 2.1.214 to 2.2.220

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new e5293c7c6add [SPARK-44393][BUILD] Upgrade `H2` from 2.1.214 to 2.2.220
e5293c7c6add is described below

commit e5293c7c6add337f8222558cca61ff55c96c526c
Author: Bjørn Jørgensen 
AuthorDate: Thu Jul 13 10:31:44 2023 +0900

[SPARK-44393][BUILD] Upgrade `H2` from 2.1.214 to 2.2.220

### What changes were proposed in this pull request?
Upgrade H2 from 2.1.214 to 2.2.220

[Changelog](https://www.h2database.com/html/changelog.html)

### Why are the changes needed?
[CVE-2022-45868](https://nvd.nist.gov/vuln/detail/CVE-2022-45868)

The following change in the release note fixes the CVE.

[581ed18](https://github.com/h2database/h2database/commit/581ed18ff9d6b3761d851620ed88a3994a351a0d)
 Merge pull request 
[#3833](https://redirect.github.com/h2database/h2database/issues/3833) from 
katzyn/password

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

Closes #41963 from bjornjorgensen/h2-2.2.220.

Authored-by: Bjørn Jørgensen 
Signed-off-by: Hyukjin Kwon 
---
 connector/connect/server/pom.xml | 2 +-
 sql/core/pom.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/connector/connect/server/pom.xml b/connector/connect/server/pom.xml
index 40d05f585ae2..c6e1e99bc2c5 100644
--- a/connector/connect/server/pom.xml
+++ b/connector/connect/server/pom.xml
@@ -218,7 +218,7 @@
 
   com.h2database
   h2
-  2.1.214
+  2.2.220
   test
 
   
diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index d44ac00f9734..b6e8aa4195ef 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -160,7 +160,7 @@
 
   com.h2database
   h2
-  2.1.214
+  2.2.220
   test
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47511][SQL][FOLLOWUP] Rename the config REPLACE_NULLIF_USING_WITH_EXPR to be more general

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bd0ccf36dd2 [SPARK-47511][SQL][FOLLOWUP] Rename the config 
REPLACE_NULLIF_USING_WITH_EXPR to be more general
6bd0ccf36dd2 is described below

commit 6bd0ccf36dd23eb1887b73a898021a14baf1f2eb
Author: Wenchen Fan 
AuthorDate: Fri Apr 5 07:47:39 2024 -0700

[SPARK-47511][SQL][FOLLOWUP] Rename the config 
REPLACE_NULLIF_USING_WITH_EXPR to be more general

### What changes were proposed in this pull request?

This is a follow-up of
- #45649

`With` is used not only by `NullIf` but also by `Between`. This PR renames
the config `REPLACE_NULLIF_USING_WITH_EXPR` to something more general and
uses it to control `Between` as well.

### Why are the changes needed?

To have a single conf that controls all usages of `With`.
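
For illustration only (not part of the patch), a hedged Scala sketch of how
the renamed flag could be exercised from a session. The config key
`spark.sql.alwaysInlineCommonExpr` comes from the diff below; whether it is
settable at session level and the exact shape of the rewritten plans are
assumptions here.

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("common-expr-demo").getOrCreate()

// Default: NULLIF and BETWEEN are rewritten around a shared common-expression
// reference (`With`), so the input expression appears only once in the
// rewritten condition.
spark.sql("SELECT nullif(rand(), 0.5) AS v").explain(true)
spark.sql("SELECT rand() BETWEEN 0.25 AND 0.75 AS in_range").explain(true)

// Assumed opt-out: inline the common expression again, duplicating the input
// in the rewritten condition (the behaviour without `With`).
spark.conf.set("spark.sql.alwaysInlineCommonExpr", "true")
spark.sql("SELECT nullif(rand(), 0.5) AS v").explain(true)
```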

### Does this PR introduce _any_ user-facing change?

no, not released yet

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45871 from cloud-fan/with-conf.

Lead-authored-by: Wenchen Fan 
Co-authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/expressions/Between.scala  | 19 ---
 .../sql/catalyst/expressions/nullExpressions.scala|  2 +-
 .../scala/org/apache/spark/sql/internal/SQLConf.scala | 12 +++-
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
index e5bb31bc34f1..de1122da646b 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
+import org.apache.spark.sql.internal.SQLConf
+
 // scalastyle:off line.size.limit
 @ExpressionDescription(
   usage = "Usage: input [NOT] BETWEEN lower AND upper - evaluate if `input` is 
[not] in between `lower` and `upper`",
@@ -36,13 +38,16 @@ package org.apache.spark.sql.catalyst.expressions
 case class Between private(input: Expression, lower: Expression, upper: 
Expression, replacement: Expression)
   extends RuntimeReplaceable with InheritAnalysisRules  {
   def this(input: Expression, lower: Expression, upper: Expression) = {
-this(input, lower, upper, {
-  val commonExpr = CommonExpressionDef(input)
-  val ref = new CommonExpressionRef(commonExpr)
-  val replacement = And(GreaterThanOrEqual(ref, lower), 
LessThanOrEqual(ref, upper))
-  With(replacement, Seq(commonExpr))
-})
-  };
+this(input, lower, upper,
+  if (!SQLConf.get.getConf(SQLConf.ALWAYS_INLINE_COMMON_EXPR)) {
+With(input) { case Seq(ref) =>
+  And(GreaterThanOrEqual(ref, lower), LessThanOrEqual(ref, upper))
+}
+  } else {
+And(GreaterThanOrEqual(input, lower), LessThanOrEqual(input, upper))
+  }
+)
+  }
 
   override def parameters: Seq[Expression] = Seq(input, lower, upper)
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
index 9b51d6be4c50..010d79f808d1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala
@@ -160,7 +160,7 @@ case class NullIf(left: Expression, right: Expression, 
replacement: Expression)
 
   def this(left: Expression, right: Expression) = {
 this(left, right,
-  if (SQLConf.get.getConf(SQLConf.REPLACE_NULLIF_USING_WITH_EXPR)) {
+  if (!SQLConf.get.getConf(SQLConf.ALWAYS_INLINE_COMMON_EXPR)) {
 With(left) { case Seq(ref) =>
   If(EqualTo(ref, right), Literal.create(null, left.dataType), ref)
 }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 23b0778162d6..71256c6c65fc 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3399,13 +3399,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
-  val REPLACE_NULLIF_USING_WITH_EXPR =
-buildConf("spark.databricks.sql.replaceNullIfUsingWithExpr")
+  val ALWAYS_INLINE_COMMON_EXPR =
+buildConf("spark.sql.alwaysInlineCommonExpr")
   .inter

(spark) branch master updated (97e63ff035b5 -> 12d0367cd304)

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 97e63ff035b5 [SPARK-47735][PYTHON][TESTS] Make 
pyspark.testing.connectutils compatible with pyspark-connect
 add 12d0367cd304 [SPARK-47724][PYTHON][TESTS][FOLLOW-UP] Make testing 
script to inherits SPARK_CONNECT_TESTING_REMOTE env

No new revisions were added by this update.

Summary of changes:
 python/run-tests.py | 3 +++
 1 file changed, 3 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (aeb082e06091 -> 97e63ff035b5)

2024-04-05 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from aeb082e06091 [SPARK-47081][CONNECT][TESTS][FOLLOW-UP] Skip the flaky 
doctests for now
 add 97e63ff035b5 [SPARK-47735][PYTHON][TESTS] Make 
pyspark.testing.connectutils compatible with pyspark-connect

No new revisions were added by this update.

Summary of changes:
 python/pyspark/testing/connectutils.py | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-website) branch asf-site updated: Update spark connect page based on pr feedback (#512)

2024-04-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new d60afe5997 Update spark connect page based on pr feedback (#512)
d60afe5997 is described below

commit d60afe59979f3a8b728e4838acb624a9d3b37722
Author: Matthew Powers 
AuthorDate: Thu Apr 4 16:40:02 2024 -0400

Update spark connect page based on pr feedback (#512)

Addresses the comments from this PR: 
https://github.com/apache/spark-website/pull/511

Also rewords some of the language.
---
 _layouts/home.html|  1 +
 site/index.html   |  1 +
 site/spark-connect/index.html | 42 --
 spark-connect/index.md| 42 --
 4 files changed, 42 insertions(+), 44 deletions(-)

diff --git a/_layouts/home.html b/_layouts/home.html
index 8bae9e1680..ad8f255b7c 100644
--- a/_layouts/home.html
+++ b/_layouts/home.html
@@ -73,6 +73,7 @@
 
 
   SQL and 
DataFrames
+  Spark Connect
   Spark Streaming
   MLlib 
(machine learning)
   GraphX 
(graph)
diff --git a/site/index.html b/site/index.html
index d2abd82fb3..3e6072f1d7 100644
--- a/site/index.html
+++ b/site/index.html
@@ -69,6 +69,7 @@
 
 
   SQL and DataFrames
+  Spark 
Connect
   Spark 
Streaming
   MLlib (machine 
learning)
   GraphX (graph)
diff --git a/site/spark-connect/index.html b/site/spark-connect/index.html
index 6dc75abe37..1877c0f32d 100644
--- a/site/spark-connect/index.html
+++ b/site/spark-connect/index.html
@@ -142,7 +142,7 @@
 
   
 
-  This post explains the Spark Connect architecture, the benefits of 
Spark Connect, and how to upgrade to Spark Connect.
+  This page explains the Spark Connect architecture, the benefits of 
Spark Connect, and how to upgrade to Spark Connect.
 
 Let’s start by exploring the architecture of Spark Connect at a high 
level.
 
@@ -154,9 +154,9 @@
 
 
   A connection is established between the Client and Spark Server
-  The Client converts a DataFrame query to an unresolved logical plan
+  The Client converts a DataFrame query to an unresolved logical plan that 
describes the intent of the operation rather than how it should be executed
   The unresolved logical plan is encoded and sent to the Spark Server
-  The Spark Server runs the query
+  The Spark Server optimizes and runs the query
   The Spark Server sends the results back to the Client
 
 
@@ -164,11 +164,11 @@
 
 Let’s go through these steps in more detail to get a better understanding 
of the inner workings of Spark Connect.
 
-Establishing a connection between the Client and Spark 
Server
+Establishing a connection between the Client and the Spark 
Server
 
 The network communication for Spark Connect uses the https://grpc.io/";>gRPC framework.
 
-gRPC is performant and language agnostic.  Spark Connect uses 
language-agnostic technologies, so it’s portable.
+gRPC is performant and language agnostic which makes Spark Connect 
portable.
 
 Converting a DataFrame query to an unresolved logical 
plan
 
@@ -182,7 +182,7 @@
 GlobalLimit 5
 +- LocalLimit 5
+- SubqueryAlias spark_catalog.default.some_table
-  +- Relation spark_catalog.default.some_table[character#15,franchise#16] 
parquet
+  +- UnresolvedRelation spark_catalog.default.some_table
 
 
 The Client is responsible for creating the unresolved logical plan and 
passing it to the Spark Server for execution.
@@ -197,9 +197,9 @@ GlobalLimit 5
 
 Executing the query on the Spark Server
 
-The Spark Server receives the unresolved logical plan (once the Protocol 
Buffer is deserialized) and executes it just like any other query.
+The Spark Server receives the unresolved logical plan (once the Protocol 
Buffer is deserialized) and analyzes, optimizes, and executes it just like any 
other query.
 
-Spark performs many optimizations to an unresolved logical plan before 
executing the query.  All of these optimizations happen on the Spark Server.
+Spark performs many optimizations to an unresolved logical plan before 
executing the query.  All of these optimizations happen on the Spark Server and 
are independent of the client application.
 
 Spark Connect lets you leverage Spark’s powerful query optimization 
capabilities, even with Clients that don’t depend on Spark or the JVM.
 
@@ -207,9 +207,9 @@ GlobalLimit 5
 
 The Spark Server sends the results back to the Client after executing the 
query.
 
-The results are sent to the client as Apache Arrow record batches.  A 
record batch includes many rows of data.
+The results are sent to the client as Apache Arrow record batches.  A 
single record batch includes many rows of data.
 
-The record batch is streamed to the client, which means it is

(spark) branch master updated: [SPARK-46812][PYTHON][TESTS][FOLLOWUP] Check should_test_connect and pyarrow to skip tests

2024-04-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 240923c2750e [SPARK-46812][PYTHON][TESTS][FOLLOWUP] Check 
should_test_connect and pyarrow to skip tests
240923c2750e is described below

commit 240923c2750e7c87d8d28286e25a80cfe3b08494
Author: Dongjoon Hyun 
AuthorDate: Thu Apr 4 12:34:00 2024 -0700

[SPARK-46812][PYTHON][TESTS][FOLLOWUP] Check should_test_connect and 
pyarrow to skip tests

### What changes were proposed in this pull request?

This is a follow-up of SPARK-46812 to skip the tests more robustly and to 
recover PyPy CIs.
- https://github.com/apache/spark/actions/runs/8556900899/job/23447948557

### Why are the changes needed?

- `should_test_connect` covers more edge cases than `have_pandas`.

- `test_resources.py` has Arrow usage too.

https://github.com/apache/spark/blob/25fc67fa114d2c34099c3ab50396870f543c338b/python/pyspark/resource/tests/test_resources.py#L85

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tests with `pandas` and without `pyarrow`.

```
$ pip3 freeze | grep pyarrow
$ pip3 freeze | grep pandas
pandas==2.2.1
pandas-stubs==1.2.0.53

$ python/run-tests --modules=pyspark-resource --parallelism=1 
--python-executables=python3.10
Running PySpark tests. Output is in 
/Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.10']
Will test the following Python modules: ['pyspark-resource']
python3.10 python_implementation is CPython
python3.10 version is: Python 3.10.13
Starting test(python3.10): pyspark.resource.profile (temp output: 
/Users/dongjoon/APACHE/spark-merge/python/target/db9cb886-2698-49d9-a663-9b8bea79caba/python3.10__pyspark.resource.profile__8mg46xru.log)
Finished test(python3.10): pyspark.resource.profile (1s)
Starting test(python3.10): pyspark.resource.tests.test_connect_resources 
(temp output: 
/Users/dongjoon/APACHE/spark-merge/python/target/53f979bd-1073-41e6-99ba-8e787edc415b/python3.10__pyspark.resource.tests.test_connect_resources__hrgrs5sk.log)
Finished test(python3.10): pyspark.resource.tests.test_connect_resources 
(0s) ... 1 tests were skipped
Starting test(python3.10): pyspark.resource.tests.test_resources (temp 
output: 
/Users/dongjoon/APACHE/spark-merge/python/target/2b06c671-0199-4827-a0e5-f852a28313fd/python3.10__pyspark.resource.tests.test_resources__jis6mk9a.log)
Finished test(python3.10): pyspark.resource.tests.test_resources (2s) ... 1 
tests were skipped
Tests passed in 4 seconds

Skipped tests in pyspark.resource.tests.test_connect_resources with 
python3.10:
  test_profile_before_sc_for_connect 
(pyspark.resource.tests.test_connect_resources.ResourceProfileTests) ... skip 
(0.002s)

Skipped tests in pyspark.resource.tests.test_resources with python3.10:
  test_profile_before_sc_for_sql 
(pyspark.resource.tests.test_resources.ResourceProfileTests) ... skip (0.001s)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #45880 from dongjoon-hyun/SPARK-46812-2.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/resource/tests/test_connect_resources.py |  7 +--
 python/pyspark/resource/tests/test_resources.py | 13 +++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/resource/tests/test_connect_resources.py 
b/python/pyspark/resource/tests/test_connect_resources.py
index 40c68029a153..1529a33cb0ad 100644
--- a/python/pyspark/resource/tests/test_connect_resources.py
+++ b/python/pyspark/resource/tests/test_connect_resources.py
@@ -18,10 +18,13 @@ import unittest
 
 from pyspark.resource import ResourceProfileBuilder, TaskResourceRequests, 
ExecutorResourceRequests
 from pyspark.sql import SparkSession
-from pyspark.testing.sqlutils import have_pandas, pandas_requirement_message
+from pyspark.testing.connectutils import (
+should_test_connect,
+connect_requirement_message,
+)
 
 
-@unittest.skipIf(not have_pandas, pandas_requirement_message)
+@unittest.skipIf(not should_test_connect, connect_requirement_message)
 class ResourceProfileTests(unittest.TestCase):
 def test_profile_before_sc_for_connect(self):
 rpb = ResourceProfileBuilder()
diff --git a/python/pyspark/resource/tests/test_resources.py 
b/python/pyspark/resource/tests/test_resources.py
index 6f61d5af2d92..e29a77ed36dd 100644
--- a/python/pyspark/resource/tests/test_resources.py
+++ b/python/pyspark/resource/tests/test_resources.py
@@ -15,10 +15,16 @@
 # limitations under the License.

(spark) branch master updated: [SPARK-47610][CONNECT][FOLLOWUP] Add -Dio.netty.tryReflectionSetAccessible=true for spark-connect-scala-client

2024-04-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e3405c1aa93c [SPARK-47610][CONNECT][FOLLOWUP] Add 
-Dio.netty.tryReflectionSetAccessible=true for spark-connect-scala-client
e3405c1aa93c is described below

commit e3405c1aa93cc513d986cb27765cd90cea572be3
Author: Cheng Pan 
AuthorDate: Thu Apr 4 09:58:42 2024 -0700

[SPARK-47610][CONNECT][FOLLOWUP] Add 
-Dio.netty.tryReflectionSetAccessible=true for spark-connect-scala-client

### What changes were proposed in this pull request?

Add `-Dio.netty.tryReflectionSetAccessible=true` for 
`spark-connect-scala-client`

### Why are the changes needed?

The previous change missed `spark-connect-scala-client`, possibly due to a 
stale IDEA index.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45860 from pan3793/SPARK-47610-followup.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
---
 connector/connect/bin/spark-connect-scala-client | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/connector/connect/bin/spark-connect-scala-client 
b/connector/connect/bin/spark-connect-scala-client
index 1748f8adad4f..04f170e187cf 100755
--- a/connector/connect/bin/spark-connect-scala-client
+++ b/connector/connect/bin/spark-connect-scala-client
@@ -68,6 +68,7 @@ JVM_ARGS="-XX:+IgnoreUnrecognizedVMOptions \
   --add-opens=java.base/sun.security.action=ALL-UNNAMED \
   --add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
   --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED \
-  -Djdk.reflect.useDirectMethodHandle=false "
+  -Djdk.reflect.useDirectMethodHandle=false \
+  -Dio.netty.tryReflectionSetAccessible=true"
 
 exec java $JVM_ARGS -cp "$SCCLASSPATH" 
org.apache.spark.sql.application.ConnectRepl "$@"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (5ca3467b8653 -> 25fc67fa114d)

2024-04-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5ca3467b8653 [SPARK-47729][PYTHON][TESTS] Get the proper default port 
for pyspark-connect testcases
 add 25fc67fa114d [SPARK-47728][DOC] Document G1 Concurrent GC metrics

No new revisions were added by this update.

Summary of changes:
 docs/monitoring.md | 10 ++
 1 file changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47729][PYTHON][TESTS] Get the proper default port for pyspark-connect testcases

2024-04-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5ca3467b8653 [SPARK-47729][PYTHON][TESTS] Get the proper default port 
for pyspark-connect testcases
5ca3467b8653 is described below

commit 5ca3467b86531b971f92c4d9da3ecc2735ae2214
Author: Hyukjin Kwon 
AuthorDate: Thu Apr 4 07:33:24 2024 -0700

[SPARK-47729][PYTHON][TESTS] Get the proper default port for 
pyspark-connect testcases

### What changes were proposed in this pull request?

This PR proposes to get the proper default port for `pyspark-connect` 
testcases.

### Why are the changes needed?

`pyspark-connect` cannot access the JVM, so it cannot get the randomized 
port assigned by the JVM.

### Does this PR introduce _any_ user-facing change?

No, `pyspark-connect` is not published yet, and this is a test-only change.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45875 from HyukjinKwon/SPARK-47729.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/connect/client/core.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/connect/client/core.py 
b/python/pyspark/sql/connect/client/core.py
index dd7fae881aec..b731960bbaf3 100644
--- a/python/pyspark/sql/connect/client/core.py
+++ b/python/pyspark/sql/connect/client/core.py
@@ -56,6 +56,7 @@ import grpc
 from google.protobuf import text_format, any_pb2
 from google.rpc import error_details_pb2
 
+from pyspark.util import is_remote_only
 from pyspark.accumulators import SpecialAccumulatorIds
 from pyspark.loose_version import LooseVersion
 from pyspark.version import __version__
@@ -292,7 +293,7 @@ class DefaultChannelBuilder(ChannelBuilder):
 
 @staticmethod
 def default_port() -> int:
-if "SPARK_TESTING" in os.environ:
+if "SPARK_TESTING" in os.environ and not is_remote_only():
 from pyspark.sql.session import SparkSession as PySparkSession
 
 # In the case when Spark Connect uses the local mode, it starts 
the regular Spark


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (d272a1b4367e -> d75c77562d27)

2024-04-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d272a1b4367e [SPARK-47724][PYTHON][TESTS] Add an environment variable 
for testing remote pure Python library
 add d75c77562d27 [SPARK-46812][PYTHON][TESTS][FOLLOWUP] Skip 
`pandas`-required tests if pandas is not available

No new revisions were added by this update.

Summary of changes:
 python/pyspark/resource/tests/test_connect_resources.py | 2 ++
 python/pyspark/resource/tests/test_resources.py | 2 ++
 2 files changed, 4 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (d87ac8ef49db -> 447f8aff6c26)

2024-04-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d87ac8ef49db [SPARK-47708][CONNECT] Do not log gRPC exception to 
stderr in PySpark
 add 447f8aff6c26 [SPARK-47720][CORE] Update `spark.speculation.multiplier` 
to 3 and `spark.speculation.quantile` to 0.9

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala| 4 ++--
 .../test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala   | 2 ++
 docs/configuration.md | 4 ++--
 docs/core-migration-guide.md  | 2 ++
 4 files changed, 8 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47710][SQL][DOCS] Postgres: Document Mapping Spark SQL Data Types from PostgreSQL

2024-04-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a427a4586177 [SPARK-47710][SQL][DOCS] Postgres: Document Mapping Spark 
SQL Data Types from PostgreSQL
a427a4586177 is described below

commit a427a4586177e521d21d4eb5c3c125d1ff65f71d
Author: Kent Yao 
AuthorDate: Wed Apr 3 10:51:02 2024 -0700

[SPARK-47710][SQL][DOCS] Postgres: Document Mapping Spark SQL Data Types 
from PostgreSQL

### What changes were proposed in this pull request?

This PR adds a user document for mapping Spark SQL data types from 
PostgreSQL. The write-side document is not included yet, as it might need 
further verification.
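
A minimal read-side sketch of the mapping this document covers (the URL, credentials, and table name below are placeholders):

```scala
// A PostgreSQL domain defined over integer is read back through the built-in
// JDBC source as Spark's integer type, per the new mapping table.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "domain_table")
  .load()
df.printSchema()
```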

### Why are the changes needed?

doc improvements

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

Added some tests for missing PG data types.

![image](https://github.com/apache/spark/assets/8326978/7629fd87-b047-48c7-9892-42820f0bb430)

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45845 from yaooqinn/SPARK-47710.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  |  15 ++
 docs/sql-data-sources-jdbc.md  | 224 -
 2 files changed, 237 insertions(+), 2 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index f70bd8091204..69573e9bddb1 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -187,6 +187,9 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 |)""".stripMargin).executeUpdate()
 conn.prepareStatement("CREATE TABLE complex_table (c1 
complex)").executeUpdate()
 conn.prepareStatement("INSERT INTO complex_table VALUES (ROW(true, 
1.0))").executeUpdate()
+conn.prepareStatement("CREATE DOMAIN myint AS integer CHECK (VALUE > 
0)").executeUpdate()
+conn.prepareStatement("CREATE TABLE domain_table (c1 
myint)").executeUpdate()
+conn.prepareStatement("INSERT INTO domain_table VALUES 
(1)").executeUpdate()
   }
 
   test("Type mapping for various types") {
@@ -542,4 +545,16 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   .load()
 checkAnswer(df, Row("[3,7)"))
   }
+
+  test("SPARK-47710: Reading Domain Types") {
+val df = spark.read.jdbc(jdbcUrl, "domain_table", new Properties)
+checkAnswer(df, Row(1))
+  }
+
+  test("SPARK-47710: Reading Object Identifier Types") {
+val df = spark.read.format("jdbc")
+  .option("url", jdbcUrl)
+  .option("query", "SELECT 1::oid, 'bar'::regclass, 
'integer'::regtype").load()
+checkAnswer(df, Row(1, "bar", "integer"))
+  }
 }
diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index 6dfdf07bae11..9887e6a98ebd 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -432,7 +432,7 @@ SELECT * FROM resultTable
 
 ### Mapping Spark SQL Data Types from MySQL
 
-The below table describe the data type conversions from MySQL data types to 
Spark SQL Data Types,
+The below table describes the data type conversions from MySQL data types to 
Spark SQL Data Types,
 when reading data from a MySQL table using the built-in jdbc data source with 
the MySQL Connector/J
 as the activated JDBC Driver. Note that, different JDBC drivers, such as Maria 
Connector/J, which
 are also available to connect MySQL, may have different mapping rules.
@@ -681,7 +681,7 @@ are also available to connect MySQL, may have different 
mapping rules.
 
 ### Mapping Spark SQL Data Types to MySQL
 
-The below table describe the data type conversions from Spark SQL Data Types 
to MySQL data types,
+The below table describes the data type conversions from Spark SQL Data Types 
to MySQL data types,
 when creating, altering, or writing data to a MySQL table using the built-in 
jdbc data source with
 the MySQL Connector/J as the activated JDBC Driver.
 
@@ -789,3 +789,223 @@ The Spark Catalyst data types below are not supported 
with suitable MYSQL types.
 - NullType
 - ObjectType
 - VariantType
+
+
+### Mapping Spark SQL Data Types from PostgreSQL
+
+The below table describes the data type conversions from PostgreSQL data types 
to Spark 

(spark) branch master updated (1404d1f801c7 -> efe437ef9f9e)

2024-04-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1404d1f801c7 [SPARK-47715][BUILD][STS] Upgrade hive-service-rpc 4.0.0
 add efe437ef9f9e [SPARK-47711][BUILD][TESTS] Parameterize JDBC Driver 
versions for docker integration tests

No new revisions were added by this update.

Summary of changes:
 pom.xml | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47715][BUILD][STS] Upgrade hive-service-rpc 4.0.0

2024-04-03 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1404d1f801c7 [SPARK-47715][BUILD][STS] Upgrade hive-service-rpc 4.0.0
1404d1f801c7 is described below

commit 1404d1f801c7508245f72f15049f45ad7f7aba27
Author: Cheng Pan 
AuthorDate: Wed Apr 3 07:55:45 2024 -0700

[SPARK-47715][BUILD][STS] Upgrade hive-service-rpc 4.0.0

### What changes were proposed in this pull request?

This PR upgrades hive-service-rpc from 3.1.3 to 4.0.0, which has 3 changes.

- https://issues.apache.org/jira/browse/HIVE-14388 (the newly added field is 
optional; leave it for now and investigate later)
- https://issues.apache.org/jira/browse/HIVE-24230 (not applicable for 
Spark)
- https://issues.apache.org/jira/browse/HIVE-24893 (mark methods as not 
supported and investigate later)

### Why are the changes needed?

Use the latest version of `hive-service-rpc`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45854 from pan3793/SPARK-47715.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3  |  2 +-
 pom.xml|  2 +-
 .../main/java/org/apache/hive/service/cli/OperationType.java   |  1 +
 .../org/apache/hive/service/cli/thrift/ThriftCLIService.java   | 10 ++
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index c6913ceeff13..6ca93894bd81 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -89,7 +89,7 @@ hive-jdbc/2.3.9//hive-jdbc-2.3.9.jar
 hive-llap-common/2.3.9//hive-llap-common-2.3.9.jar
 hive-metastore/2.3.9//hive-metastore-2.3.9.jar
 hive-serde/2.3.9//hive-serde-2.3.9.jar
-hive-service-rpc/3.1.3//hive-service-rpc-3.1.3.jar
+hive-service-rpc/4.0.0//hive-service-rpc-4.0.0.jar
 hive-shims-0.23/2.3.9//hive-shims-0.23-2.3.9.jar
 hive-shims-common/2.3.9//hive-shims-common-2.3.9.jar
 hive-shims-scheduler/2.3.9//hive-shims-scheduler-2.3.9.jar
diff --git a/pom.xml b/pom.xml
index ca949a05c81c..295165446e92 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2476,7 +2476,7 @@
   
 ${hive.group}
 hive-service-rpc
-3.1.3
+4.0.0
 
   
 *
diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationType.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationType.java
index 376eee4443c6..68484e06e1f9 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationType.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationType.java
@@ -27,6 +27,7 @@ public enum OperationType {
 
   UNKNOWN_OPERATION(TOperationType.UNKNOWN),
   EXECUTE_STATEMENT(TOperationType.EXECUTE_STATEMENT),
+  PROCEDURAL_SQL(TOperationType.PROCEDURAL_SQL),
   GET_TYPE_INFO(TOperationType.GET_TYPE_INFO),
   GET_CATALOGS(TOperationType.GET_CATALOGS),
   GET_SCHEMAS(TOperationType.GET_SCHEMAS),
diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
index 815df3eafdde..4b18e2950a3d 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
@@ -696,6 +696,16 @@ public abstract class ThriftCLIService extends 
AbstractService implements TCLISe
 }
   }
 
+  @Override
+  public TUploadDataResp UploadData(TUploadDataReq req) throws TException {
+throw new UnsupportedOperationException("Method UploadData has not been 
implemented.");
+  }
+
+  @Override
+  public TDownloadDataResp DownloadData(TDownloadDataReq req) throws 
TException {
+throw new UnsupportedOperationException("Method DownloadData has not been 
implemented.");
+  }
+
   @Override
   public abstract void run();
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47452][INFRA][FOLLOWUP] Enforce to install `six` to `Python 3.10`

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 62f90ec6d32f [SPARK-47452][INFRA][FOLLOWUP] Enforce to install `six` 
to `Python 3.10`
62f90ec6d32f is described below

commit 62f90ec6d32f708a90329bb8c741482e18a63e56
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 2 23:14:43 2024 -0700

[SPARK-47452][INFRA][FOLLOWUP] Enforce to install `six` to `Python 3.10`

### What changes were proposed in this pull request?

This PR aims to enforce installing `six` for `Python 3.10` because the 
`Python 3.10` environment is missing `six`, which causes `pandas` detection 
failures in CIs.
- https://github.com/apache/spark/actions/runs/8525063765/job/23373974516
   - Note that `pandas` is visible in the installed package list, but 
PySpark's detection of it fails due to the missing `six`.

```
$ docker run -it --rm 
ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.9 -m pip freeze 
| grep six
six==1.16.0
$ docker run -it --rm 
ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.10 -m pip freeze
| grep six
$ docker run -it --rm 
ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.11 -m pip freeze 
| grep six
six==1.16.0
$ docker run -it --rm 
ghcr.io/apache/apache-spark-ci-image:master-8345361470 python3.12 -m pip freeze 
| grep six
six==1.16.0
```

- CI failure message example.
  - https://github.com/apache/spark/actions/runs/8525063765/job/23373974096
```
Starting test(python3.10): 
pyspark.ml.tests.connect.test_connect_classification (temp output: 
/__w/spark/spark/python/target/370eb2c4-12f2-411f-96d1-f617f5d59528/python3.10__pyspark.ml.tests.connect.test_connect_classification__v6itdsxy.log)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
  File 
"/__w/spark/spark/python/pyspark/ml/tests/connect/test_connect_classification.py",
 line 37, in 
class ClassificationTestsOnConnect(ClassificationTestsMixin, 
unittest.TestCase):
NameError: name 'ClassificationTestsMixin' is not defined
```

### Why are the changes needed?

Since Python 3.10 is the default Python version of the Ubuntu OS, `pip` 
behaves differently: it treats the system-provided `six` as already satisfied.
```
RUN python3.10 -m pip install numpy pyarrow>=15.0.0 six==1.16.0 ...
...
#20 0.766 Requirement already satisfied: six==1.16.0 in 
/usr/lib/python3/dist-packages (1.16.0)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Check the docker image built by this PR.
- 
https://github.com/dongjoon-hyun/spark/actions/runs/8533625657/job/23376659246

```
    $ docker pull --platform amd64 
ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657

$ docker run -it --rm 
ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657 python3.10 -m pip 
freeze | grep six
six==1.16.0
```

Run tests on new docker image.
```
$ docker run -it --rm -v $PWD:/spark 
ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-8533625657
rootb7f5f56892b0:/# cd /spark
rootb7f5f56892b0:/spark# python/run-tests 
--modules=pyspark-mllib,pyspark-ml,pyspark-ml-connect --parallelism=1 
--python-executables=python3.10
Running PySpark tests. Output is in /spark/python/unit-tests.log
Will test against the following Python executables: ['python3.10']
Will test the following Python modules: ['pyspark-mllib', 'pyspark-ml', 
'pyspark-ml-connect']
python3.10 python_implementation is CPython
python3.10 version is: Python 3.10.12
Starting test(python3.10): 
pyspark.ml.tests.connect.test_connect_classification (temp output: 
/spark/python/target/675eccdc-3c4b-4146-a58b-030302bdc6d7/python3.10__pyspark.ml.tests.connect.test_connect_classification__9habp0rh.log)
Finished test(python3.10): 
pyspark.ml.tests.connect.test_connect_classification (159s)
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_evaluation 
(temp output: 
/spark/python/target/fbac93ba-c72d-40e4-acfe-f3ac01b4932a/python3.10__pyspark.ml.tests.connect.test_connect_evaluation__js11z0ux.log)
Finished test(python3.10): pyspark.ml.tests.connect.test_connect_evaluation 
(36s)
Starting test(python3.10): pyspark.ml.tests.connect.test_connect_feature 
(temp output: 
/spark/python/target/fdb8828e-4241-4e78-a7d6-b2a4beb3cfc1/python3.10__pyspark.ml.tests.connect.test_connect_feature__et5gr30f.log)
Finished test(python3.10): pyspark.ml.tests.connect.test_connect_feature 

(spark) branch master updated: [SPARK-47701][SQL][TESTS] Postgres: Add test for Composite and Range types

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9a20794b252d [SPARK-47701][SQL][TESTS] Postgres: Add test for 
Composite and Range types
9a20794b252d is described below

commit 9a20794b252d207b6864f656e7fab85007911537
Author: Kent Yao 
AuthorDate: Tue Apr 2 22:43:05 2024 -0700

[SPARK-47701][SQL][TESTS] Postgres: Add test for Composite and Range types

### What changes were proposed in this pull request?

Add tests for the Composite and Range types of Postgres.

### Why are the changes needed?

Test improvements.

### Does this PR introduce _any_ user-facing change?

no, test-only

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45827 from yaooqinn/SPARK-47701.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  | 26 ++
 1 file changed, 26 insertions(+)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index 9015434cedd0..f70bd8091204 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -178,6 +178,15 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 conn.prepareStatement("CREATE TABLE test_bit_array (c1 bit(1)[], c2 
bit(5)[])").executeUpdate()
 conn.prepareStatement("INSERT INTO test_bit_array VALUES (ARRAY[B'1', 
B'0'], " +
   "ARRAY[B'1', B'00010'])").executeUpdate()
+
+conn.prepareStatement(
+  """
+|CREATE TYPE complex AS (
+|b   bool,
+|d   double precision
+|)""".stripMargin).executeUpdate()
+conn.prepareStatement("CREATE TABLE complex_table (c1 
complex)").executeUpdate()
+conn.prepareStatement("INSERT INTO complex_table VALUES (ROW(true, 
1.0))").executeUpdate()
   }
 
   test("Type mapping for various types") {
@@ -516,4 +525,21 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   },
   errorClass = null)
   }
+
+  test("SPARK-47701: Reading complex type") {
+val df = spark.read.jdbc(jdbcUrl, "complex_table", new Properties)
+checkAnswer(df, Row("(t,1)"))
+val df2 = spark.read.format("jdbc")
+  .option("url", jdbcUrl)
+  .option("query", "SELECT (c1).b, (c1).d FROM complex_table").load()
+checkAnswer(df2, Row(true, 1.0d))
+  }
+
+  test("SPARK-47701: Range Types") {
+val df = spark.read.format("jdbc")
+  .option("url", jdbcUrl)
+  .option("query", "SELECT '[3,7)'::int4range")
+  .load()
+checkAnswer(df, Row("[3,7)"))
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-45733][PYTHON][TESTS][FOLLOWUP] Skip `pyspark.sql.tests.connect.client.test_client` if not should_test_connect

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 360a3f9023d0 [SPARK-45733][PYTHON][TESTS][FOLLOWUP] Skip 
`pyspark.sql.tests.connect.client.test_client` if not should_test_connect
360a3f9023d0 is described below

commit 360a3f9023d08812e3f3c44af9cdac644c5d67b2
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 2 22:30:08 2024 -0700

[SPARK-45733][PYTHON][TESTS][FOLLOWUP] Skip 
`pyspark.sql.tests.connect.client.test_client` if not should_test_connect

### What changes were proposed in this pull request?

This is a follow-up of the following.
- https://github.com/apache/spark/pull/43591

### Why are the changes needed?

This test requires `pandas`, which is an optional dependency in Apache Spark.

```
$ python/run-tests --modules=pyspark-connect --parallelism=1 
--python-executables=python3.10  --testnames 
'pyspark.sql.tests.connect.client.test_client'
Running PySpark tests. Output is in 
/Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.10']
Will test the following Python tests: 
['pyspark.sql.tests.connect.client.test_client']
python3.10 python_implementation is CPython
python3.10 version is: Python 3.10.13
Starting test(python3.10): pyspark.sql.tests.connect.client.test_client 
(temp output: 
/Users/dongjoon/APACHE/spark-merge/python/target/216a8716-3a1f-4cf9-9c7c-63087f29f892/python3.10__pyspark.sql.tests.connect.client.test_client__tydue4ck.log)
Traceback (most recent call last):
  File "/Users/dongjoon/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", 
line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
  File "/Users/dongjoon/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", 
line 86, in _run_code
    exec(code, run_globals)
  File 
"/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/connect/client/test_client.py",
 line 137, in 
class TestPolicy(DefaultPolicy):
NameError: name 'DefaultPolicy' is not defined
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Pass the CIs and manually test without `pandas`.
```
$ pip3 uninstall pandas
$ python/run-tests --modules=pyspark-connect --parallelism=1 
--python-executables=python3.10  --testnames 
'pyspark.sql.tests.connect.client.test_client'
Running PySpark tests. Output is in 
/Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.10']
Will test the following Python tests: 
['pyspark.sql.tests.connect.client.test_client']
python3.10 python_implementation is CPython
python3.10 version is: Python 3.10.13
Starting test(python3.10): pyspark.sql.tests.connect.client.test_client 
(temp output: 
/Users/dongjoon/APACHE/spark-merge/python/target/acf07ed5-938a-4272-87e1-47e3bf8b988e/python3.10__pyspark.sql.tests.connect.client.test_client__sfdosnek.log)
Finished test(python3.10): pyspark.sql.tests.connect.client.test_client 
(0s) ... 13 tests were skipped
Tests passed in 0 seconds

Skipped tests in pyspark.sql.tests.connect.client.test_client with 
python3.10:
  test_basic_flow 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase)
 ... skip (0.002s)
  test_fail_and_retry_during_execute 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase)
 ... skip (0.000s)
  test_fail_and_retry_during_reattach 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase)
 ... skip (0.000s)
  test_fail_during_execute 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase)
 ... skip (0.000s)
  test_channel_builder 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_channel_builder_with_session 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_interrupt_all 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_is_closed 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_properties 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_retry 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_retry_client_unit 
(pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s)
  test_use

(spark) branch master updated (2c2a2adc3275 -> 344f640b2f35)

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2c2a2adc3275 [SPARK-47655][SS] Integrate timer with Initial State 
handling for state-v2
 add 344f640b2f35 [SPARK-47454][PYTHON][TESTS][FOLLOWUP] Skip 
`test_create_dataframe_from_pandas_with_day_time_interval` if pandas is not 
available

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_creation.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (55b5ff6f45fd -> 49b7b6b9fe6b)

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 55b5ff6f45fd [SPARK-47669][SQL][CONNECT][PYTHON] Add `Column.try_cast`
 add 49b7b6b9fe6b [SPARK-47691][SQL] Postgres: Support multi dimensional 
array on the write side

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  | 23 ++
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 23 +-
 .../apache/spark/sql/jdbc/PostgresDialect.scala|  2 +-
 3 files changed, 42 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47697][INFRA] Add Scala style check for invalid MDC usage

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 27fdf96842ea [SPARK-47697][INFRA] Add Scala style check for invalid 
MDC usage
27fdf96842ea is described below

commit 27fdf96842ea5a98ea5835bbf78b33b26db1fd3b
Author: Gengliang Wang 
AuthorDate: Tue Apr 2 15:45:23 2024 -0700

[SPARK-47697][INFRA] Add Scala style check for invalid MDC usage

### What changes were proposed in this pull request?

Add a Scala style check that catches invalid MDC usage such as 
`s"Task ${MDC(TASK_ID, taskId)} failed"`, which should be written as 
`log"Task ${MDC(TASK_ID, taskId)} failed"`.

### Why are the changes needed?

This makes development and PR review of the structured logging migration 
easier.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

Manual test; verified that it throws errors on invalid MDC usage.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45823 from gengliangwang/style.

Authored-by: Gengliang Wang 
Signed-off-by: Dongjoon Hyun 
---
 scalastyle-config.xml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 50bd5a33ccb2..cd5a576c086f 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -150,6 +150,11 @@ This file is divided into 3 sections:
   // scalastyle:on println]]>
   
 
+  
+s".*\$\{MDC\(
+
+  
+
   
 spark(.sqlContext)?.sparkContext.hadoopConfiguration
 

(spark) branch master updated: [SPARK-47699][BUILD] Upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d11d9cf729ab [SPARK-47699][BUILD] Upgrade `gcs-connector` to 2.2.21 
and add a note for 3.0.0
d11d9cf729ab is described below

commit d11d9cf729ab699c68770337d35043ebf58195cf
Author: Dongjoon Hyun 
AuthorDate: Tue Apr 2 13:31:18 2024 -0700

[SPARK-47699][BUILD] Upgrade `gcs-connector` to 2.2.21 and add a note for 
3.0.0

### What changes were proposed in this pull request?

This PR aims to upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0.

### Why are the changes needed?

This PR aims to upgrade `gcs-connector` to bring the latest bug fixes.

However, due to the following, we stick with 2.2.21 instead of 3.0.0.
- https://github.com/GoogleCloudDataproc/hadoop-connectors/issues/1114
  - `gcs-connector` 2.2.21 has shaded Guava 32.1.2-jre.
- 
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/15c8ee41a15d6735442f36333f1d67792c93b9cf/pom.xml#L100

  - `gcs-connector` 3.0.0 has shaded Guava 31.1-jre.
- 
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/667bf17291dbaa96a60f06df58c7a528bc4a8f79/pom.xml#L97

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.
```
$ dev/make-distribution.sh -Phadoop-cloud
$ cd dist
$ export KEYFILE=~/.ssh/apache-spark.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
-c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
-c 
spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
-c 
spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.2)
Type in expressions to have them evaluated.
Type :help for more information.
{"ts":"2024-04-02T13:08:31.513-0700","level":"WARN","msg":"Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable","logger":"org.apache.hadoop.util.NativeCodeLoader"}
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = 
local-1712088511841).
Spark session available as 'spark'.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
val res0: Long = 124

scala> 
spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45824 from dongjoon-hyun/SPARK-47699.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index a564ec9f044a..c6913ceeff13 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -66,7 +66,7 @@ 
eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar
 eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar
 esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar
 flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar
-gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar
+gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
diff --git a/pom.xml b/pom.xml
index b70d091796a5..ca949a05c81c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1

(spark) branch master updated (db0975cb2a1c -> e6144e4d75b7)

2024-04-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from db0975cb2a1c [SPARK-47602][CORE] Resource managers: Migrate logError 
with variables to structured logging framework
 add e6144e4d75b7 [SPARK-47695][BUILD] Upgrade AWS SDK v2 to 2.24.6

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  2 +-
 hadoop-cloud/pom.xml  | 11 +++
 pom.xml   |  2 +-
 3 files changed, 13 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47686][SQL][TESTS] Use `=!=` instead of `!==` in `JoinHintSuite`

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 00162b82fe3c [SPARK-47686][SQL][TESTS] Use `=!=` instead of `!==` in 
`JoinHintSuite`
00162b82fe3c is described below

commit 00162b82fe3c48e26303394d1a91d026fe8d9b4c
Author: yangjie01 
AuthorDate: Mon Apr 1 23:45:00 2024 -0700

[SPARK-47686][SQL][TESTS] Use `=!=` instead of `!==` in `JoinHintSuite`

### What changes were proposed in this pull request?
This PR uses `=!=` instead of `!==` in `JoinHintSuite`. `!==` has been a 
deprecated API since 2.0.0, and its test already exists in `DeprecatedAPISuite`.
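
A minimal sketch of the two spellings, runnable in spark-shell (the DataFrames are placeholders, not taken from the suite):

```scala
val df1 = spark.range(5).toDF("a1")
val df2 = spark.range(5).toDF("b1")
// Supported not-equal operator on Column:
val joined = df1.join(df2, df1("a1") =!= df2("b1"))
// Deprecated spelling since 2.0.0, which this PR removes from the suite:
// df1.join(df2, df1("a1") !== df2("b1"))
```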

### Why are the changes needed?
Clean up deprecated API usage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45812 from LuciferYang/SPARK-47686.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
index 66746263ff90..53e47f428c3a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
@@ -695,11 +695,11 @@ class JoinHintSuite extends PlanTest with 
SharedSparkSession with AdaptiveSparkP
 val hintAppender = new LogAppender(s"join hint check for equi-join")
 withLogAppender(hintAppender, level = Some(Level.WARN)) {
   assertBroadcastNLJoin(
-df1.hint("SHUFFLE_HASH").join(df2, $"a1" !== $"b1"), BuildRight)
+df1.hint("SHUFFLE_HASH").join(df2, $"a1" =!= $"b1"), BuildRight)
 }
 withLogAppender(hintAppender, level = Some(Level.WARN)) {
   assertBroadcastNLJoin(
-df1.join(df2.hint("MERGE"), $"a1" !== $"b1"), BuildRight)
+df1.join(df2.hint("MERGE"), $"a1" =!= $"b1"), BuildRight)
 }
 val logs = hintAppender.loggingEvents.map(_.getMessage.getFormattedMessage)
   .filter(_.contains("is not supported in the query:"))


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47679][SQL] Use `HiveConf.getConfVars` or Hive conf names directly

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e9833495e13a [SPARK-47679][SQL] Use `HiveConf.getConfVars` or Hive 
conf names directly
e9833495e13a is described below

commit e9833495e13a6143da91fbadc042297b95008089
Author: Dongjoon Hyun 
AuthorDate: Mon Apr 1 21:17:56 2024 -0700

[SPARK-47679][SQL] Use `HiveConf.getConfVars` or Hive conf names directly

### What changes were proposed in this pull request?

This PR aims to use `HiveConf.getConfVars` or Hive config names directly, to 
be robust against Hive incompatibilities.

### Why are the changes needed?

Apache Hive 4.0.0 introduced incompatible changes to the `ConfVars` enum via 
HIVE-27925.
- https://github.com/apache/hive/pull/4919
- https://github.com/apache/hive/pull/5107

Using `HiveConf.getConfVars` or config names directly is a more robust way to 
handle this incompatibility.
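
A minimal sketch of the lookup pattern used throughout the patch (shown standalone here for illustration only):

```scala
import org.apache.hadoop.hive.conf.HiveConf

// Resolving the conf by name keeps working even if the ConfVars enum constants
// are renamed or removed between Hive releases.
val hiveConf = new HiveConf()
val queryId = hiveConf.getVar(HiveConf.getConfVars("hive.query.id"))
```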

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45804 from dongjoon-hyun/SPARK-47679.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../java/org/apache/hive/service/cli/CLIService.java  |  3 ++-
 .../hive/service/cli/session/HiveSessionImpl.java |  8 
 .../sql/hive/thriftserver/SparkSQLCLIDriver.scala |  5 +++--
 .../apache/spark/sql/hive/thriftserver/CliSuite.scala | 19 +--
 .../hive/thriftserver/HiveThriftServer2Suites.scala   |  8 
 .../sql/hive/thriftserver/SharedThriftServer.scala|  2 +-
 .../spark/sql/hive/thriftserver/UISeleniumSuite.scala |  4 ++--
 .../scala/org/apache/spark/sql/hive/HiveUtils.scala   |  6 +++---
 .../apache/spark/sql/hive/client/HiveClientImpl.scala | 13 ++---
 .../spark/sql/hive/client/IsolatedClientLoader.scala  |  3 +--
 .../org/apache/spark/sql/hive/orc/OrcFileFormat.scala |  3 +--
 .../apache/spark/sql/hive/HiveSessionStateSuite.scala |  6 ++
 .../apache/spark/sql/hive/HiveSharedStateSuite.scala  | 11 +--
 .../org/apache/spark/sql/hive/test/TestHive.scala |  8 
 14 files changed, 47 insertions(+), 52 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
index e761c0aa532c..caccb0c4b76f 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
@@ -573,7 +573,8 @@ public class CLIService extends CompositeService implements 
ICLIService {
   public String getQueryId(TOperationHandle opHandle) throws HiveSQLException {
 Operation operation = sessionManager.getOperationManager().getOperation(
 new OperationHandle(opHandle));
-final String queryId = 
operation.getParentSession().getHiveConf().getVar(ConfVars.HIVEQUERYID);
+final String queryId = operation.getParentSession().getHiveConf().getVar(
+  HiveConf.getConfVars("hive.query.id"));
 LOG.debug(opHandle + ": getQueryId() " + queryId);
 return queryId;
   }
diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
index 3f5da0646bd6..e00d2705d417 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
@@ -119,7 +119,7 @@ public class HiveSessionImpl implements HiveSession {
   LOG.warn("Error setting scheduler queue: " + e, e);
 }
 // Set an explicit session name to control the download directory name
-hiveConf.set(ConfVars.HIVESESSIONID.varname,
+hiveConf.set("hive.session.id",
 sessionHandle.getHandleIdentifier().toString());
 // Use thrift transportable formatter
 hiveConf.set(SerDeUtils.LIST_SINK_OUTPUT_FORMATTER, 
ThriftFormatter.class.getName());
@@ -406,7 +406,7 @@ public class HiveSessionImpl implements HiveSession {
 
   @Override
   public HiveConf getHiveConf() {
-hiveConf.setVar(HiveConf.ConfVars.HIVEFETCHOUTPUTSERDE, 
FETCH_WORK_SERDE_CLASS);
+hiveConf.setVar(HiveConf.getConfVars("hive.fetch.output.serde"), 
FETCH_WORK_SERDE_CLASS);
 return hiveConf;
   }
 
@@ -686,8 +686,8 @@ public class HiveSessionImpl implements HiveSession {
   }
 
   private void cleanupPipeoutFile() {
-String lScratchDir = hiveConf.getVar(ConfVars.LOCALSCRATCHDIR);
-String sessionID = hiveConf.getVa

(spark) branch branch-3.4 updated: [SPARK-47676][BUILD] Clean up the removed `VersionsSuite` references

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new b0afd04afd5f [SPARK-47676][BUILD] Clean up the removed `VersionsSuite` 
references
b0afd04afd5f is described below

commit b0afd04afd5fa434948ca650fd2b1b9ef5c8d503
Author: Dongjoon Hyun 
AuthorDate: Mon Apr 1 16:49:39 2024 -0700

[SPARK-47676][BUILD] Clean up the removed `VersionsSuite` references

### What changes were proposed in this pull request?

This PR aims to clean up the removed `VersionsSuite` reference.

### Why are the changes needed?

In Apache Spark 3.3.0, `VersionsSuite` was removed via SPARK-38036.
- https://github.com/apache/spark/pull/35335

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45800 from dongjoon-hyun/SPARK-47676.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 128f74b055d3f290003f42259ffa23861eaa69e1)
Signed-off-by: Dongjoon Hyun 
---
 project/SparkBuild.scala   | 1 -
 sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala | 1 -
 2 files changed, 2 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 31516c8c6ffe..1cbd9a612899 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -559,7 +559,6 @@ object SparkParallelTestGrouping {
 "org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite",
 "org.apache.spark.sql.hive.HiveExternalCatalogSuite",
 "org.apache.spark.sql.hive.StatisticsSuite",
-"org.apache.spark.sql.hive.client.VersionsSuite",
 "org.apache.spark.sql.hive.client.HiveClientVersions",
 "org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite",
 "org.apache.spark.ml.classification.LogisticRegressionSuite",
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
index 9304074e866c..eb69f23d2876 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
@@ -101,7 +101,6 @@ package object client {
 "org.pentaho:pentaho-aggdesigner-algorithm"))
 
 // Since HIVE-23980, calcite-core included in Hive package jar.
-// For spark, only VersionsSuite currently creates a hive materialized 
view for testing.
 case object v2_3 extends HiveVersion("2.3.9",
   exclusions = Seq("org.apache.calcite:calcite-core",
 "org.apache.calcite:calcite-druid",


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47676][BUILD] Clean up the removed `VersionsSuite` references

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 128f74b055d3 [SPARK-47676][BUILD] Clean up the removed `VersionsSuite` 
references
128f74b055d3 is described below

commit 128f74b055d3f290003f42259ffa23861eaa69e1
Author: Dongjoon Hyun 
AuthorDate: Mon Apr 1 16:49:39 2024 -0700

[SPARK-47676][BUILD] Clean up the removed `VersionsSuite` references

### What changes were proposed in this pull request?

This PR aims to clean up the removed `VersionsSuite` reference.

### Why are the changes needed?

In Apache Spark 3.3.0, `VersionsSuite` was removed via SPARK-38036.
- https://github.com/apache/spark/pull/35335

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45800 from dongjoon-hyun/SPARK-47676.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 project/SparkBuild.scala   | 1 -
 sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala | 1 -
 2 files changed, 2 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index c2b1bc03a967..951d5970c845 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -518,7 +518,6 @@ object SparkParallelTestGrouping {
 "org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite",
 "org.apache.spark.sql.hive.HiveExternalCatalogSuite",
 "org.apache.spark.sql.hive.StatisticsSuite",
-"org.apache.spark.sql.hive.client.VersionsSuite",
 "org.apache.spark.sql.hive.client.HiveClientVersions",
 "org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite",
 "org.apache.spark.ml.classification.LogisticRegressionSuite",
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
index a66842de7d83..564c87a0fca8 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
@@ -59,7 +59,6 @@ package object client {
 "org.pentaho:pentaho-aggdesigner-algorithm"))
 
 // Since HIVE-23980, calcite-core included in Hive package jar.
-// For spark, only VersionsSuite currently creates a hive materialized 
view for testing.
 case object v2_3 extends HiveVersion("2.3.9",
   exclusions = Seq("org.apache.calcite:calcite-core",
 "org.apache.calcite:calcite-druid",


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47676][BUILD] Clean up the removed `VersionsSuite` references

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new fed3eb57289a [SPARK-47676][BUILD] Clean up the removed `VersionsSuite` 
references
fed3eb57289a is described below

commit fed3eb57289a949e5abcecdc64d32b8004d9463d
Author: Dongjoon Hyun 
AuthorDate: Mon Apr 1 16:49:39 2024 -0700

[SPARK-47676][BUILD] Clean up the removed `VersionsSuite` references

### What changes were proposed in this pull request?

This PR aims to clean up the removed `VersionsSuite` reference.

### Why are the changes needed?

In Apache Spark 3.3.0, `VersionsSuite` was removed via SPARK-38036.
- https://github.com/apache/spark/pull/35335

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45800 from dongjoon-hyun/SPARK-47676.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 128f74b055d3f290003f42259ffa23861eaa69e1)
Signed-off-by: Dongjoon Hyun 
---
 project/SparkBuild.scala   | 1 -
 sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala | 1 -
 2 files changed, 2 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 40c90a3461b0..25f04f7bff31 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -572,7 +572,6 @@ object SparkParallelTestGrouping {
 "org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite",
 "org.apache.spark.sql.hive.HiveExternalCatalogSuite",
 "org.apache.spark.sql.hive.StatisticsSuite",
-"org.apache.spark.sql.hive.client.VersionsSuite",
 "org.apache.spark.sql.hive.client.HiveClientVersions",
 "org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite",
 "org.apache.spark.ml.classification.LogisticRegressionSuite",
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
index 9304074e866c..eb69f23d2876 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
@@ -101,7 +101,6 @@ package object client {
 "org.pentaho:pentaho-aggdesigner-algorithm"))
 
 // Since HIVE-23980, calcite-core included in Hive package jar.
-// For spark, only VersionsSuite currently creates a hive materialized 
view for testing.
 case object v2_3 extends HiveVersion("2.3.9",
   exclusions = Seq("org.apache.calcite:calcite-core",
 "org.apache.calcite:calcite-druid",


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (e86c499b008c -> 72fe58d716d4)

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e86c499b008c [SPARK-47674][CORE] Enable 
`spark.metrics.appStatusSource.enabled` by default
 add 72fe58d716d4 [SPARK-47675][K8S][TESTS] Use AWS SDK v2 `2.23.19` in K8s 
IT

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47674][CORE] Enable `spark.metrics.appStatusSource.enabled` by default

2024-04-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e86c499b008c [SPARK-47674][CORE] Enable 
`spark.metrics.appStatusSource.enabled` by default
e86c499b008c is described below

commit e86c499b008ccc96b49f3fb9343ce67ff642c204
Author: Dongjoon Hyun 
AuthorDate: Mon Apr 1 13:16:35 2024 -0700

[SPARK-47674][CORE] Enable `spark.metrics.appStatusSource.enabled` by 
default

### What changes were proposed in this pull request?

This PR aims to enable `spark.metrics.appStatusSource.enabled` by default.

### Why are the changes needed?

`spark.metrics.appStatusSource.enabled` was introduced in Apache Spark 3.0.0 and 
has proven useful in production for exposing app status. It is better to enable it 
by default in Apache Spark 4.0.0.
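As a hedged sketch (only the config key comes from this commit; the rest is illustrative), users who prefer the old behavior can still opt out explicitly:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("metrics-demo")
  // New default in Spark 4.0.0, shown here only for emphasis.
  .set("spark.metrics.appStatusSource.enabled", "true")
  // Uncomment to restore the pre-4.0.0 behavior and skip the appStatus metrics.
  // .set("spark.metrics.appStatusSource.enabled", "false")
```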

### Does this PR introduce _any_ user-facing change?

This will expose additional metrics.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45798 from dongjoon-hyun/SPARK-47674.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/internal/config/Status.scala | 2 +-
 docs/monitoring.md| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/Status.scala 
b/core/src/main/scala/org/apache/spark/internal/config/Status.scala
index 7f03f134d187..50dda81fc2e0 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/Status.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/Status.scala
@@ -69,7 +69,7 @@ private[spark] object Status {
 "will be reported for the status of the running spark app.")
   .version("3.0.0")
   .booleanConf
-  .createWithDefault(false)
+  .createWithDefault(true)
 
   val LIVE_UI_LOCAL_STORE_DIR = ConfigBuilder("spark.ui.store.path")
 .doc("Local directory where to cache application information for live UI. 
By default this is " +
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 79bbb93e50d1..5dc470f1f7e0 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -1187,7 +1187,7 @@ This is the component with the largest amount of 
instrumented metrics
 
 - namespace=appStatus (all metrics of type=counter)
   - **note:** Introduced in Spark 3.0. Conditional to a configuration 
parameter:
-   `spark.metrics.appStatusSource.enabled` (default is false)
+   `spark.metrics.appStatusSource.enabled` (default is true)
   - stages.failedStages.count
   - stages.skippedStages.count
   - stages.completedStages.count


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47645][BUILD][CORE][SQL][YARN] Make Spark build with `-release` instead of `-target`

2024-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4c1405d10b8f [SPARK-47645][BUILD][CORE][SQL][YARN] Make Spark build 
with `-release` instead of `-target`
4c1405d10b8f is described below

commit 4c1405d10b8fceba7e1486bd4e2e2a24596a71a7
Author: yangjie01 
AuthorDate: Sun Mar 31 21:27:33 2024 -0700

[SPARK-47645][BUILD][CORE][SQL][YARN] Make Spark build with `-release` 
instead of `-target`

### What changes were proposed in this pull request?
This PR makes the following changes to allow Spark to build with `-release` 
instead of `-target`:

1. Use `MethodHandle` instead of direct calls to 
`sun.security.action.GetBooleanAction` and `sun.util.calendar.ZoneInfo`, 
because they are not exported APIs.

2. `Channels.newReader` is used instead of `StreamDecoder.forDecoder`, 
because `StreamDecoder.forDecoder` is also not an exported API (see the sketch after this list).

```java
  public static Reader newReader(ReadableByteChannel ch,
   CharsetDecoder dec,
   int minBufferCap)
{
Objects.requireNonNull(ch, "ch");
return StreamDecoder.forDecoder(ch, dec.reset(), minBufferCap);
}
```

3. Adjusted the import of `java.io._` in `yarn/Client.scala` to fix the 
compilation error:

```
Error: ] 
/home/runner/work/spark/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:20:
 object FileSystem is not a member of package java.io
```

4. Replaced `-target` with `-release` in `pom.xml` and `SparkBuild.scala`, 
and removed the `-source` option, because using `-release` is sufficient.

5. Upgrade `scala-maven-plugin` from 4.7.1 to 4.8.1 to fix the error 
`[ERROR] -release cannot be less than -target` when executing `build/mvn clean 
install -DskipTests -Djava.version=21`
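A minimal sketch of the substitution described in item 2 (the channel and charset here are illustrative; the call sites touched by this PR are in `CreateJacksonParser` and `CreateXmlParser`):

```scala
import java.io.Reader
import java.nio.channels.{Channels, ReadableByteChannel}
import java.nio.charset.StandardCharsets

// Builds a Reader over a byte channel without touching the non-exported
// sun.nio.cs.StreamDecoder API; Channels.newReader is the public equivalent.
def readerFor(channel: ReadableByteChannel): Reader =
  Channels.newReader(channel, StandardCharsets.UTF_8.newDecoder(), -1)
```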

### Why are the changes needed?
Since Scala 2.13.9, the compile option `-target` has been deprecated; it is 
recommended to use `-release` instead:

- https://github.com/scala/scala/pull/9982

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45716 from LuciferYang/scala-maven-plugin-491.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/serializer/SerializationDebugger.scala  | 13 ++---
 pom.xml   | 16 ++--
 project/SparkBuild.scala  |  5 ++---
 .../scala/org/apache/spark/deploy/yarn/Client.scala   |  2 +-
 .../spark/sql/catalyst/util/SparkDateTimeUtils.scala  | 19 +++
 .../spark/sql/catalyst/json/CreateJacksonParser.scala |  7 +++
 .../spark/sql/catalyst/xml/CreateXmlParser.scala  |  7 +++
 7 files changed, 40 insertions(+), 29 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala 
b/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala
index 287912490235..b05babdce169 100644
--- 
a/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala
+++ 
b/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala
@@ -18,14 +18,16 @@
 package org.apache.spark.serializer
 
 import java.io._
+import java.lang.invoke.MethodHandles
 import java.lang.reflect.{Field, Method}
-import java.security.AccessController
+import java.security.{AccessController, PrivilegedAction}
 
 import scala.annotation.tailrec
 import scala.collection.mutable
 import scala.util.control.NonFatal
 
 import org.apache.spark.internal.Logging
+import org.apache.spark.util.SparkClassUtils
 
 private[spark] object SerializationDebugger extends Logging {
 
@@ -68,8 +70,13 @@ private[spark] object SerializationDebugger extends Logging {
   }
 
   private[serializer] var enableDebugging: Boolean = {
-!AccessController.doPrivileged(new sun.security.action.GetBooleanAction(
-  "sun.io.serialization.extendedDebugInfo")).booleanValue()
+val lookup = MethodHandles.lookup()
+val clazz = 
SparkClassUtils.classForName("sun.security.action.GetBooleanAction")
+val constructor = clazz.getConstructor(classOf[String])
+val mh = lookup.unreflectConstructor(constructor)
+val action = mh.invoke("sun.io.serialization.extendedDebugInfo")
+  .asInstanceOf[PrivilegedAction[Boolean]]
+!AccessController.doPrivileged(action).booleanValue()
   }
 
   private class SerializationDebugger {
diff --git a/pom.xml b/pom.xml
index d4e0a2d840de..f7c104749e0d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -114,8

(spark) branch master updated: [SPARK-47662][SQL][DOCS] Add User Document for Mapping Spark SQL Data Types to MySQL

2024-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 18b582ca806c [SPARK-47662][SQL][DOCS] Add User Document for Mapping 
Spark SQL Data Types to MySQL
18b582ca806c is described below

commit 18b582ca806c2ecb48a99333d77a4edd2eade5a2
Author: Kent Yao 
AuthorDate: Sun Mar 31 21:19:10 2024 -0700

[SPARK-47662][SQL][DOCS] Add User Document for Mapping Spark SQL Data Types 
to MySQL

### What changes were proposed in this pull request?

Following #45736, we add the User Document for Mapping Spark SQL Data Types 
to MySQL

### Why are the changes needed?

doc improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

doc build

![image](https://github.com/apache/spark/assets/8326978/e7d1aa1a-3bcf-45ad-9848-233913830d07)

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45787 from yaooqinn/SPARK-47662.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 docs/sql-data-sources-jdbc.md | 111 ++
 1 file changed, 111 insertions(+)

diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index 3563088c600f..801e3c5b2fcb 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -678,3 +678,114 @@ are also available to connect MySQL, may have different 
mapping rules.
 
   
 
+
+### Mapping Spark SQL Data Types to MySQL
+
+The below table describe the data type conversions from Spark SQL Data Types 
to MySQL data types,
+when creating, altering, or writing data to a MySQL table using the built-in 
jdbc data source with
+the MySQL Connector/J as the activated JDBC Driver.
+
+Note that, different JDBC drivers, such as Maria Connector/J, which are also 
available to connect MySQL,
+may have different mapping rules.
+
+
+
+
+  
+
+  Spark SQL Data Type
+  MySQL Data Type
+  Remarks
+
+  
+  
+
+  BooleanType
+  BIT(1)
+  
+
+
+  ByteType
+  TINYINT
+  
+
+
+  ShortType
+  INTEGER
+  
+
+
+  IntegerType
+  INTEGER
+  
+
+
+  LongType
+  BIGINT
+  
+
+
+  FloatType
+  FLOAT
+  
+
+
+  DoubleType
+  DOUBLE PRECISION
+  
+
+
+  DecimalType(p, s)
+  DECIMAL(p,s)
+  
+
+
+  DateType
+  DATE
+  
+
+
+  TimestampType
+  TIMESTAMP
+  
+
+
+  TimestampNTZType
+  DATETIME
+  
+
+
+  StringType
+  LONGTEXT
+  
+
+
+  BinaryType
+  BLOB
+  
+
+
+  CharType(n)
+  CHAR(n)
+  
+
+
+  VarcharType(n)
+  VARCHAR(n)
+  
+
+  
+
+
+The Spark Catalyst data types below are not supported with suitable MYSQL 
types.
+
+- DayTimeIntervalType
+- YearMonthIntervalType
+- CalendarIntervalType
+- ArrayType
+- MapType
+- StructType
+- UserDefinedType
+- NullType
+- ObjectType
+- VariantType


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47658][BUILD] Upgrade `tink` to 1.12.0

2024-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ef4c27bac533 [SPARK-47658][BUILD] Upgrade `tink` to 1.12.0
ef4c27bac533 is described below

commit ef4c27bac5339f97489337d4c12e146c520e4854
Author: panbingkun 
AuthorDate: Sun Mar 31 15:42:04 2024 -0700

[SPARK-47658][BUILD] Upgrade `tink` to 1.12.0

### What changes were proposed in this pull request?
This PR aims to upgrade `tink` from `1.9.0` to `1.12.0`.

### Why are the changes needed?
The last upgrade occurred 11 months ago, as follows: 
https://github.com/apache/spark/pull/40878.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45783 from panbingkun/SPARK-47658.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index cff11cbc1171..60c96a3c57f3 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -266,7 +266,7 @@ stax-api/1.0.1//stax-api-1.0.1.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
 threeten-extra/1.7.1//threeten-extra-1.7.1.jar
-tink/1.9.0//tink-1.9.0.jar
+tink/1.12.0//tink-1.12.0.jar
 transaction-api/1.1//transaction-api-1.1.jar
 txw2/3.0.2//txw2-3.0.2.jar
 univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
diff --git a/pom.xml b/pom.xml
index c9ad54292650..d4e0a2d840de 100644
--- a/pom.xml
+++ b/pom.xml
@@ -216,7 +216,7 @@
 1.1.0
 1.6.0
 1.77
-1.9.0
+1.12.0
 5.0.1
 4.1.108.Final
 2.0.65.Final


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47656][BUILD] Upgrade commons-io to 2.16.0

2024-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1bc812b07921 [SPARK-47656][BUILD] Upgrade commons-io to 2.16.0
1bc812b07921 is described below

commit 1bc812b07921ddb606b5d38af95acd8228e9423f
Author: panbingkun 
AuthorDate: Sun Mar 31 15:40:06 2024 -0700

[SPARK-47656][BUILD] Upgrade commons-io to 2.16.0

### What changes were proposed in this pull request?
This PR aims to upgrade `commons-io` from `2.15.1` to `2.16.0`.

### Why are the changes needed?
1. 2.15.1 vs 2.16.0:

https://github.com/apache/commons-io/compare/rel/commons-io-2.15.1...rel/commons-io-2.16.0

2. This version fixes some bugs, for example:
https://github.com/apache/commons-io/pull/525
https://github.com/apache/commons-io/pull/521

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45781 from panbingkun/SPARK-47656.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4f038f7f3c35..cff11cbc1171 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -43,7 +43,7 @@ commons-compiler/3.1.9//commons-compiler-3.1.9.jar
 commons-compress/1.26.0//commons-compress-1.26.0.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
-commons-io/2.15.1//commons-io-2.15.1.jar
+commons-io/2.16.0//commons-io-2.16.0.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.14.0//commons-lang3-3.14.0.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
diff --git a/pom.xml b/pom.xml
index b4e7fc3478bb..c9ad54292650 100644
--- a/pom.xml
+++ b/pom.xml
@@ -192,7 +192,7 @@
 3.0.3
 1.16.1
 1.26.0
-2.15.1
+2.16.0
 
 2.6
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47646][SQL][FOLLOWUP][3.4] Replace non-existing try_to_number function with TryToNumber

2024-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new a1544152248d [SPARK-47646][SQL][FOLLOWUP][3.4] Replace non-existing 
try_to_number function with TryToNumber
a1544152248d is described below

commit a1544152248d6868ebef51932379e9eda8985e60
Author: Liang-Chi Hsieh 
AuthorDate: Sun Mar 31 15:37:01 2024 -0700

[SPARK-47646][SQL][FOLLOWUP][3.4] Replace non-existing try_to_number 
function with TryToNumber

### What changes were proposed in this pull request?

This patch fixes the broken CI by replacing the non-existent `try_to_number` 
function in branch-3.4.

### Why are the changes needed?

#45771 backported a test to `StringFunctionsSuite` in branch-3.4, but the test 
uses `try_to_number`, which was only added in Spark 3.5.
So this patch fixes the broken CI: 
https://github.com/apache/spark/actions/runs/8494692184/job/23270175100

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45785 from viirya/fix.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Dongjoon Hyun 
---
 .../src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala| 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
index 404a1e742d19..18123a4d6ec6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql
 
 import org.apache.spark.{SPARK_DOC_ROOT, SparkRuntimeException}
 import org.apache.spark.sql.catalyst.expressions.Cast._
+import org.apache.spark.sql.catalyst.expressions.TryToNumber
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
@@ -717,6 +718,7 @@ class StringFunctionsSuite extends QueryTest with 
SharedSparkSession {
 
   test("SPARK-47646: try_to_number should return NULL for malformed input") {
 val df = spark.createDataset(spark.sparkContext.parallelize(Seq("11")))
-checkAnswer(df.select(try_to_number($"value", lit("$99.99"))), 
Seq(Row(null)))
+val try_to_number = Column(TryToNumber($"value".expr, lit("$99.99").expr))
+checkAnswer(df.select(try_to_number), Seq(Row(null)))
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: Fix the default branch name to `main`

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new cc6549b  Fix the default branch name to `main`
cc6549b is described below

commit cc6549bcc414f9b0aad8aa9310756798eded
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 15:10:40 2024 -0700

Fix the default branch name to `main`
---
 dev/merge_spark_pr.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index cd1c864..4647383 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -311,7 +311,7 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 
 default_fix_versions = []
 for b in merge_branches:
-if b == "master":
+if b == "main":
 default_fix_versions.append(versions[0].name)
 else:
 found = False
@@ -334,7 +334,7 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 
 for v in default_fix_versions:
 # Handles the case where we have forked a release branch but not yet 
made the release.
-# In this case, if the PR is committed to the master branch and the 
release branch, we
+# In this case, if the PR is committed to the main branch and the 
release branch, we
 # only consider the release branch to be the fix version. E.g. it is 
not valid to have
 # both 1.1.0 and 1.0.0 as fix versions.
 (major, minor, patch) = v.split(".")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: Update +GITHUB_API_BASE

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new c7df937  Update +GITHUB_API_BASE
c7df937 is described below

commit c7df93747867d1a0963023bdbdb415f8fdc04cd6
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 15:08:21 2024 -0700

Update +GITHUB_API_BASE
---
 dev/merge_spark_pr.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 459f67b..cd1c864 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -65,7 +65,7 @@ GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
 
 
 GITHUB_BASE = "https://github.com/apache/spark-kubernetes-operator/pull";
-GITHUB_API_BASE = 
"https://api.github.com/repos/spark-kubernetes-operator/spark";
+GITHUB_API_BASE = "https://api.github.com/repos/spark-kubernetes-operator";
 JIRA_BASE = "https://issues.apache.org/jira/browse";
 JIRA_API_BASE = "https://issues.apache.org/jira";
 # Prefix added to temporary branches


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated (9ec59ac -> c2aae54)

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


from 9ec59ac  Add GitHub Action job
 add c2aae54  Add `.licenserc.yaml`

No new revisions were added by this update.

Summary of changes:
 .github/.licenserc.yaml | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 .github/.licenserc.yaml


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: Add GitHub Action job

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 9ec59ac  Add GitHub Action job
9ec59ac is described below

commit 9ec59ac797bacc62128a5e3ab680069a9790dfb4
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 14:59:11 2024 -0700

Add GitHub Action job
---
 .github/workflows/build_and_test.yml | 29 +
 1 file changed, 29 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
new file mode 100644
index 000..6a5a147
--- /dev/null
+++ b/.github/workflows/build_and_test.yml
@@ -0,0 +1,29 @@
+name: Build and test
+
+on:
+  push:
+branches:
+- main
+  pull_request:
+branches:
+- main
+
+# Cancel previous PR build and test
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event_name == 'pull_request' && 
github.event.number || github.sha }}
+  cancel-in-progress: true
+
+jobs:
+  license-check:
+name: "License Check"
+runs-on: ubuntu-latest
+steps:
+  - name: Checkout repository
+uses: actions/checkout@v3
+  - name: Check license header
+uses: apache/skywalking-eyes@main
+env:
+  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+with:
+  config: .github/.licenserc.yaml
+


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: Add merge_spark_pr.py

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new c1e4879  Add merge_spark_pr.py
c1e4879 is described below

commit c1e48798a839bb0f77783cd68c88cab23c3276f3
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 14:53:31 2024 -0700

Add merge_spark_pr.py
---
 dev/merge_spark_pr.py | 717 ++
 1 file changed, 717 insertions(+)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
new file mode 100755
index 000..459f67b
--- /dev/null
+++ b/dev/merge_spark_pr.py
@@ -0,0 +1,717 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Utility for creating well-formed pull request merges and pushing them to 
Apache
+# Spark.
+#   usage: ./merge_spark_pr.py(see config env vars below)
+#
+# This utility assumes you already have a local Spark git folder and that you
+# have added remotes corresponding to both (i) the github apache Spark
+# mirror and (ii) the apache git repo.
+
+import json
+import os
+import re
+import subprocess
+import sys
+import traceback
+from urllib.request import urlopen
+from urllib.request import Request
+from urllib.error import HTTPError
+
+try:
+import jira.client
+
+JIRA_IMPORTED = True
+except ImportError:
+JIRA_IMPORTED = False
+
+# Location of your Spark git development area
+SPARK_HOME = os.environ.get("SPARK_HOME", os.getcwd())
+# Remote name which points to the Github site
+PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache-github")
+# Remote name which points to Apache git
+PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache")
+# ASF JIRA username
+JIRA_USERNAME = os.environ.get("JIRA_USERNAME", "")
+# ASF JIRA password
+JIRA_PASSWORD = os.environ.get("JIRA_PASSWORD", "")
+# ASF JIRA access token
+# If it is configured, username and password are dismissed
+# Go to https://issues.apache.org/jira/secure/ViewProfile.jspa -> Personal 
Access Tokens for
+# your own token management.
+JIRA_ACCESS_TOKEN = os.environ.get("JIRA_ACCESS_TOKEN")
+# OAuth key used for issuing requests against the GitHub API. If this is not 
defined, then requests
+# will be unauthenticated. You should only need to configure this if you find 
yourself regularly
+# exceeding your IP's unauthenticated request rate limit. You can create an 
OAuth key at
+# https://github.com/settings/tokens. This script only requires the 
"public_repo" scope.
+GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY")
+
+
+GITHUB_BASE = "https://github.com/apache/spark-kubernetes-operator/pull";
+GITHUB_API_BASE = 
"https://api.github.com/repos/spark-kubernetes-operator/spark";
+JIRA_BASE = "https://issues.apache.org/jira/browse";
+JIRA_API_BASE = "https://issues.apache.org/jira";
+# Prefix added to temporary branches
+BRANCH_PREFIX = "PR_TOOL"
+
+
+def print_error(msg):
+print("\033[91m%s\033[0m" % msg)
+
+
+def bold_input(prompt) -> str:
+return input("\033[1m%s\033[0m" % prompt)
+
+
+def get_json(url):
+try:
+request = Request(url)
+if GITHUB_OAUTH_KEY:
+request.add_header("Authorization", "token %s" % GITHUB_OAUTH_KEY)
+return json.load(urlopen(request))
+except HTTPError as e:
+if "X-RateLimit-Remaining" in e.headers and 
e.headers["X-RateLimit-Remaining"] == "0":
+print_error(
+"Exceeded the GitHub API rate limit; see the instructions in "
++ "dev/merge_spark_pr.py to configure an OAuth token for 
making authenticated "
++ "GitHub requests."
+)
+elif e.code == 401:
+print_error(
+"GITHUB_OAUTH_KEY is invalid or expired. Please regenerate a 
new one with "
++ "at lea

(spark-kubernetes-operator) branch main updated: Add check-license

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 2667d00  Add check-license
2667d00 is described below

commit 2667d0073979c4e39cd2fd7ca6190a9420a405bc
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 14:49:34 2024 -0700

Add check-license
---
 .gitignore|  8 ++
 dev/.rat-excludes | 14 +
 dev/check-license | 86 +++
 3 files changed, 108 insertions(+)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..78213f8
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,8 @@
+*.swp
+*~
+.java-version
+.DS_Store
+.idea/
+.vscode
+/lib/
+target/
diff --git a/dev/.rat-excludes b/dev/.rat-excludes
new file mode 100644
index 000..a24671b
--- /dev/null
+++ b/dev/.rat-excludes
@@ -0,0 +1,14 @@
+target
+.gitignore
+.gitattributes
+.project
+.classpath
+.rat-excludes
+.*md
+.java-version
+licenses/*
+licenses-binary/*
+LICENSE
+NOTICE
+TAGS
+RELEASE
diff --git a/dev/check-license b/dev/check-license
new file mode 100755
index 000..bc7f493
--- /dev/null
+++ b/dev/check-license
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+acquire_rat_jar () {
+
+  
URL="${DEFAULT_ARTIFACT_REPOSITORY:-https://repo1.maven.org/maven2/}org/apache/rat/apache-rat/${RAT_VERSION}/apache-rat-${RAT_VERSION}.jar";
+
+  JAR="$rat_jar"
+
+  # Download rat launch jar if it hasn't been downloaded yet
+  if [ ! -f "$JAR" ]; then
+# Download
+printf "Attempting to fetch rat\n"
+JAR_DL="${JAR}.part"
+if [ $(command -v curl) ]; then
+  curl -L --silent "${URL}" > "$JAR_DL" && mv "$JAR_DL" "$JAR"
+elif [ $(command -v wget) ]; then
+  wget --quiet ${URL} -O "$JAR_DL" && mv "$JAR_DL" "$JAR"
+else
+  printf "You do not have curl or wget installed, please install rat 
manually.\n"
+  exit -1
+fi
+  fi
+
+  unzip -tq "$JAR" &> /dev/null
+  if [ $? -ne 0 ]; then 
+# We failed to download
+rm "$JAR"
+printf "Our attempt to download rat locally to ${JAR} failed. Please 
install rat manually.\n"
+exit -1
+  fi
+}
+
+# Go to the Spark project root directory
+FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
+cd "$FWDIR"
+
+if test -x "$JAVA_HOME/bin/java"; then
+declare java_cmd="$JAVA_HOME/bin/java"
+else
+declare java_cmd=java
+fi
+
+export RAT_VERSION=0.15
+export rat_jar="$FWDIR"/lib/apache-rat-${RAT_VERSION}.jar
+mkdir -p "$FWDIR"/lib
+
+[[ -f "$rat_jar" ]] || acquire_rat_jar || {
+echo "Download failed. Obtain the rat jar manually and place it at 
$rat_jar"
+exit 1
+}
+
+mkdir -p target
+$java_cmd -jar "$rat_jar" -E "$FWDIR"/dev/.rat-excludes -d "$FWDIR" > 
target/rat-results.txt
+
+if [ $? -ne 0 ]; then
+   echo "RAT exited abnormally"
+   exit 1
+fi
+
+ERRORS="$(cat target/rat-results.txt | grep -e "??")"
+
+if test ! -z "$ERRORS"; then 
+echo "Could not find Apache license headers in the following files:"
+echo "$ERRORS"
+exit 1
+else 
+echo -e "RAT checks passed."
+fi


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: Add PULL_REQUEST_TEMPLATE

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new b96b949  Add PULL_REQUEST_TEMPLATE
b96b949 is described below

commit b96b949b4f81a8e7eadb146ca708b56a3542e347
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 13:53:28 2024 -0700

Add PULL_REQUEST_TEMPLATE
---
 .github/PULL_REQUEST_TEMPLATE | 53 +++
 1 file changed, 53 insertions(+)

diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE
new file mode 100644
index 000..885d307
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE
@@ -0,0 +1,53 @@
+
+
+### What changes were proposed in this pull request?
+
+
+
+### Why are the changes needed?
+
+
+
+### Does this PR introduce _any_ user-facing change?
+
+
+
+### How was this patch tested?
+
+
+
+### Was this patch authored or co-authored using generative AI tooling?
+


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: Add .asf.yaml

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new b23d486  Add .asf.yaml
b23d486 is described below

commit b23d486ad438ac93350e2d59de76e2c62e3c1f74
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 29 13:49:42 2024 -0700

Add .asf.yaml
---
 .asf.yaml | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
new file mode 100644
index 000..c7e6ae7
--- /dev/null
+++ b/.asf.yaml
@@ -0,0 +1,34 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
+---
+github:
+  description: "Apache Spark Kubernetes Operator"
+  homepage: https://spark.apache.org/
+  labels:
+- java
+- spark
+- kubernetes
+  enabled_merge_buttons:
+merge: false
+squash: true
+rebase: true
+
+notifications:
+  pullrequests: revi...@spark.apache.org
+  issues: revi...@spark.apache.org
+  commits: commits@spark.apache.org
+  jira_options: link label


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (12155e8e027c -> 48def2b4a0b5)

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 12155e8e027c [SPARK-47647][SQL] Make MySQL data source able to read 
bit(n>1) as BinaryType like Postgres
 add 48def2b4a0b5 [SPARK-47648][SQL][TESTS] Use `checkError()` to check 
Exception in `[CSV|Json|Xml]Suite` and `[Csv|Json]FunctionsSuite`

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/CsvFunctionsSuite.scala   |  52 +++--
 .../org/apache/spark/sql/JsonFunctionsSuite.scala  |  17 +-
 .../sql/execution/datasources/csv/CSVSuite.scala   | 140 +-
 .../sql/execution/datasources/json/JsonSuite.scala | 215 +
 .../sql/execution/datasources/xml/XmlSuite.scala   |  99 ++
 5 files changed, 330 insertions(+), 193 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47647][SQL] Make MySQL data source able to read bit(n>1) as BinaryType like Postgres

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 12155e8e027c [SPARK-47647][SQL] Make MySQL data source able to read 
bit(n>1) as BinaryType like Postgres
12155e8e027c is described below

commit 12155e8e027c868537fda3796b8661188c306cc2
Author: Kent Yao 
AuthorDate: Fri Mar 29 09:08:34 2024 -0700

[SPARK-47647][SQL] Make MySQL data source able to read bit(n>1) as 
BinaryType like Postgres

### What changes were proposed in this pull request?

Make MySQL data source able to read bit(n>1) as BinaryType like Postgres. 
This appears to be unfinished work from the original author:
> // This could instead be a BinaryType if we'd rather return bit-vectors of up to 64
> // bits as byte arrays instead of longs.

A property spark.sql.legacy.mysql.bitArrayMapping.enabled is added to 
restore the old behavior.
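A hedged sketch of the effect (the JDBC URL and table name are illustrative; only the config key comes from this change):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mysql-bit-demo").getOrCreate()

// Opt back into the old LongType mapping *before* reading, if existing jobs depend on it:
// spark.conf.set("spark.sql.legacy.mysql.bitArrayMapping.enabled", "true")

// With the new default, a MySQL BIT(n > 1) column is read as BinaryType,
// i.e. each row holds an Array[Byte] rather than a Long.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")
  .option("dbtable", "numbers")
  .load()
```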

### Why are the changes needed?

Make the behavior consistent among different JDBC data sources.

### Does this PR introduce _any_ user-facing change?

yes, type mapping changes

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

Closes #45773 from yaooqinn/SPARK-47647.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 53 --
 docs/sql-data-sources-jdbc.md  |  7 ++-
 docs/sql-migration-guide.md|  1 +
 .../org/apache/spark/sql/internal/SQLConf.scala| 13 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 10 
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 16 ---
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 10 +++-
 7 files changed, 75 insertions(+), 35 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 1343f9af7e35..dd680e6bd4a8 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -117,32 +117,35 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   }
 
   test("Numeric types") {
-val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties)
-val rows = df.collect()
-assert(rows.length == 1)
-val types = rows(0).toSeq.map(x => x.getClass.toString)
-assert(types.length == 10)
-assert(types(0).equals("class java.lang.Boolean"))
-assert(types(1).equals("class java.lang.Long"))
-assert(types(2).equals("class java.lang.Short"))
-assert(types(3).equals("class java.lang.Integer"))
-assert(types(4).equals("class java.lang.Integer"))
-assert(types(5).equals("class java.lang.Long"))
-assert(types(6).equals("class java.math.BigDecimal"))
-assert(types(7).equals("class java.lang.Float"))
-assert(types(8).equals("class java.lang.Double"))
-assert(types(9).equals("class java.lang.Byte"))
-assert(rows(0).getBoolean(0) == false)
-assert(rows(0).getLong(1) == 0x225)
-assert(rows(0).getShort(2) == 17)
-assert(rows(0).getInt(3) == 7)
-assert(rows(0).getInt(4) == 123456789)
-assert(rows(0).getLong(5) == 123456789012345L)
+val row = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties).head()
+assert(row.length === 10)
+assert(row(0).isInstanceOf[Boolean])
+assert(row(1).isInstanceOf[Array[Byte]])
+assert(row(2).isInstanceOf[Short])
+assert(row(3).isInstanceOf[Int])
+assert(row(4).isInstanceOf[Int])
+assert(row(5).isInstanceOf[Long])
+assert(row(6).isInstanceOf[BigDecimal])
+assert(row(7).isInstanceOf[Float])
+assert(row(8).isInstanceOf[Double])
+assert(row(9).isInstanceOf[Byte])
+assert(!row.getBoolean(0))
+assert(java.util.Arrays.equals(row.getAs[Array[Byte]](1),
+  Array[Byte](49, 48, 49, 48, 48, 49, 48, 49)))
+assert(row.getShort(2) == 17)
+assert(row.getInt(3) == 7)
+assert(row.getInt(4) == 123456789)
+assert(row.getLong(5) == 123456789012345L)
 val bd = new BigDecimal("123456789012345.1234567890123450")
-assert(rows(0).getAs[BigDecimal](6).equals(bd))
-assert(rows(0).getFloat(7) == 42.75)
-assert(rows(0).getDouble(8) == 1.0002)
-assert(rows(0).getByte(9) == 0x80.toByte)
+assert(row.getAs[BigDecimal](6).equals(bd))
+assert(row.getFloat(7) == 42.75)

(spark) branch master updated: [SPARK-47641][SQL] Improve the performance for `UnaryMinus` and `Abs`

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5318846db1e3 [SPARK-47641][SQL] Improve the performance for 
`UnaryMinus` and `Abs`
5318846db1e3 is described below

commit 5318846db1e367b17bb04366aa57419867e6b538
Author: panbingkun 
AuthorDate: Fri Mar 29 09:05:19 2024 -0700

[SPARK-47641][SQL] Improve the performance for `UnaryMinus` and `Abs`

### What changes were proposed in this pull request?
This PR aims to improve the performance of `UnaryMinus` and `Abs`.

### Why are the changes needed?
We can further improve the performance of `UnaryMinus` and `Abs` by applying the 
following suggestions (screenshot):
https://github.com/apache/spark/assets/15246973/456b142d-a15d-408e-8aad-91b53841fc16
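For illustration, a simplified sketch of the overflow-checked negation this change adds for the small integral types (the real helpers in `MathUtils` raise Spark's arithmetic-overflow error rather than a plain `ArithmeticException`):

```scala
object NegateSketch {
  def negateExact(a: Byte): Byte = {
    // Byte.MinValue (-128) is the only input that overflows, since +128 is not representable as a Byte.
    if (a == Byte.MinValue) throw new ArithmeticException("byte overflow")
    (-a).toByte
  }

  def negateExact(a: Short): Short = {
    // Likewise, only Short.MinValue (-32768) overflows.
    if (a == Short.MinValue) throw new ArithmeticException("short overflow")
    (-a).toShort
  }
}
```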

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45766 from panbingkun/improve_UnaryMinus.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/sql/catalyst/util/MathUtils.scala   | 14 ++
 .../spark/sql/catalyst/expressions/arithmetic.scala  | 20 
 .../scala/org/apache/spark/sql/types/numerics.scala  | 12 +++-
 3 files changed, 21 insertions(+), 25 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala
index 99caef978bb4..96c3fb81aa66 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala
@@ -61,6 +61,20 @@ object MathUtils {
 withOverflow(Math.multiplyExact(a, b), hint = "try_multiply", context)
   }
 
+  def negateExact(a: Byte): Byte = {
+if (a == Byte.MinValue) { // if and only if x is Byte.MinValue, overflow 
can happen
+  throw ExecutionErrors.arithmeticOverflowError("byte overflow")
+}
+(-a).toByte
+  }
+
+  def negateExact(a: Short): Short = {
+if (a == Short.MinValue) { // if and only if x is Short.MinValue, overflow 
can happen
+  throw ExecutionErrors.arithmeticOverflowError("short overflow")
+}
+(-a).toShort
+  }
+
   def negateExact(a: Int): Int = withOverflow(Math.negateExact(a))
 
   def negateExact(a: Long): Long = withOverflow(Math.negateExact(a))
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 4e54e7890e1a..9eecf81684ce 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -61,14 +61,9 @@ case class UnaryMinus(
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = 
dataType match {
 case _: DecimalType => defineCodeGen(ctx, ev, c => s"$c.unary_$$minus()")
 case ByteType | ShortType | IntegerType | LongType if failOnError =>
-  val typeUtils = TypeUtils.getClass.getCanonicalName.stripSuffix("$")
-  val refDataType = ctx.addReferenceObj("refDataType", dataType, 
dataType.getClass.getName)
+  val mathUtils = MathUtils.getClass.getCanonicalName.stripSuffix("$")
   nullSafeCodeGen(ctx, ev, eval => {
-val javaBoxedType = CodeGenerator.boxedType(dataType)
-s"""
-   |${ev.value} = ($javaBoxedType)$typeUtils.getNumeric(
-   |  $refDataType, $failOnError).negate($eval);
- """.stripMargin
+s"${ev.value} = $mathUtils.negateExact($eval);"
   })
 case dt: NumericType => nullSafeCodeGen(ctx, ev, eval => {
   val originValue = ctx.freshName("origin")
@@ -174,15 +169,8 @@ case class Abs(child: Expression, failOnError: Boolean = 
SQLConf.get.ansiEnabled
   defineCodeGen(ctx, ev, c => s"$c.abs()")
 
 case ByteType | ShortType | IntegerType | LongType if failOnError =>
-  val typeUtils = TypeUtils.getClass.getCanonicalName.stripSuffix("$")
-  val refDataType = ctx.addReferenceObj("refDataType", dataType, 
dataType.getClass.getName)
-  nullSafeCodeGen(ctx, ev, eval => {
-val javaBoxedType = CodeGenerator.boxedType(dataType)
-s"""
-   |${ev.value} = ($javaBoxedType)$typeUtils.getNumeric(
-   |  $refDataType, $failOnError).abs($eval);
- """.stripMargin
- 
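
For context on the (truncated) diff above: under ANSI mode, the generated code for integral types now calls a specialized `MathUtils.negateExact` overload directly, instead of fetching a boxed `Numeric` via `TypeUtils.getNumeric` and negating through it. A minimal standalone sketch of the overflow-checked negation idea — not the exact Spark code; a plain `ArithmeticException` stands in for Spark's `ExecutionErrors.arithmeticOverflowError`:

```scala
object NegateExactSketch {
  // Byte.MinValue (-128) has no positive counterpart in Byte, so its negation overflows.
  def negateExact(a: Byte): Byte = {
    if (a == Byte.MinValue) throw new ArithmeticException("byte overflow")
    (-a).toByte
  }

  // Same reasoning for Short.MinValue (-32768).
  def negateExact(a: Short): Short = {
    if (a == Short.MinValue) throw new ArithmeticException("short overflow")
    (-a).toShort
  }

  def main(args: Array[String]): Unit = {
    println(negateExact(42.toByte))     // -42
    println(negateExact(Byte.MinValue)) // throws ArithmeticException
  }
}
```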

(spark) branch master updated (63529bf97ad0 -> 5d5611e1788e)

2024-03-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 63529bf97ad0 [SPARK-47543][CONNECT][PYTHON][TESTS][FOLLOW-UP] Skip the 
test if pandas and PyArrow are unavailable
 add 5d5611e1788e [SPARK-47642][BUILD] Exclude dependencies related to 
`org.junit.jupiter` and `org.junit.platform` from `jmock-junit5`

No new revisions were added by this update.

Summary of changes:
 pom.xml | 10 ++
 1 file changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47636][K8S][3.5] Use Java `17` instead of `17-jre` image in K8s Dockerfile

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new edae8edc3a45 [SPARK-47636][K8S][3.5] Use Java `17` instead of `17-jre` 
image in K8s Dockerfile
edae8edc3a45 is described below

commit edae8edc3a45a860a9402018cb44760266515154
Author: Dongjoon Hyun 
AuthorDate: Thu Mar 28 16:34:25 2024 -0700

[SPARK-47636][K8S][3.5] Use Java `17` instead of `17-jre` image in K8s 
Dockerfile

### What changes were proposed in this pull request?

This PR aims to use the Java `17` image instead of `17-jre` in the K8s Dockerfile for branch-3.5.

### Why are the changes needed?

Since Apache Spark 3.5.0, SPARK-44153 uses `jmap` as shown below.


https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030

```
$ docker run -it --rm eclipse-temurin:17-jre jmap
/__cacert_entrypoint.sh: line 30: exec: jmap: not found
```

```
$ docker run -it --rm eclipse-temurin:17 jmap | head -n2
Usage:
jmap -clstats 
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45762 from dongjoon-hyun/SPARK-47636.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile 
b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
index 88304c87a79c..22d8f1550128 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-ARG java_image_tag=17-jre
+ARG java_image_tag=17
 
 FROM eclipse-temurin:${java_image_tag}
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (c832e2ac1d04 -> ca0001345d0b)

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c832e2ac1d04 [SPARK-47492][SQL] Widen whitespace rules in lexer
 add ca0001345d0b [SPARK-47635][K8S] Use Java `21` instead of `21-jre` 
image in K8s Dockerfile

No new revisions were added by this update.

Summary of changes:
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (1623b2d513d2 -> d8dc0c3e5e8a)

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1623b2d513d2 [SPARK-47630][BUILD] Upgrade `zstd-jni` to 1.5.6-1
 add d8dc0c3e5e8a [SPARK-47632][BUILD] Ban 
`com.amazonaws:aws-java-sdk-bundle` dependency

No new revisions were added by this update.

Summary of changes:
 pom.xml | 1 +
 1 file changed, 1 insertion(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (2e4f2b0d307f -> 1623b2d513d2)

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2e4f2b0d307f [SPARK-47475][CORE][K8S] Support 
`spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode
 add 1623b2d513d2 [SPARK-47630][BUILD] Upgrade `zstd-jni` to 1.5.6-1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47475][CORE][K8S] Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e4f2b0d307f [SPARK-47475][CORE][K8S] Support 
`spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode
2e4f2b0d307f is described below

commit 2e4f2b0d307fa00121de77f01826c190527ebf3d
Author: jiale_tan 
AuthorDate: Thu Mar 28 08:52:27 2024 -0700

[SPARK-47475][CORE][K8S] Support 
`spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode

### What changes were proposed in this pull request?

During spark-submit, for the K8s cluster mode driver, instead of always downloading the jars and serving them to the executors, skip the download when the jar URL's scheme matches `spark.kubernetes.jars.avoidDownloadSchemes` in the configuration.

### Why are the changes needed?

For K8s cluster mode driver, `SparkSubmit` will download all the jars in 
the `spark.jars` to driver and then those jars' urls in `spark.jars` will be 
replaced by the driver local paths. Later when driver starts the 
`SparkContext`, it will copy all the `spark.jars` to 
`spark.app.initial.jar.urls`, start a file server and replace the jars with 
driver local paths in `spark.app.initial.jar.urls` with file service urls. When 
the executors start, they will download those driver local jars b [...]
When the jars are big and the Spark application requests a lot of executors, the executors' massive concurrent download of the jars from the driver saturates the network. The executors' jar downloads then time out, causing the executors to be terminated. From the user's point of view, the application is trapped in a loop of massive executor loss and re-provisioning and never gets as many live executors as requested, leading to SLA breaches or outright failures.
So, instead of letting the driver download the jars and then serve them to the executors, if we simply keep the URLs in `spark.jars` as they were, the executors will download the jars directly from the URLs provided by the user. This avoids the driver download bottleneck described above, especially when the jar URLs use scalable storage schemes such as s3 or hdfs.
Meanwhile, there are cases where the jar URLs use schemes that scale worse than the driver file server (e.g. http or ftp), where the jars are small, or where the executor count is small; in those cases the user may still want to fall back to the current behavior and let the driver file server serve the jars.
So making the driver's jar downloading and serving optional by scheme (a similar idea to `FORCE_DOWNLOAD_SCHEMES` in YARN) is a good approach for the solution.

### Does this PR introduce _any_ user-facing change?

A configuration `spark.kubernetes.jars.avoidDownloadSchemes` is added

### How was this patch tested?

- Unit tests added
- Tested with an application running on AWS EKS submitted with a 1GB jar on 
s3.
  - Before the fix, the application could not scale to 1k live executors.
  - After the fix, the application had no problem to scale beyond 12k live 
executors.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45715 from leletan/allow_k8s_executor_to_download_remote_jar.

Authored-by: jiale_tan 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/deploy/SparkSubmit.scala  | 28 ++-
 .../org/apache/spark/internal/config/package.scala | 12 +++
 .../org/apache/spark/deploy/SparkSubmitSuite.scala | 42 ++
 docs/running-on-kubernetes.md  | 12 +++
 4 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index c8cbedd9ea36..c60fbe537cbd 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -401,16 +401,23 @@ private[spark] class SparkSubmit extends Logging {
 // SPARK-33782 : This downloads all the files , jars , archiveFiles 
and pyfiles to current
 // working directory
 // SPARK-43540: add current working directory into driver classpath
+// SPARK-47475: make download to driver optional so executors may 
fetch resource from remote
+// url directly to avoid overwhelming driver network when resource is 
big and executor count
+// is high
 val workingDirectory = "."
 childClasspath += workingDirectory
-def downloadResourcesToCurrentDirectory(uris: String, isArchive: 
Boolean = false):
-String = {
+def downloadResourcesToCurrentDirectory(
+uris: String,
+isArch
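
As a usage illustration of the new configuration (a hedged sketch; the comma-separated scheme list and the s3a jar URL below are hypothetical examples, not taken from the patch):

```scala
import org.apache.spark.SparkConf

// Keep s3a/hdfs jar URLs untouched so executors fetch them directly from storage,
// instead of routing every download through the driver's file server.
val conf = new SparkConf()
  .set("spark.kubernetes.jars.avoidDownloadSchemes", "s3a,hdfs") // assumed format: comma-separated schemes
  .set("spark.jars", "s3a://my-bucket/libs/big-dependency.jar")  // hypothetical large dependency
```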

(spark) branch master updated: [MINOR][CORE] Replace `get+getOrElse` with `getOrElse` with default value in `StreamingQueryException`

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 32e73dd55416 [MINOR][CORE] Replace `get+getOrElse` with `getOrElse` 
with default value in `StreamingQueryException`
32e73dd55416 is described below

commit 32e73dd55416b3a0a81ea6b6635e6fedde378842
Author: yangjie01 
AuthorDate: Thu Mar 28 07:37:01 2024 -0700

[MINOR][CORE] Replace `get+getOrElse` with `getOrElse` with default value 
in `StreamingQueryException`

### What changes were proposed in this pull request?
This PR replaces `get + getOrElse` with `getOrElse` with a default value in 
`StreamingQueryException`.

### Why are the changes needed?
Simplify code

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45753 from LuciferYang/minor-getOrElse.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/streaming/StreamingQueryException.scala| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
 
b/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
index 800a6dcfda8d..259f4330224c 100644
--- 
a/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
+++ 
b/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
@@ -48,11 +48,11 @@ class StreamingQueryException private[sql](
   errorClass: String,
   messageParameters: Map[String, String]) = {
 this(
-  messageParameters.get("queryDebugString").getOrElse(""),
+  messageParameters.getOrElse("queryDebugString", ""),
   message,
   cause,
-  messageParameters.get("startOffset").getOrElse(""),
-  messageParameters.get("endOffset").getOrElse(""),
+  messageParameters.getOrElse("startOffset", ""),
+  messageParameters.getOrElse("endOffset", ""),
   errorClass,
   messageParameters)
   }
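
The simplification in one self-contained line (a trivial sketch, not the Spark code itself):

```scala
val params = Map("startOffset" -> "100")

// Before: two chained lookups through Option.
val before = params.get("endOffset").getOrElse("")
// After: a single call with a default value -- behaviorally identical.
val after = params.getOrElse("endOffset", "")

assert(before == after) // both are ""
```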


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (8c4d6764674f -> 4b58a631fea9)

2024-03-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8c4d6764674f [SPARK-47559][SQL] Codegen Support for variant 
`parse_json`
 add 4b58a631fea9 [SPARK-47628][SQL] Fix Postgres bit array issue 'Cannot 
cast to boolean'

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 10 ++
 .../sql/execution/datasources/jdbc/JdbcUtils.scala| 16 
 .../org/apache/spark/sql/jdbc/PostgresDialect.scala   | 19 ++-
 3 files changed, 36 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47616][SQL] Add User Document for Mapping Spark SQL Data Types from MySQL

2024-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d1539f7bb23 [SPARK-47616][SQL] Add User Document for Mapping Spark 
SQL Data Types from MySQL
8d1539f7bb23 is described below

commit 8d1539f7bb2319b746433067ab876754384246c2
Author: Kent Yao 
AuthorDate: Wed Mar 27 07:40:05 2024 -0700

[SPARK-47616][SQL] Add User Document for Mapping Spark SQL Data Types from 
MySQL

### What changes were proposed in this pull request?

This PR adds a user document for mapping Spark SQL data types from MySQL. The write-side document is not included yet, as it might need further verification.

### Why are the changes needed?

The conversion of data types from MySQL to Spark SQL is now solid, and end users can refer to it.

It also gives maintainers an overall perspective when reviewing the changes coming in 4.0.0.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

adds some tests for the missing MySQL data types


![image](https://github.com/apache/spark/assets/8326978/da8eb241-1d05-4a01-b038-d7d329193674)

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45736 from yaooqinn/SPARK-47616.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |  16 ++
 docs/sql-data-sources-jdbc.md  | 246 +
 2 files changed, 262 insertions(+)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 10049169caa1..1343f9af7e35 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -90,6 +90,10 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 conn.prepareStatement("CREATE TABLE collections (" +
 "a SET('cap', 'hat', 'helmet'), b ENUM('S', 'M', 'L', 
'XL'))").executeUpdate()
 conn.prepareStatement("INSERT INTO collections VALUES ('cap,hat', 
'M')").executeUpdate()
+
+conn.prepareStatement("CREATE TABLE TBL_GEOMETRY (col0 
GEOMETRY)").executeUpdate()
+conn.prepareStatement("INSERT INTO TBL_GEOMETRY VALUES 
(ST_GeomFromText('POINT(0 0)'))")
+  .executeUpdate()
   }
 
   def testConnection(): Unit = {
@@ -191,6 +195,12 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   assert(rows(0).getAs[Timestamp](3).equals(Timestamp.valueOf("2009-02-13 
23:31:30")))
   assert(rows(0).getAs[Date](4).equals(Date.valueOf("2001-01-01")))
 }
+val df = spark.read.format("jdbc")
+  .option("url", jdbcUrl)
+  .option("query", "select yr from dates")
+  .option("yearIsDateType", false)
+  .load()
+checkAnswer(df, Row(2001))
   }
 
   test("SPARK-47406: MySQL datetime types with preferTimestampNTZ") {
@@ -318,6 +328,12 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 Row("cap,hat", "M") :: Row("cap,hat", "M") :: Nil)
 }
   }
+
+  test("SPARK-47616: Read GEOMETRY from MySQL") {
+val df = spark.read.jdbc(jdbcUrl, "TBL_GEOMETRY", new Properties)
+checkAnswer(df,
+  Row(Array[Byte](0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0)))
+  }
 }
 
 
diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index bc0573a37219..17a61078cccf 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -427,3 +427,249 @@ SELECT * FROM resultTable
 
 
 
+
+## Data Type Mapping
+
+### Mapping Spark SQL Data Types from MySQL
+
+The below table describe the data type conversions from MySQL data types to 
Spark SQL Data Types,
+when reading data from a MySQL table using the built-in jdbc data source with 
the MySQL Connector/J
+as the activated JDBC Driver. Note that, different JDBC drivers, such as Maria 
Connector/J, which
+are also available to connect MySQL, may have different mapping rules.
+
+
+| MySQL Data Type | Spark SQL Data Type | Remarks |
+|-----------------|---------------------|---------|
+| BIT(1)          | BooleanType         |         |
+| BIT( >1 )       | LongType            |         |
+| TINYINT(1)      | BooleanType         |         |
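
A hedged Scala sketch of the kind of read the new document describes, assuming a running `SparkSession` named `spark`; the JDBC URL, credentials, and table are placeholders, and `yearIsDateType` is a MySQL Connector/J connection property that the test above passes through as a reader option:

```scala
// Placeholder connection details -- adjust for a real MySQL instance.
val jdbcUrl = "jdbc:mysql://localhost:3306/testdb?user=root&password=secret"

// With Connector/J defaults, MySQL YEAR surfaces as a date-like column.
val dates = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dates")
  .load()

// With yearIsDateType=false (as in the test above), YEAR is read back as an
// integer, e.g. Row(2001).
val years = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("query", "select yr from dates")
  .option("yearIsDateType", false)
  .load()
```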

(spark) branch master updated: [SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType

2024-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b540cc538614 [SPARK-47611][SQL] Cleanup dead code in 
MySQLDialect.getCatalystType
b540cc538614 is described below

commit b540cc538614c9808dc5e83a339ff52917fa0f37
Author: Kent Yao 
AuthorDate: Wed Mar 27 01:45:22 2024 -0700

[SPARK-47611][SQL] Cleanup dead code in MySQLDialect.getCatalystType

### What changes were proposed in this pull request?

This PR removes an unnecessary case-match branch for Types.BIT in MySQLDialect.getCatalystType. It is a special case for MariaDB Connector/J and can be handled by the defaults, since Types.BIT with size > 1 has already been matched and handled before this branch.

Additionally, we add some new tests for this corner case and for other MySQL/MariaDB quirks.

### Why are the changes needed?

code refactoring and test improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45734 from yaooqinn/SPARK-47611.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 32 --
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   |  2 --
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  |  2 --
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 705957631601..10049169caa1 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -64,10 +64,11 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 conn.prepareStatement("CREATE TABLE unsigned_numbers (" +
   "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT 
UNSIGNED," +
   "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED," +
-  "dbl DOUBLE UNSIGNED)").executeUpdate()
+  "dbl DOUBLE UNSIGNED, tiny1u TINYINT(1) UNSIGNED)").executeUpdate()
 
 conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 
16777215, 4294967295," +
-  "9223372036854775808, 123456789012345.123456789012345, 
1.0002)").executeUpdate()
+  "9223372036854775808, 123456789012345.123456789012345, 
1.0002, 0)")
+  .executeUpdate()
 
 conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts 
TIMESTAMP, "
   + "yr YEAR)").executeUpdate()
@@ -150,6 +151,13 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(rows.get(4).isInstanceOf[BigDecimal])
 assert(rows.get(5).isInstanceOf[BigDecimal])
 assert(rows.get(6).isInstanceOf[Double])
+// Unlike MySQL, MariaDB seems not to distinguish signed and unsigned 
tinyint(1).
+val isMaria = jdbcUrl.indexOf("disableMariaDbDriver") == -1
+if (isMaria) {
+  assert(rows.get(7).isInstanceOf[Boolean])
+} else {
+  assert(rows.get(7).isInstanceOf[Short])
+}
 assert(rows.getShort(0) === 255)
 assert(rows.getInt(1) === 65535)
 assert(rows.getInt(2) === 16777215)
@@ -157,6 +165,11 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(rows.getAs[BigDecimal](4).equals(new 
BigDecimal("9223372036854775808")))
 assert(rows.getAs[BigDecimal](5).equals(new 
BigDecimal("123456789012345.1234567890123450")))
 assert(rows.getDouble(6) === 1.0002)
+if (isMaria) {
+  assert(rows.getBoolean(7) === false)
+} else {
+  assert(rows.getShort(7) === 0)
+}
   }
 
   test("Date types") {
@@ -260,6 +273,21 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   test("SPARK-47478: all boolean synonyms read-write roundtrip") {
 val df = sqlContext.read.jdbc(jdbcUrl, "bools", new Properties)
 checkAnswer(df, Row(true, true, true))
+
+val properties0 = new Properties()
+properties0.setProperty("transformedBitIsBoolean", "false")
+properties0.setProperty("tinyInt1isBit", "true")
+
+checkAnswer(spark.read.jdbc(jdbcUrl, "bools", properties0), Row(true, 
true, true))
+val properties1 = new Properties()
+properties1.setProperty("transformedBitIsBoolean", "true")

(spark) branch master updated (f9eb3f3c13bf -> a600c0ea3159)

2024-03-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f9eb3f3c13bf [SPARK-46575][SQL][FOLLOWUP] Add back 
`HiveThriftServer2.startWithContext(SQLContext)` method for compatibility
 add a600c0ea3159 [SPARK-47491][CORE] Add `slf4j-api` jar to the class path 
first before the others of `jars` directory

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/launcher/AbstractCommandBuilder.java | 6 ++
 .../test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala  | 3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (fd4b8e89f3a0 -> 7d87a94dd77f)

2024-03-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from fd4b8e89f3a0 [SPARK-47555][SQL] Show a warning message about 
SQLException if `JDBCTableCatalog.loadTable` fails
 add 7d87a94dd77f [MINOR][CORE] When failed to canceling the job group, add 
a warning log

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 3 +++
 1 file changed, 3 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (e00eace41a63 -> fd4b8e89f3a0)

2024-03-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e00eace41a63 [SPARK-47561][SQL] Fix analyzer rule order issues about 
Alias
 add fd4b8e89f3a0 [SPARK-47555][SQL] Show a warning message about 
SQLException if `JDBCTableCatalog.loadTable` fails

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala| 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47561][SQL] Fix analyzer rule order issues about Alias

2024-03-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e00eace41a63 [SPARK-47561][SQL] Fix analyzer rule order issues about 
Alias
e00eace41a63 is described below

commit e00eace41a63996deb213b6e1816257ebca281e5
Author: Wenchen Fan 
AuthorDate: Tue Mar 26 07:45:54 2024 -0700

[SPARK-47561][SQL] Fix analyzer rule order issues about Alias

### What changes were proposed in this pull request?

We found two analyzer rule execution order issues in our internal workloads:
- `CreateStruct.apply` creates a `NamePlaceholder` for an unresolved `NamedExpression`. However, under certain rule execution orders, the `NamedExpression` may be removed (e.g. by removing an unnecessary `Alias`) before the `NamePlaceholder` is resolved, after which the `NamePlaceholder` can never be resolved.
- UNPIVOT uses `UnresolvedAlias` to wrap `UnresolvedAttribute`. There is a 
conflict about how to determine the final alias name. If `ResolveAliases` runs 
first, then `UnresolvedAlias` will be removed and eventually the alias will be 
`b` for nested column `a.b`. If `ResolveReferences` runs first, then we resolve 
`a.b` first and then `UnresolvedAlias` will determine the alias as `a.b` not 
`b`.

This PR fixes the two issues
- `CreateStruct.apply` should determine the field name immediately if the 
input is `Alias`
- The parser rule for UNPIVOT should follow how we parse SELECT and return 
`UnresolvedAttribute` directly without the `UnresolvedAlias` wrapper. It's a 
bit risky to fix the order issue between `ResolveAliases` and 
`ResolveReferences` as it can change the final query schema, we will save it 
for later.

### Why are the changes needed?

fix unstable analyzer behavior with different rule execution orders.

### Does this PR introduce _any_ user-facing change?

Yes, some failed queries can run now. The issue for UNPIVOT only affects 
the error message.

### How was this patch tested?

Verified by our internal workloads. The repro query needed to trigger the problematic rule execution order is quite complicated, so we won't add tests for it. The fix is quite obvious.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45718 from cloud-fan/rule.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../catalyst/expressions/complexTypeCreator.scala  |  1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala |  2 +-
 .../sql/catalyst/parser/UnpivotParserSuite.scala   | 39 ++
 3 files changed, 20 insertions(+), 22 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 332a49f78ab9..993684f2c1ed 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -374,6 +374,7 @@ object CreateStruct {
   // alias name inside CreateNamedStruct.
   case (u: UnresolvedAttribute, _) => Seq(Literal(u.nameParts.last), u)
   case (u @ UnresolvedExtractValue(_, e: Literal), _) if e.dataType == 
StringType => Seq(e, u)
+  case (a: Alias, _) => Seq(Literal(a.name), a)
   case (e: NamedExpression, _) if e.resolved => Seq(Literal(e.name), e)
   case (e: NamedExpression, _) => Seq(NamePlaceholder, e)
   case (g @ GetStructField(_, _, Some(name)), _) => Seq(Literal(name), g)
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 131eaa3d..170dcc37f0a5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1346,7 +1346,7 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper with Logging {
* Create an Unpivot column.
*/
   override def visitUnpivotColumn(ctx: UnpivotColumnContext): NamedExpression 
= withOrigin(ctx) {
-
UnresolvedAlias(UnresolvedAttribute(visitMultipartIdentifier(ctx.multipartIdentifier)))
+UnresolvedAttribute(visitMultipartIdentifier(ctx.multipartIdentifier))
   }
 
   /**
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
index c680e08c1c83..3012ef6f1544 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala
+++ 
b/sql/catalys
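
A small, hedged illustration of the first fix, runnable against a plain local `SparkSession` named `spark` (the column and alias names are made up):

```scala
// With the fix, an explicit alias decides the struct field name up front, so a later
// rule that strips the Alias node can no longer leave an unresolved NamePlaceholder.
val df = spark.range(1).selectExpr("struct(id AS renamed) AS s")
println(df.schema.simpleString) // roughly: struct<s:struct<renamed:bigint>>
```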

(spark) branch master updated: [SPARK-47544][PYTHON] SparkSession builder method is incompatible with visual studio code intellisense

2024-03-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e20f8e3b440 [SPARK-47544][PYTHON] SparkSession builder method is 
incompatible with visual studio code intellisense
8e20f8e3b440 is described below

commit 8e20f8e3b4404b6d72ec47c546c94a040467c774
Author: Niranjan Jayakar 
AuthorDate: Tue Mar 26 07:43:10 2024 -0700

[SPARK-47544][PYTHON] SparkSession builder method is incompatible with 
visual studio code intellisense

### What changes were proposed in this pull request?

VS Code's intellisense is unable to detect the methods and properties of
`SparkSession.builder`. A video is worth a thousand words:

[video](https://github.com/apache/spark/assets/16217941/e611e7e7-8760-4d9f-aa6c-9d4bd519d516).

Adjust the implementation for better compatibility with the IDE.

### Why are the changes needed?

Compatibility with IDE tooling.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the wheel file locally and tested on local IDE.
See 
[video](https://github.com/apache/spark/assets/16217941/429b06dd-44a7-4d13-a551-c2b72c326c1e).

Confirmed the same works for Pycharm.

Further confirmed that the Pydocs for these methods are unaffected.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45700 from nija-at/vscode-intellisense.

Authored-by: Niranjan Jayakar 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/connect/session.py |  7 +++
 python/pyspark/sql/session.py | 29 ++---
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py 
b/python/pyspark/sql/connect/session.py
index f339fada0d11..2c08349a3300 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -236,10 +236,9 @@ class SparkSession:
 
 _client: SparkConnectClient
 
-@classproperty
-def builder(cls) -> Builder:
-return cls.Builder()
-
+# SPARK-47544: Explicitly declaring this as an identifier instead of a 
method.
+# If changing, make sure this bug is not reintroduced.
+builder: Builder = classproperty(lambda cls: cls.Builder())  # type: ignore
 builder.__doc__ = PySparkSession.builder.__doc__
 
 def __init__(self, connection: Union[str, DefaultChannelBuilder], userId: 
Optional[str] = None):
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 6c80b7f42da4..4a8a653fd466 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -499,12 +499,18 @@ class SparkSession(SparkConversionMixin):
 
 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
 opts["spark.remote"] = url
-return 
RemoteSparkSession.builder.config(map=opts).getOrCreate()
+return cast(
+SparkSession,
+
RemoteSparkSession.builder.config(map=opts).getOrCreate(),
+)
 elif "SPARK_LOCAL_REMOTE" in os.environ:
 url = "sc://localhost"
 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
 opts["spark.remote"] = url
-return 
RemoteSparkSession.builder.config(map=opts).getOrCreate()
+return cast(
+SparkSession,
+
RemoteSparkSession.builder.config(map=opts).getOrCreate(),
+)
 else:
 raise PySparkRuntimeError(
 error_class="SESSION_ALREADY_EXIST",
@@ -560,14 +566,14 @@ class SparkSession(SparkConversionMixin):
 # used in conjunction with Spark Connect mode.
 os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
 opts["spark.remote"] = url
-return RemoteSparkSession.builder.config(map=opts).create()
+return cast(SparkSession, 
RemoteSparkSession.builder.config(map=opts).create())
 else:
 raise PySparkRuntimeError(
 error_class="ONLY_SUPPORTED_WITH_SPARK_CONNECT",
 message_parameters={"feature": 
"SparkSession.builder.create"},
 )
 
-# TODO(SPARK-38912): Replace @classproperty with @classmethod + @property 
once support for
+# TOD

(spark) branch master updated: [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types

2024-03-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 89104b93d324 [SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types
89104b93d324 is described below

commit 89104b93d324129ebe4dec3c666fe5e36a7586ad
Author: Kent Yao 
AuthorDate: Tue Mar 26 07:37:39 2024 -0700

[SPARK-47557][SQL][TEST] Audit MySQL ENUM/SET Types

### What changes were proposed in this pull request?

This PR adds tests for MySQL ENUM/SET Types

In MySQL/Maria Connector/J, the JDBC ResultSetMetadata API maps ENUM/SET 
types to `typeId:java.sql.Types.CHAR,typeName:'CHAR'`, which makes it 
impossible to distinguish them from a normal `CHAR(n)` type.

When working with ENUM/SET, it's possible to encounter char padding issues. 
However, this can be resolved by setting the LEGACY_CHAR_VARCHAR_AS_STRING 
parameter to true.

### Why are the changes needed?

API auditing for MYSQL jdbc data source

### Does this PR introduce _any_ user-facing change?

no, test only

### How was this patch tested?

added tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45713 from yaooqinn/SPARK-47557.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++
 1 file changed, 15 insertions(+)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 09eb99c25227..705957631601 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -26,6 +26,7 @@ import scala.util.Using
 
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.tags.DockerTest
 
 /**
@@ -84,6 +85,10 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   "f4 FLOAT UNSIGNED, f5 FLOAT(10) UNSIGNED, f6 FLOAT(53) 
UNSIGNED)").executeUpdate()
 conn.prepareStatement("INSERT INTO floats VALUES (1.23, 4.56, 7.89, 1.23, 
4.56, 7.89)")
   .executeUpdate()
+
+conn.prepareStatement("CREATE TABLE collections (" +
+"a SET('cap', 'hat', 'helmet'), b ENUM('S', 'M', 'L', 
'XL'))").executeUpdate()
+conn.prepareStatement("INSERT INTO collections VALUES ('cap,hat', 
'M')").executeUpdate()
   }
 
   def testConnection(): Unit = {
@@ -275,6 +280,16 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 val df = spark.read.jdbc(jdbcUrl, "floats", new Properties)
 checkAnswer(df, Row(1.23f, 4.56f, 7.89d, 1.23d, 4.56d, 7.89d))
   }
+
+  test("SPARK-47557: MySQL ENUM/SET types contains only java.sq.Types.CHAR 
information") {
+val df = spark.read.jdbc(jdbcUrl, "collections", new Properties)
+checkAnswer(df, Row("cap,hat   ", "M "))
+df.write.mode("append").jdbc(jdbcUrl, "collections", new Properties)
+withSQLConf(SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key -> "true") {
+  checkAnswer(spark.read.jdbc(jdbcUrl, "collections", new Properties),
+Row("cap,hat", "M") :: Row("cap,hat", "M") :: Nil)
+}
+  }
 }
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

2024-03-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ded8cdf8d945 [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover 
the test case for the number of partitions
ded8cdf8d945 is described below

commit ded8cdf8d9459e0e5b73c01c8ee41ae54ccd7ac5
Author: Hyukjin Kwon 
AuthorDate: Tue Mar 26 07:35:49 2024 -0700

[SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for 
the number of partitions

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/45486 that 
addresses https://github.com/apache/spark/pull/45486#discussion_r1538753052 
review comment to recover the test coverage related to the number of partitions 
in Python Data Source.

### Why are the changes needed?

To restore the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Unittest fixed, CI in this PR should verify it.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45720 from HyukjinKwon/SPARK-47367-folliwup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/tests/test_python_datasource.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/sql/tests/test_python_datasource.py 
b/python/pyspark/sql/tests/test_python_datasource.py
index f69e1dee1285..d028a210b007 100644
--- a/python/pyspark/sql/tests/test_python_datasource.py
+++ b/python/pyspark/sql/tests/test_python_datasource.py
@@ -28,6 +28,7 @@ from pyspark.sql.datasource import (
 WriterCommitMessage,
 CaseInsensitiveDict,
 )
+from pyspark.sql.functions import spark_partition_id
 from pyspark.sql.types import Row, StructType
 from pyspark.testing.sqlutils import (
 have_pyarrow,
@@ -236,10 +237,12 @@ class BasePythonDataSourceTestsMixin:
 
 self.spark.dataSource.register(InMemoryDataSource)
 df = self.spark.read.format("memory").load()
+self.assertEqual(df.select(spark_partition_id()).distinct().count(), 3)
 assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1"), Row(x=2, 
y="2")])
 
 df = self.spark.read.format("memory").option("num_partitions", 
2).load()
 assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")])
+self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2)
 
 def _get_test_json_data_source(self):
 import json
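
The restored assertions count distinct partition ids to check how many input partitions the data source produced. The same trick works from Scala; a hedged sketch over an arbitrary DataFrame, assuming a running `SparkSession` named `spark`:

```scala
import org.apache.spark.sql.functions.spark_partition_id

// Tag each row with its partition id, then count the distinct ids to get the
// number of non-empty partitions backing the DataFrame.
val df = spark.range(0, 100, 1, numPartitions = 3).toDF("x")
val partitionCount = df.select(spark_partition_id()).distinct().count() // 3
```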


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


