(spark) branch master updated: [SPARK-47054][PYTHON][TESTS] Remove pinned version of torch for Python 3.12 support

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 86f15e4a779e [SPARK-47054][PYTHON][TESTS] Remove pinned version of 
torch for Python 3.12 support
86f15e4a779e is described below

commit 86f15e4a779ec746373c78c189830cb339b07492
Author: Hyukjin Kwon 
AuthorDate: Thu Feb 15 22:28:37 2024 -0800

[SPARK-47054][PYTHON][TESTS] Remove pinned version of torch for Python 3.12 
support

### What changes were proposed in this pull request?

This PR unpins the torch version in our CI.

This PR is dependent on https://github.com/apache/spark/pull/45115

### Why are the changes needed?

Testing the latest version. This also blocks SPARK-46078.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested via `./dev/lint-python`.
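
A quick sanity check one might also run inside the updated image — not part of the original commit, shown here only as a minimal sketch — is to confirm that the now-unpinned torch and torchvision wheels import cleanly under the interpreter being tested:

```python
# Minimal, hypothetical sanity check (not from the commit): verify that the
# unpinned CPU wheels from https://download.pytorch.org/whl/cpu import cleanly.
import platform

import torch
import torchvision

print("python      :", platform.python_version())
print("torch       :", torch.__version__)
print("torchvision :", torchvision.__version__)
```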

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45113 from HyukjinKwon/SPARK-47054.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile | 6 +++---
 dev/requirements.txt | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 53810756d30c..fa663bc6e419 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -100,7 +100,7 @@ ARG CONNECT_PIP_PKGS="grpcio==1.59.3 grpcio-status==1.59.3 
protobuf==4.25.1 goog
 
 # Add torch as a testing dependency for TorchDistributor and 
DeepspeedTorchDistributor
 RUN python3.9 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting 
$CONNECT_PIP_PKGS && \
-python3.9 -m pip install 'torch<=2.0.1' torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
+python3.9 -m pip install torch torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
 python3.9 -m pip install deepspeed torcheval && \
 python3.9 -m pip cache purge
 
@@ -111,7 +111,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
 RUN python3.10 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting 
$CONNECT_PIP_PKGS && \
-python3.10 -m pip install 'torch<=2.0.1' torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
+python3.10 -m pip install torch torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
 python3.10 -m pip install deepspeed torcheval && \
 python3.10 -m pip cache purge
 
@@ -122,7 +122,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
 RUN python3.11 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting 
$CONNECT_PIP_PKGS && \
-python3.11 -m pip install 'torch<=2.0.1' torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
+python3.11 -m pip install torch torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
 python3.11 -m pip install deepspeed torcheval && \
 python3.11 -m pip cache purge
 
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 46a02450d375..6fcd04b6d44a 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -61,7 +61,7 @@ googleapis-common-protos-stubs==2.2.0
 grpc-stubs==1.24.11
 
 # TorchDistributor dependencies
-torch<=2.0.1
+torch
 torchvision
 torcheval
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch

2024-02-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new b4e28df675fa [SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for 
spark.sql.execution.arrow.maxRecordsPerBatch
b4e28df675fa is described below

commit b4e28df675fa3c55487df6d3fb6f8a068f38748b
Author: Hyukjin Kwon 
AuthorDate: Fri Feb 16 12:41:19 2024 +0900

[SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for 
spark.sql.execution.arrow.maxRecordsPerBatch

This PR fixes the regression introduced by 
https://github.com/apache/spark/pull/36683.

```python
import pandas as pd
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()

spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
```

**Before**

```
/.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning: 
createDataFrame attempted Arrow optimization because 
'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the 
error below and will not continue because automatic fallback with 
'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
  range() arg 3 must not be zero
  warn(msg)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1483, in 
createDataFrame
return super(SparkSession, self).createDataFrame(  # type: 
ignore[call-overload]
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in 
createDataFrame
return self._create_from_pandas_with_arrow(data, schema, timezone)
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in 
_create_from_pandas_with_arrow
pdf_slices = (pdf.iloc[start : start + step] for start in range(0, 
len(pdf), step))
ValueError: range() arg 3 must not be zero
```
```
Empty DataFrame
Columns: [a]
Index: []
```

**After**

```
 a
0  123
```

```
 a
0  123
```

It fixes a regression. This is a documented behaviour. It should be backported to branch-3.4 and branch-3.5.

Yes, it fixes a regression as described above.

Unittest was added.

No.

Closes #45132 from HyukjinKwon/SPARK-47068.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 3bb762dc032866cfb304019cba6db01125556c2f)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/pandas/conversion.py |  1 +
 python/pyspark/sql/tests/test_arrow.py  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/python/pyspark/sql/pandas/conversion.py 
b/python/pyspark/sql/pandas/conversion.py
index a5f0664ed75d..e4c4af709fa8 100644
--- a/python/pyspark/sql/pandas/conversion.py
+++ b/python/pyspark/sql/pandas/conversion.py
@@ -600,6 +600,7 @@ class SparkConversionMixin:
 
 # Slice the DataFrame to be batched
 step = self._jconf.arrowMaxRecordsPerBatch()
+step = step if step > 0 else len(pdf)
 pdf_slices = (pdf.iloc[start : start + step] for start in range(0, 
len(pdf), step))
 
 # Create list of Arrow (columns, type) for serializer dump_stream
diff --git a/python/pyspark/sql/tests/test_arrow.py 
b/python/pyspark/sql/tests/test_arrow.py
index 7fded1cbefdc..de832ce9273e 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -831,6 +831,16 @@ class ArrowTestsMixin:
 self.assertEqual([Row(c1=1, c2="string")], df.collect())
 self.assertGreater(self.spark.sparkContext.defaultParallelism, 
len(pdf))
 
+    def test_negative_and_zero_batch_size(self):
+        # SPARK-47068: Negative and zero value should work as unlimited batch size.
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 0}):
+            pdf = pd.DataFrame({"a": [123]})
+            assert_frame_equal(pdf, self.spark.createDataFrame(pdf).toPandas())
+
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": -1}):
+            pdf = pd.DataFrame({"a": [123]})
+            assert_frame_equal(pdf, self.spark.createDataFrame(pdf).toPandas())
+
 
 @unittest.skipIf(
 not have_pandas or not have_pyarrow,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch

2024-02-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 1c1c5faa29dc [SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for 
spark.sql.execution.arrow.maxRecordsPerBatch
1c1c5faa29dc is described below

commit 1c1c5faa29dc649faf143fe2eea39ccf15862f85
Author: Hyukjin Kwon 
AuthorDate: Fri Feb 16 12:41:19 2024 +0900

[SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for 
spark.sql.execution.arrow.maxRecordsPerBatch

This PR fixes the regression introduced by 
https://github.com/apache/spark/pull/36683.

```python
import pandas as pd
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()

spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
```

**Before**

```
/.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning: 
createDataFrame attempted Arrow optimization because 
'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the 
error below and will not continue because automatic fallback with 
'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
  range() arg 3 must not be zero
  warn(msg)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1483, in 
createDataFrame
return super(SparkSession, self).createDataFrame(  # type: 
ignore[call-overload]
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in 
createDataFrame
return self._create_from_pandas_with_arrow(data, schema, timezone)
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in 
_create_from_pandas_with_arrow
pdf_slices = (pdf.iloc[start : start + step] for start in range(0, 
len(pdf), step))
ValueError: range() arg 3 must not be zero
```
```
Empty DataFrame
Columns: [a]
Index: []
```

**After**

```
 a
0  123
```

```
 a
0  123
```

It fixes a regression. This is a documented behaviour. It should be backported to branch-3.4 and branch-3.5.

Yes, it fixes a regression as described above.

Unittest was added.

No.

Closes #45132 from HyukjinKwon/SPARK-47068.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 3bb762dc032866cfb304019cba6db01125556c2f)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/pandas/conversion.py |  1 +
 python/pyspark/sql/tests/test_arrow.py  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/python/pyspark/sql/pandas/conversion.py 
b/python/pyspark/sql/pandas/conversion.py
index 8664c4df73ed..3643cafbb3ba 100644
--- a/python/pyspark/sql/pandas/conversion.py
+++ b/python/pyspark/sql/pandas/conversion.py
@@ -613,6 +613,7 @@ class SparkConversionMixin:
 
 # Slice the DataFrame to be batched
 step = self._jconf.arrowMaxRecordsPerBatch()
+step = step if step > 0 else len(pdf)
 pdf_slices = (pdf.iloc[start : start + step] for start in range(0, 
len(pdf), step))
 
 # Create list of Arrow (columns, arrow_type, spark_type) for 
serializer dump_stream
diff --git a/python/pyspark/sql/tests/test_arrow.py 
b/python/pyspark/sql/tests/test_arrow.py
index 73b6067373b0..9e9a7d3ac9b0 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -1238,6 +1238,16 @@ class ArrowTestsMixin:
 df = self.spark.createDataFrame([MyInheritedTuple(1, 2, 
MyInheritedTuple(1, 2, 3))])
 self.assertEqual(df.first(), Row(a=1, b=2, c=Row(a=1, b=2, c=3)))
 
+    def test_negative_and_zero_batch_size(self):
+        # SPARK-47068: Negative and zero value should work as unlimited batch size.
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 0}):
+            pdf = pd.DataFrame({"a": [123]})
+            assert_frame_equal(pdf, self.spark.createDataFrame(pdf).toPandas())
+
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": -1}):
+            pdf = pd.DataFrame({"a": [123]})
+            assert_frame_equal(pdf, self.spark.createDataFrame(pdf).toPandas())
+
 
 @unittest.skipIf(
 not have_pandas or not have_pyarrow,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch

2024-02-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3bb762dc0328 [SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for 
spark.sql.execution.arrow.maxRecordsPerBatch
3bb762dc0328 is described below

commit 3bb762dc032866cfb304019cba6db01125556c2f
Author: Hyukjin Kwon 
AuthorDate: Fri Feb 16 12:41:19 2024 +0900

[SPARK-47068][PYTHON][TESTS] Recover -1 and 0 case for 
spark.sql.execution.arrow.maxRecordsPerBatch

### What changes were proposed in this pull request?

This PR fixes the regression introduced by 
https://github.com/apache/spark/pull/36683.

```python
import pandas as pd
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()

spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
```

**Before**

```
/.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning: 
createDataFrame attempted Arrow optimization because 
'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the 
error below and will not continue because automatic fallback with 
'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
  range() arg 3 must not be zero
  warn(msg)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1483, in 
createDataFrame
return super(SparkSession, self).createDataFrame(  # type: 
ignore[call-overload]
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in 
createDataFrame
return self._create_from_pandas_with_arrow(data, schema, timezone)
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in 
_create_from_pandas_with_arrow
pdf_slices = (pdf.iloc[start : start + step] for start in range(0, 
len(pdf), step))
ValueError: range() arg 3 must not be zero
```
```
Empty DataFrame
Columns: [a]
Index: []
```

**After**

```
 a
0  123
```

```
 a
0  123
```

### Why are the changes needed?

It fixes a regression. This is a documented behaviour. It should be backported to branch-3.4 and branch-3.5.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a regression as described above.

### How was this patch tested?

Unittest was added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45132 from HyukjinKwon/SPARK-47068.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/pandas/conversion.py |  1 +
 python/pyspark/sql/tests/test_arrow.py  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/python/pyspark/sql/pandas/conversion.py 
b/python/pyspark/sql/pandas/conversion.py
index 5288f0e100bb..d958b95795b7 100644
--- a/python/pyspark/sql/pandas/conversion.py
+++ b/python/pyspark/sql/pandas/conversion.py
@@ -630,6 +630,7 @@ class SparkConversionMixin:
 
 # Slice the DataFrame to be batched
 step = self._jconf.arrowMaxRecordsPerBatch()
+step = step if step > 0 else len(pdf)
 pdf_slices = (pdf.iloc[start : start + step] for start in range(0, 
len(pdf), step))
 
 # Create list of Arrow (columns, arrow_type, spark_type) for 
serializer dump_stream
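
To make the one-line guard above concrete, here is a standalone pandas sketch (not from the patch; plain pandas, no Spark session) of the slicing logic: with a batch size of 0, `range(0, len(pdf), 0)` raises `ValueError: range() arg 3 must not be zero`, while the guard falls back to a single batch covering the whole frame.

```python
import pandas as pd

pdf = pd.DataFrame({"a": [1, 2, 3, 4, 5]})

def slice_into_batches(pdf, max_records_per_batch):
    # Mirrors the patched logic: a non-positive value means "one batch for everything".
    step = max_records_per_batch if max_records_per_batch > 0 else len(pdf)
    return [pdf.iloc[start : start + step] for start in range(0, len(pdf), step)]

print(len(slice_into_batches(pdf, 2)))   # 3 batches of at most 2 rows
print(len(slice_into_batches(pdf, 0)))   # 1 batch; without the guard this raised ValueError
print(len(slice_into_batches(pdf, -1)))  # 1 batch
```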
diff --git a/python/pyspark/sql/tests/test_arrow.py 
b/python/pyspark/sql/tests/test_arrow.py
index fc979c9e8b78..c771e5db65e5 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -1144,6 +1144,16 @@ class ArrowTestsMixin:
 df = self.spark.createDataFrame([MyInheritedTuple(1, 2, 
MyInheritedTuple(1, 2, 3))])
 self.assertEqual(df.first(), Row(a=1, b=2, c=Row(a=1, b=2, c=3)))
 
+    def test_negative_and_zero_batch_size(self):
+        # SPARK-47068: Negative and zero value should work as unlimited batch size.
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 0}):
+            pdf = pd.DataFrame({"a": [123]})
+            assert_frame_equal(pdf, self.spark.createDataFrame(pdf).toPandas())
+
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": -1}):
+            pdf = pd.DataFrame({"a": [123]})
+            assert_frame_equal(pdf, self.spark.createDataFrame(pdf).toPandas())
+
 
 @unittest.skipIf(
 not have_pandas or not have_pyarrow,



(spark) branch master updated: [SPARK-46078][PYTHON][TESTS] Upgrade `pytorch` for Python 3.12

2024-02-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3a146ce46c55 [SPARK-46078][PYTHON][TESTS] Upgrade `pytorch` for Python 
3.12
3a146ce46c55 is described below

commit 3a146ce46c55ebdb206bb8b2439690905003e6e2
Author: Hyukjin Kwon 
AuthorDate: Fri Feb 16 12:40:45 2024 +0900

[SPARK-46078][PYTHON][TESTS] Upgrade `pytorch` for Python 3.12

### What changes were proposed in this pull request?

This PR proposes to upgrade PyTorch for Python 3.12.

This PR is dependent on https://github.com/apache/spark/pull/45113

### Why are the changes needed?

To use the official releases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Will be verified in CI. It seems PyTorch's official support for macOS is not out yet.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45119 from HyukjinKwon/SPARK-46078.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 dev/infra/Dockerfile | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index fc515d4478ad..53810756d30c 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -133,9 +133,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
 # TODO(SPARK-46647) Add unittest-xml-reporting into Python 3.12 image when it 
supports Python 3.12
-# TODO(SPARK-46078) Use official one instead of nightly build when it's ready
 RUN python3.12 -m pip install $BASIC_PIP_PKGS $CONNECT_PIP_PKGS lxml && \
-python3.12 -m pip install --pre torch --index-url 
https://download.pytorch.org/whl/nightly/cpu && \
-python3.12 -m pip install torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
+python3.12 -m pip install torch torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
 python3.12 -m pip install torcheval && \
 python3.12 -m pip cache purge


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47055][PYTHON] Upgrade MyPy 1.8.0

2024-02-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7668eb5daf22 [SPARK-47055][PYTHON] Upgrade MyPy 1.8.0
7668eb5daf22 is described below

commit 7668eb5daf22868094fe83c08681a93b0a4f4d29
Author: Hyukjin Kwon 
AuthorDate: Fri Feb 16 12:36:54 2024 +0900

[SPARK-47055][PYTHON] Upgrade MyPy 1.8.0

### What changes were proposed in this pull request?

This PR proposes to upgrade MyPy to 1.8.0.

### Why are the changes needed?

To unblock the full support of Python 3.12 with CI. This unblocks 
https://github.com/apache/spark/pull/45113

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually ran `dev/lint-python`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45115 from HyukjinKwon/SPARK-47055.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/build_and_test.yml   |  4 +--
 dev/lint-python|  7 ++--
 dev/requirements.txt   |  2 +-
 python/mypy.ini|  8 +
 python/pyspark/accumulators.py |  2 +-
 python/pyspark/ml/classification.py|  6 ++--
 python/pyspark/ml/connect/tuning.py|  2 +-
 python/pyspark/ml/torch/distributor.py |  3 --
 python/pyspark/ml/util.py  |  2 +-
 python/pyspark/pandas/data_type_ops/boolean_ops.py |  2 +-
 python/pyspark/pandas/data_type_ops/num_ops.py |  8 +++--
 python/pyspark/pandas/data_type_ops/string_ops.py  |  4 ++-
 python/pyspark/pandas/frame.py |  8 +++--
 python/pyspark/pandas/namespace.py |  4 +--
 python/pyspark/pandas/series.py|  4 +--
 python/pyspark/pandas/sql_processor.py |  2 +-
 python/pyspark/pandas/supported_api_gen.py |  7 ++--
 python/pyspark/pandas/typedef/typehints.py | 18 --
 python/pyspark/profiler.py |  2 +-
 python/pyspark/rdd.py  | 22 ++--
 python/pyspark/sql/connect/expressions.py  |  8 +++--
 python/pyspark/sql/connect/plan.py |  8 +++--
 python/pyspark/sql/connect/session.py  |  2 +-
 python/pyspark/sql/group.py| 12 +++
 python/pyspark/sql/pandas/functions.pyi|  6 ++--
 python/pyspark/sql/pandas/types.py | 42 +-
 python/pyspark/sql/session.py  |  4 +--
 python/pyspark/sql/streaming/readwriter.py | 10 +++---
 python/pyspark/sql/types.py| 12 +++
 python/pyspark/sql/utils.py|  4 +--
 python/pyspark/streaming/dstream.py|  6 ++--
 python/pyspark/worker.py   |  2 +-
 32 files changed, 121 insertions(+), 112 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 605c2a0aea1a..0427fc0fd4a3 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -596,7 +596,7 @@ jobs:
 python-version: '3.9'
 - name: Install dependencies for Python CodeGen check
   run: |
-python3.9 -m pip install 'black==23.9.1' 'protobuf==4.25.1' 
'mypy==0.982' 'mypy-protobuf==3.3.0'
+python3.9 -m pip install 'black==23.9.1' 'protobuf==4.25.1' 
'mypy==1.8.0' 'mypy-protobuf==3.3.0'
 python3.9 -m pip list
 - name: Python CodeGen check
   run: ./dev/connect-check-protos.py
@@ -704,7 +704,7 @@ jobs:
 # See 'docutils<0.18.0' in SPARK-39421
 python3.9 -m pip install 'sphinx==4.5.0' mkdocs 
'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 
markupsafe 'pyzmq<24.0.0' \
   ipython ipython_genutils sphinx_plotly_directive 'numpy>=1.20.0' 
pyarrow pandas 'plotly>=4.8' 'docutils<0.18.0' \
-  'flake8==3.9.0' 'mypy==0.982' 'pytest==7.1.3' 
'pytest-mypy-plugins==1.9.3' 'black==23.9.1' \
+  'flake8==3.9.0' 'mypy==1.8.0' 'pytest==7.1.3' 
'pytest-mypy-plugins==1.9.3' 'black==23.9.1' \
   'pandas-stubs==1.2.0.53' 'grpcio==1.59.3' 'grpc-stubs==1.24.11' 
'googleapis-common-protos-stubs==2.2.0' \
   'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5'
 python3.9 -m pip list
diff --git a/dev/lint-python b/dev/lint-python
index 5cb4fa6336e0..76f844aa3895 100755
--- a/dev/lint-python
+++ b/dev/lint-python
@@ -221,9 +221,10 @@ function mypy_test {
 if [[ "$MYPY_EXA

(spark) branch master updated: [SPARK-47067][INFRA] Add Daily Apple Silicon Github Action Job (Java/Scala)

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 964545aa88e9 [SPARK-47067][INFRA] Add Daily Apple Silicon Github 
Action Job (Java/Scala)
964545aa88e9 is described below

commit 964545aa88e9fd5b033871cbeae8d40f1a4ced91
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 15 17:26:40 2024 -0800

[SPARK-47067][INFRA] Add Daily Apple Silicon Github Action Job (Java/Scala)

### What changes were proposed in this pull request?

This PR aims to add a new `Daily Apple Silicon Github Action (Java/Scala)` 
job for Apache Spark 4.0.0.

### Why are the changes needed?

To have test coverage for the Apple Silicon environment in Java/Scala first.

For Python and R, we need more effort due to their library dependencies.

We will add them separately.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unfortunately, this should be tested after merging.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45128 from dongjoon-hyun/SPARK-47067.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_apple_silicon.yml | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/.github/workflows/build_apple_silicon.yml 
b/.github/workflows/build_apple_silicon.yml
new file mode 100644
index ..52a906af3f04
--- /dev/null
+++ b/.github/workflows/build_apple_silicon.yml
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build / Apple Silicon (master, JDK 21)"
+
+on:
+  schedule:
+- cron: '0 20 * * *'
+
+jobs:
+  run-build:
+permissions:
+  packages: write
+name: Run
+uses: ./.github/workflows/build_and_test.yml
+runs-on: macos-14
+if: github.repository == 'apache/spark'
+with:
+  java: 21
+  branch: master
+  hadoop: hadoop3
+  envs: >-
+{
+  "SKIP_MIMA": "true",
+  "SKIP_UNIDOC": "true",
+  "DEDICATED_JVM_SBT_TESTS": 
"org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV1Suite,org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV2Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV1Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV2Suite"
+}
+  jobs: >-
+{
+  "build": "true"
+}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47066][INFRA] Add `Apple Silicon` Maven build test to GitHub Action CI

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 86a387737eb6 [SPARK-47066][INFRA] Add `Apple Silicon` Maven build test 
to GitHub Action CI
86a387737eb6 is described below

commit 86a387737eb6e480f305a62c2f9d8e5cab0503ac
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 15 14:21:35 2024 -0800

[SPARK-47066][INFRA] Add `Apple Silicon` Maven build test to GitHub Action 
CI

### What changes were proposed in this pull request?

This PR aims to add a new Maven build test pipeline for the `Apple Silicon and macOS 14` environment.

### Why are the changes needed?

To have test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. It passed already.

![Screenshot 2024-02-15 at 12 14 
15](https://github.com/apache/spark/assets/9700541/814880d6-78ba-478d-963d-30c1a814016c)

![Screenshot 2024-02-15 at 12 24 
52](https://github.com/apache/spark/assets/9700541/5def455a-8130-43bc-9a66-5d395dce0dac)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45126 from dongjoon-hyun/SPARK-47066.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 56 
 1 file changed, 56 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 43903d139d1f..605c2a0aea1a 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -848,6 +848,62 @@ jobs:
 ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano 
-Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=${JAVA_VERSION/-ea} 
install
 rm -rf ~/.m2/repository/org/apache/spark
 
+  apple-silicon:
+needs: precondition
+if: fromJson(needs.precondition.outputs.required).build == 'true'
+name: Apple Silicon build with Maven
+strategy:
+  fail-fast: false
+  matrix:
+java:
+  - 21
+runs-on: macos-14
+timeout-minutes: 300
+steps:
+- name: Checkout Spark repository
+  uses: actions/checkout@v4
+  with:
+fetch-depth: 0
+repository: apache/spark
+ref: ${{ inputs.branch }}
+- name: Sync the current branch with the latest in Apache Spark
+  if: github.repository != 'apache/spark'
+  run: |
+git fetch https://github.com/$GITHUB_REPOSITORY.git 
${GITHUB_REF#refs/heads/}
+git -c user.name='Apache Spark Test Account' -c 
user.email='sparktest...@gmail.com' merge --no-commit --progress --squash 
FETCH_HEAD
+git -c user.name='Apache Spark Test Account' -c 
user.email='sparktest...@gmail.com' commit -m "Merged commit" --allow-empty
+- name: Cache Scala, SBT and Maven
+  uses: actions/cache@v4
+  with:
+path: |
+  build/apache-maven-*
+  build/scala-*
+  build/*.jar
+  ~/.sbt
+key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 
'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 
'build/spark-build-info') }}
+restore-keys: |
+  apple-silicon-build-
+- name: Cache Maven local repository
+  uses: actions/cache@v4
+  with:
+path: ~/.m2/repository
+key: java${{ matrix.java }}-maven-${{ hashFiles('**/pom.xml') }}
+restore-keys: |
+  apple-silicon-${{ matrix.java }}-maven-
+- name: Install Java ${{ matrix.java }}
+  uses: actions/setup-java@v4
+  with:
+distribution: zulu
+java-version: ${{ matrix.java }}
+- name: Build with Maven
+  run: |
+export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g 
-Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
+export MAVEN_CLI_OPTS="--no-transfer-progress"
+export JAVA_VERSION=${{ matrix.java }}
+# It uses Maven's 'install' intentionally, see 
https://github.com/apache/spark/pull/26414.
+./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano 
-Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=${JAVA_VERSION/-ea} 
install
+rm -rf ~/.m2/repository/org/apache/spark
+
   # Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen 
job of benchmark.yml as well
   tpcds-1g:
 needs: precondition


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (135bdeb2ba10 -> 638a68f8e566)

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 135bdeb2ba10 [SPARK-47064][SQL][TESTS] Use Scala 2.13 Spark 
distribution in `HiveExternalCatalogVersionsSuite`
 add 638a68f8e566 [SPARK-47058][TESTS] Add `scalastyle` and `checkstyle` 
rules to ban `AtomicDoubleArray|CompoundOrdering`

No new revisions were added by this update.

Summary of changes:
 dev/checkstyle.xml|  8 
 scalastyle-config.xml | 10 ++
 2 files changed, 18 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-46400][CORE][SQL][3.4] When there are corrupted files in the local maven repo, skip this cache and try again

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d25ef73d3543 [SPARK-46400][CORE][SQL][3.4] When there are corrupted 
files in the local maven repo, skip this cache and try again
d25ef73d3543 is described below

commit d25ef73d3543d7cc7c3d2089e215309ce0bd22d2
Author: panbingkun 
AuthorDate: Thu Feb 15 09:52:53 2024 -0800

[SPARK-46400][CORE][SQL][3.4] When there are corrupted files in the local 
maven repo, skip this cache and try again

### What changes were proposed in this pull request?
The PR aims to:
- fix a potential bug (i.e. https://github.com/apache/spark/pull/44208) and enhance the user experience.
- make the code more compliant with standards.

Backport above to branch 3.4.
Master branch pr: https://github.com/apache/spark/pull/44343

### Why are the changes needed?
We use the local maven repo as the first-level cache in Ivy. The original intention was to reduce the time required to parse and obtain the artifact, but when there are corrupted files in the local maven repo, the above mechanism is interrupted outright and the prompt is very unfriendly, which greatly confuses the user. In keeping with the original intention, we should skip this cache in such situations and try again.
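
As an illustration only (not from the patch), the Ivy resolution path in question is exercised whenever dependencies are requested at submit time, for example via `spark.jars.packages`; the coordinate below is just an example:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Any maven coordinate will do; this one is only an example.
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.4.2")
    .getOrCreate()
)
# With this patch, a corrupted file in the local maven repo (~/.m2/repository)
# no longer aborts resolution outright: the local-m2 cache is skipped and
# resolution is retried against the remaining resolvers.
```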

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45018 from panbingkun/branch-3.4_SPARK-46400.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/deploy/SparkSubmit.scala  | 116 +
 .../sql/hive/client/IsolatedClientLoader.scala |   4 +
 2 files changed, 98 insertions(+), 22 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 8b4ef1dee8ac..18d85599bd2d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -41,7 +41,7 @@ import org.apache.ivy.Ivy
 import org.apache.ivy.core.LogOptions
 import org.apache.ivy.core.module.descriptor._
 import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId}
-import org.apache.ivy.core.report.ResolveReport
+import org.apache.ivy.core.report.{DownloadStatus, ResolveReport}
 import org.apache.ivy.core.resolve.ResolveOptions
 import org.apache.ivy.core.retrieve.RetrieveOptions
 import org.apache.ivy.core.settings.IvySettings
@@ -1217,7 +1217,7 @@ private[spark] object SparkSubmitUtils extends Logging {
 s"be whitespace. The artifactId provided is: ${splits(1)}")
   require(splits(2) != null && splits(2).trim.nonEmpty, s"The version 
cannot be null or " +
 s"be whitespace. The version provided is: ${splits(2)}")
-  new MavenCoordinate(splits(0), splits(1), splits(2))
+  MavenCoordinate(splits(0), splits(1), splits(2))
 }
   }
 
@@ -1232,21 +1232,27 @@ private[spark] object SparkSubmitUtils extends Logging {
   }
 
   /**
-   * Extracts maven coordinates from a comma-delimited string
+   * Create a ChainResolver used by Ivy to search for and resolve dependencies.
+   *
* @param defaultIvyUserDir The default user path for Ivy
+   * @param useLocalM2AsCache Whether to use the local maven repo as a cache
* @return A ChainResolver used by Ivy to search for and resolve 
dependencies.
*/
-  def createRepoResolvers(defaultIvyUserDir: File): ChainResolver = {
+  def createRepoResolvers(
+  defaultIvyUserDir: File,
+  useLocalM2AsCache: Boolean = true): ChainResolver = {
 // We need a chain resolver if we want to check multiple repositories
 val cr = new ChainResolver
 cr.setName("spark-list")
 
-val localM2 = new IBiblioResolver
-localM2.setM2compatible(true)
-localM2.setRoot(m2Path.toURI.toString)
-localM2.setUsepoms(true)
-localM2.setName("local-m2-cache")
-cr.add(localM2)
+if (useLocalM2AsCache) {
+  val localM2 = new IBiblioResolver
+  localM2.setM2compatible(true)
+  localM2.setRoot(m2Path.toURI.toString)
+  localM2.setUsepoms(true)
+  localM2.setName("local-m2-cache")
+  cr.add(localM2)
+}
 
 val localIvy = new FileSystemResolver
 val localIvyRoot = new File(defaultIvyUserDir, "local")
@@ -1342,18 +1348,23 @@ private[spark] object SparkSubmitUtils extends Logging {
 
   /**
* Build Ivy Settings using options with default resolvers
+   *
* @param remoteRepos Comma-delimited string of remote repositories other 
than maven central
* @param ivyPath The path to the local ivy repository
+   * @param useLocalM2AsCache Whether or not use `local-m2 repo` as

(spark) branch master updated: [SPARK-47064][SQL][TESTS] Use Scala 2.13 Spark distribution in `HiveExternalCatalogVersionsSuite`

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 135bdeb2ba10 [SPARK-47064][SQL][TESTS] Use Scala 2.13 Spark 
distribution in `HiveExternalCatalogVersionsSuite`
135bdeb2ba10 is described below

commit 135bdeb2ba10726fb8036f4312c6a40cea63a801
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 15 09:42:45 2024 -0800

[SPARK-47064][SQL][TESTS] Use Scala 2.13 Spark distribution in 
`HiveExternalCatalogVersionsSuite`

### What changes were proposed in this pull request?

This PR aims to use the `Scala 2.13` Spark binary in `HiveExternalCatalogVersionsSuite`.

### Why are the changes needed?

SPARK-45314 makes Scala 2.13 the default Scala version.
As one of the migration paths, users choose Apache Spark 3.5.0 (Scala 2.13) and Apache Spark 3.4.2 (Scala 2.13).
We had better focus on Scala 2.13 testing.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual validation.

```
$ SBT_OPTS=-Dspark.test.cache-dir=/tmp/test-spark \
   build/sbt "hive/testOnly *.HiveExternalCatalogVersionsSuite" -Phive
...
[info] HiveExternalCatalogVersionsSuite:
[info] - backward compatibility (11 seconds, 583 milliseconds)
[info] Run completed in 42 seconds, 143 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 55 s, completed Feb 15, 2024, 9:34:54 AM
```

During the testing, `Scala 2.13.8` is used.
```
$ ls -al /tmp/test-spark/spark-3.4.2/jars/scala*
-rw-r--r--  1 dongjoon  wheel  5596 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-collection-compat_2.13-2.7.0.jar
-rw-r--r--  1 dongjoon  wheel  12097183 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-compiler-2.13.8.jar
-rw-r--r--  1 dongjoon  wheel   6003601 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-library-2.13.8.jar
-rw-r--r--  1 dongjoon  wheel   1127123 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-parallel-collections_2.13-1.0.4.jar
-rw-r--r--  1 dongjoon  wheel189556 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-parser-combinators_2.13-2.1.1.jar
-rw-r--r--  1 dongjoon  wheel   3772083 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-reflect-2.13.8.jar
-rw-r--r--  1 dongjoon  wheel483090 Nov 24 23:04 
/tmp/test-spark/spark-3.4.2/jars/scala-xml_2.13-2.1.0.jar
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45124 from dongjoon-hyun/SPARK-47064.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index 50cf4017bd1e..726341ffdf9e 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -95,7 +95,7 @@ class HiveExternalCatalogVersionsSuite extends 
SparkSubmitTestUtils {
   mirrors.distinct :+ "https://archive.apache.org/dist"; :+ 
PROCESS_TABLES.releaseMirror
 logInfo(s"Trying to download Spark $version from $sites")
 for (site <- sites) {
-  val filename = s"spark-$version-bin-hadoop3.tgz"
+  val filename = s"spark-$version-bin-hadoop3-scala2.13.tgz"
   val url = s"$site/spark/spark-$version/$filename"
   logInfo(s"Downloading Spark $version from $url")
   try {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47059][SQL] Attach error context for ALTER COLUMN v1 command

2024-02-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e2cd71a4cd54 [SPARK-47059][SQL] Attach error context for ALTER COLUMN 
v1 command
e2cd71a4cd54 is described below

commit e2cd71a4cd54bbdf5af76d3edfbb2fc8c1b067b6
Author: Wenchen Fan 
AuthorDate: Thu Feb 15 18:36:11 2024 +0300

[SPARK-47059][SQL] Attach error context for ALTER COLUMN v1 command

### What changes were proposed in this pull request?

This is a small fix to improve the error message for ALTER COLUMN. We attach the error context for the v1 command as well, making it consistent with the v2 command.
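
As an illustration only (not from the PR), the kind of statement that now carries a query context can be reproduced roughly as follows; the table definition is hypothetical, and the offending fragment matches the golden-file diff below:

```python
from pyspark.sql import SparkSession
from pyspark.errors import AnalysisException

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE TABLE test_change (a INT, b STRING, c INT) USING parquet")
try:
    # Changing a column's type is not supported by the v1 ALTER COLUMN command.
    spark.sql("ALTER TABLE test_change CHANGE a TYPE STRING")
except AnalysisException as e:
    # With this patch the v1 command also reports the query context
    # (the offending SQL fragment), as the v2 command already did.
    print(e)
```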

### Why are the changes needed?

better error message

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

updated tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45121 from cloud-fan/context.

Authored-by: Wenchen Fan 
Signed-off-by: Max Gekk 
---
 .../spark/sql/errors/QueryCompilationErrors.scala  |  7 +--
 .../apache/spark/sql/execution/command/ddl.scala   |  2 +-
 .../analyzer-results/change-column.sql.out |  9 +++-
 .../sql-tests/analyzer-results/charvarchar.sql.out |  9 +++-
 .../sql-tests/results/change-column.sql.out|  9 +++-
 .../sql-tests/results/charvarchar.sql.out  |  9 +++-
 .../execution/command/CharVarcharDDLTestBase.scala | 24 ++
 7 files changed, 44 insertions(+), 25 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 46028817e8eb..53338f38ed6d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -2637,7 +2637,8 @@ private[sql] object QueryCompilationErrors extends 
QueryErrorsBase with Compilat
   def alterTableChangeColumnNotSupportedForColumnTypeError(
   tableName: String,
   originColumn: StructField,
-  newColumn: StructField): Throwable = {
+  newColumn: StructField,
+  origin: Origin): Throwable = {
 new AnalysisException(
   errorClass = "NOT_SUPPORTED_CHANGE_COLUMN",
   messageParameters = Map(
@@ -2645,7 +2646,9 @@ private[sql] object QueryCompilationErrors extends 
QueryErrorsBase with Compilat
 "originName" -> toSQLId(originColumn.name),
 "originType" -> toSQLType(originColumn.dataType),
 "newName" -> toSQLId(newColumn.name),
-"newType"-> toSQLType(newColumn.dataType)))
+"newType"-> toSQLType(newColumn.dataType)),
+  origin = origin
+)
   }
 
   def cannotAlterPartitionColumn(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index dc1c5b3fd580..a5e48784ada1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -390,7 +390,7 @@ case class AlterTableChangeColumnCommand(
 // Throw an AnalysisException if the column name/dataType is changed.
 if (!columnEqual(originColumn, newColumn, resolver)) {
   throw 
QueryCompilationErrors.alterTableChangeColumnNotSupportedForColumnTypeError(
-toSQLId(table.identifier.nameParts), originColumn, newColumn)
+toSQLId(table.identifier.nameParts), originColumn, newColumn, 
this.origin)
 }
 
 val newDataSchema = table.dataSchema.fields.map { field =>
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/change-column.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/change-column.sql.out
index a3d4388ab84f..07edfa5e95e1 100644
--- 
a/sql/core/src/test/resources/sql-tests/analyzer-results/change-column.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/analyzer-results/change-column.sql.out
@@ -69,7 +69,14 @@ org.apache.spark.sql.AnalysisException
 "originName" : "`a`",
 "originType" : "\"INT\"",
 "table" : "`spark_catalog`.`default`.`test_change`"
-  }
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 1,
+"stopIndex" : 44,
+"fragment" : "ALTER TABLE test_change CHANGE a TYPE STRING"
+  } ]
 }
 
 
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/charvarchar.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/charvarchar.sql.out
index 4f556d6dbc0b..02f09e0831d2 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/charvarchar.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyz

svn commit: r67358 - in /dev/spark/v3.5.1-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.3.1/ _site/api/R/deps/bootstrap-5.3.1/fonts/

2024-02-15 Thread kabhwan
Author: kabhwan
Date: Thu Feb 15 14:08:37 2024
New Revision: 67358

Log:
Apache Spark v3.5.1-rc2 docs


[This commit notification would consist of 4713 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r67355 - /dev/spark/v3.5.1-rc2-bin/

2024-02-15 Thread kabhwan
Author: kabhwan
Date: Thu Feb 15 11:39:51 2024
New Revision: 67355

Log:
Apache Spark v3.5.1-rc2

Added:
dev/spark/v3.5.1-rc2-bin/
dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz   (with props)
dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.asc
dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.sha512
dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz   (with props)
dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.asc
dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.sha512
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-hadoop3.tgz   (with props)
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-hadoop3.tgz.asc
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-hadoop3.tgz.sha512
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-without-hadoop.tgz   (with props)
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-without-hadoop.tgz.asc
dev/spark/v3.5.1-rc2-bin/spark-3.5.1-bin-without-hadoop.tgz.sha512
dev/spark/v3.5.1-rc2-bin/spark-3.5.1.tgz   (with props)
dev/spark/v3.5.1-rc2-bin/spark-3.5.1.tgz.asc
dev/spark/v3.5.1-rc2-bin/spark-3.5.1.tgz.sha512

Added: dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.asc
==
--- dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.asc (added)
+++ dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.asc Thu Feb 15 11:39:51 2024
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEE/T6ElC5eYQYjWh0lvTVqn4dA5P8FAmXN908THGthYmh3YW5A
+YXBhY2hlLm9yZwAKCRC9NWqfh0Dk/7gaD/43OxM8XC9NB/uf53WLbJr467fDRxuc
+LDCUwTsCC4mhX1DzAX9pt8sLQ6coWuLyqVHdaXyLCzbnHwonXpEV2euYITWWvIcy
+hK7TKSFFnWWB2QipZjRw8LcnR5ScLPKENX76XgnLOIMwKojVBItWE+01Z2klFopZ
+875bh2STo0/BQ3VyJEu935EXcfB7nAV5nTByhc6io39wcxvxlH25+hKi73KL4pPq
+uDTWanamwI/IQtTI05Oe4d/6f2WDjksVBVdSj/xgAxubKMMUU2aRQNKHwr2Aebe3
+CACQpGgpBRAa1LzzHgvkNnrZZ2uDzx1HHPPuGyheZrYIMM4vVVi2Lj5M3Sfh/zkQ
+ifR7Gn2XXFiypo5SPgaR5C7Zr5vn1yGPKxeWnaeBpeTkzoMr7wkyj6CJM5RlzGlo
+jDbAJwgpV/EZookonEVqDKAYHkbkjh2bmuK9l/YvjvrADWSH1eAntrmZH8laG3q2
+O4TcOgqKQhaf1fUErTXOciHc22fg62XauLLvRjnCmHNbHggh47m3TEAIc/WjW21t
+YAXZ9P7kdyRYoNWxxFJf32Ne1/JCYTu+7ev4TbHM2yugYWxdkVcs9XXeou/V+gwt
+AMngu6QItixJWtSaS/twpUB/GdUpHx36pL6QwqVysgHiuCrC+VOAwpg3oAoNaGaM
+U1XUkNeqKZm8Vw==
+=Ux6g
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.sha512
==
--- dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.sha512 (added)
+++ dev/spark/v3.5.1-rc2-bin/SparkR_3.5.1.tar.gz.sha512 Thu Feb 15 11:39:51 2024
@@ -0,0 +1 @@
+1a57cf9dd6129e022c7bda178ab0baca2aeb8265b0eb6184c9bb2b1c30b9f4b118bef3bd7b165bcfff0b5b8d80ab7e8b57d6eba141e5a49fd47a6668550d0d14
  SparkR_3.5.1.tar.gz

Added: dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.asc
==
--- dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.asc (added)
+++ dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.asc Thu Feb 15 11:39:51 2024
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEE/T6ElC5eYQYjWh0lvTVqn4dA5P8FAmXN91ETHGthYmh3YW5A
+YXBhY2hlLm9yZwAKCRC9NWqfh0Dk/+WMEAC5YH5rKEQQ4BuwaIXb9AAd0afzEOCj
+3nyYGHcT5BfIfs0fEicTvcs7rL5kxfzSa9Dhr8ly2FQbiv3DWm3m2UEqPY+HPxZH
+dI/jbaZlpx5k4K0fEpT4v5vDiHaN0z9pCqGZnbzcqggIML4u22/R4SctjuhfdCXC
+AKx6YJ0OpWixjblUmKixp/hg0bQJ5JaNmSMOJra0hNNgmZZQ/vYRcsltZFqsA/uh
+tN0gGeQ/o9uNYrSrD+zMlKx+SUZI6x6PcLZrPjVrn5g4QlwWb9qhrRrA37Nh6Z2K
+x9t6TjBIZjql8u6BzcLfgjrUXHA8+IDLEFVkSSssOuMUT8VbRDIqmIHY0m4bHNM+
+KORAJkKTzQ2PEAV5bYK+qsFrpiXOksMGCc9ia2NEHxrcZNJ0VnivsfJDEwbVYfj0
+4OIWTrJPkxwKUg38hkZemHKdz1SHxZRrXfofsVDkUnfszHdzyPyFuyHjFyzCoLMB
+6JcFZNf/ca2EVvvy1h323B0i09PbXKmxS8gdpDz7Mt1+lTFWTjaslAw6EqQPI+0I
+O7jbgJdAcRuyzXnGur4L24qh8zNUJH13fqPNFVtoANOgj4MKjeqYh0wQDkbSumpa
+4E0tnehKAm9jmbAkLl2Yusouv5kucMzRx5glG3+yWZ3MEOAxvlqokzu9Htf8qBQ6
+UgmPGPeUXF/Tgg==
+=+Er+
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.sha512
==
--- dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz.sha512 (added)
+++ dev/spark/v

(spark) 01/01: Preparing development version 3.5.2-SNAPSHOT

2024-02-15 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit cacb6fa0868a8741cb67ef705375ac378adaeebb
Author: Jungtaek Lim 
AuthorDate: Thu Feb 15 10:56:51 2024 +

Preparing development version 3.5.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 66faa8031c45..89bee06852be 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.1
+Version: 3.5.2
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 47b54729bbd2..d1ef9b24afda 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1
+3.5.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 66e6bb473bf2..9df20f8facf5 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1
+3.5.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 98897b4424ae..27a53b0f9f3b 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1
+3.5.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 44531ea54cd5..93410815e6c0 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1
+3.5.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 8fcf20328e8e..a99b8b96402a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-

(spark) branch branch-3.5 updated (9b4778fc1dc7 -> cacb6fa0868a)

2024-02-15 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9b4778fc1dc7 [SPARK-46906][INFRA][3.5] Bump python libraries (pandas, 
pyarrow) in Docker image for release script
 add fd86f85e181f Preparing Spark release v3.5.1-rc2
 new cacb6fa0868a Preparing development version 3.5.2-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) tag v3.5.1-rc2 created (now fd86f85e181f)

2024-02-15 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to tag v3.5.1-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git


  at fd86f85e181f (commit)
This tag includes the following new commits:

 new fd86f85e181f Preparing Spark release v3.5.1-rc2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) 01/01: Preparing Spark release v3.5.1-rc2

2024-02-15 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to tag v3.5.1-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit fd86f85e181fc2dc0f50a096855acf83a6cc5d9c
Author: Jungtaek Lim 
AuthorDate: Thu Feb 15 10:56:47 2024 +

Preparing Spark release v3.5.1-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 89bee06852be..66faa8031c45 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.2
+Version: 3.5.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index d1ef9b24afda..47b54729bbd2 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.2-SNAPSHOT
+3.5.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 9df20f8facf5..66e6bb473bf2 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.2-SNAPSHOT
+3.5.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 27a53b0f9f3b..98897b4424ae 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.2-SNAPSHOT
+3.5.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 93410815e6c0..44531ea54cd5 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.2-SNAPSHOT
+3.5.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index a99b8b96402a..8fcf20328e8e 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.2-SNA

(spark) branch master updated: [SPARK-47056][TESTS] Add `scalastyle` and `checkstyle` rules to ban `FileBackedOutputStream`

2024-02-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new edf4ac4b518d [SPARK-47056][TESTS] Add `scalastyle` and `checkstyle` 
rules to ban `FileBackedOutputStream`
edf4ac4b518d is described below

commit edf4ac4b518d0d69f7012ff5c0f1428fe45412ba
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 15 01:26:59 2024 -0800

[SPARK-47056][TESTS] Add `scalastyle` and `checkstyle` rules to ban 
`FileBackedOutputStream`

### What changes were proposed in this pull request?

This PR aims to add `scalastyle` and `checkstyle` rules to ban 
`FileBackedOutputStream`.

### Why are the changes needed?

We don't use this class, but this rule will explicitly prevent any accidental usage of `FileBackedOutputStream` in the future.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45116 from dongjoon-hyun/SPARK-47056.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/checkstyle.xml| 4 
 scalastyle-config.xml | 5 +
 2 files changed, 9 insertions(+)

diff --git a/dev/checkstyle.xml b/dev/checkstyle.xml
index b9997d2050d1..cb7e962e8033 100644
--- a/dev/checkstyle.xml
+++ b/dev/checkstyle.xml
@@ -180,6 +180,10 @@
   value="Avoid using com.google.common.io.Files.createTempDir() 
due to CVE-2020-8908.
 Use org.apache.spark.network.util.JavaUtils.createTempDir() 
instead." />
 
+
+
+
+
 
 
 
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 2077769c71d0..5a2cf7ed4f44 100644
--- a/scalastyle-config.xml
+++ b/scalastyle-config.xml
@@ -462,6 +462,11 @@ This file is divided into 3 sections:
 
   
 
+  
+FileBackedOutputStream
+Avoid using FileBackedOutputStream due to 
CVE-2023-2976.
+  
+
   
 new Path\(new 
URI\(