(spark) branch branch-3.5 updated: [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 50c1783a1f97 [SPARK-49803][SQL][TESTS] Increase 
`spark.test.docker.connectionTimeout` to 10min
50c1783a1f97 is described below

commit 50c1783a1f97e336c8560fc03ef85ec7319672ea
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 26 17:32:29 2024 -0700

[SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 
10min

### What changes were proposed in this pull request?

This PR aims to increase `spark.test.docker.connectionTimeout` to 10min.

### Why are the changes needed?

Recently, various DB images fail at the `connection` stage on multiple branches.

**MASTER** branch
https://github.com/apache/spark/actions/runs/11045311764/job/30682732260

```
[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 
minutes, 17 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
298 times over 5.004500551155 minutes. Last failure message: ORA-12541: 
Cannot connect. No listener at host 10.1.0.41 port 41079. 
(CONNECTION_ID=n9ZWIh+nQn+G9fkwKyoBQA==)
```

**branch-3.5** branch
https://github.com/apache/spark/actions/runs/10939696926/job/30370552237

```
[info] MsSqlServerNamespaceSuite:
[info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED 
*** (5 minutes, 42 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
11 times over 5.48763128241 minutes. Last failure message: The TCP/IP 
connection to the host 10.1.0.56, port 35345 has failed. Error: "Connection 
refused (Connection refused). Verify the connection properties. Make sure that 
an instance of SQL Server is running on the host and accepting TCP/IP 
connections at the port. Make sure that TCP connections to the port are not 
blocked by a firewall.".. (DockerJDBCInt [...]
```

**branch-3.4** branch
https://github.com/apache/spark/actions/runs/10937842509/job/30364658576

```
[info] MsSqlServerNamespaceSuite:
[info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED 
*** (5 minutes, 42 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
11 times over 5.48755564563 minutes. Last failure message: The TCP/IP 
connection to the host 10.1.0.153, port 46153 has failed. Error: "Connection 
refused (Connection refused). Verify the connection properties. Make sure that 
an instance of SQL Server is running on the host and accepting TCP/IP 
connections at the port. Make sure that TCP connections to the port are not 
blocked by a firewall.".. (DockerJDBCIn [...]
```

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #48272 from dongjoon-hyun/SPARK-49803.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 09b7aa67ce64d7d4ecc803215eaf85464df181c5)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 40e8cbb6546b..55142e6d8de8 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -97,7 +97,7 @@ abstract class DockerJDBCIntegrationSuite
 
   protected val dockerIp = DockerUtils.getDockerIp()
   val db: DatabaseOnDocker
-  val connectionTimeout = timeout(5.minutes)
+  val connectionTimeout = timeout(10.minutes)
   val keepContainer =
 sys.props.getOrElse("spark.test.docker.keepContainer", "false").toBoolean
   val removePulledImage =





(spark) branch branch-3.4 updated: [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new c787e9a89a86 [SPARK-49803][SQL][TESTS] Increase 
`spark.test.docker.connectionTimeout` to 10min
c787e9a89a86 is described below

commit c787e9a89a867b540b32faf8bb26302af256cc33
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 26 17:32:29 2024 -0700

[SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 
10min

### What changes were proposed in this pull request?

This PR aims to increase `spark.test.docker.connectionTimeout` to 10min.

### Why are the changes needed?

Recently, various DB images fail at the `connection` stage on multiple branches.

**MASTER** branch
https://github.com/apache/spark/actions/runs/11045311764/job/30682732260

```
[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 
minutes, 17 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
298 times over 5.004500551155 minutes. Last failure message: ORA-12541: 
Cannot connect. No listener at host 10.1.0.41 port 41079. 
(CONNECTION_ID=n9ZWIh+nQn+G9fkwKyoBQA==)
```

**branch-3.5** branch
https://github.com/apache/spark/actions/runs/10939696926/job/30370552237

```
[info] MsSqlServerNamespaceSuite:
[info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED 
*** (5 minutes, 42 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
11 times over 5.48763128241 minutes. Last failure message: The TCP/IP 
connection to the host 10.1.0.56, port 35345 has failed. Error: "Connection 
refused (Connection refused). Verify the connection properties. Make sure that 
an instance of SQL Server is running on the host and accepting TCP/IP 
connections at the port. Make sure that TCP connections to the port are not 
blocked by a firewall.".. (DockerJDBCInt [...]
```

**branch-3.4** branch
https://github.com/apache/spark/actions/runs/10937842509/job/30364658576

```
[info] MsSqlServerNamespaceSuite:
[info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED 
*** (5 minutes, 42 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
11 times over 5.48755564563 minutes. Last failure message: The TCP/IP 
connection to the host 10.1.0.153, port 46153 has failed. Error: "Connection 
refused (Connection refused). Verify the connection properties. Make sure that 
an instance of SQL Server is running on the host and accepting TCP/IP 
connections at the port. Make sure that TCP connections to the port are not 
blocked by a firewall.".. (DockerJDBCIn [...]
```

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #48272 from dongjoon-hyun/SPARK-49803.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 09b7aa67ce64d7d4ecc803215eaf85464df181c5)
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 50c1783a1f97e336c8560fc03ef85ec7319672ea)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 40e8cbb6546b..55142e6d8de8 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -97,7 +97,7 @@ abstract class DockerJDBCIntegrationSuite
 
   protected val dockerIp = DockerUtils.getDockerIp()
   val db: DatabaseOnDocker
-  val connectionTimeout = timeout(5.minutes)
+  val connectionTimeout = timeout(10.minutes)
   val keepContainer =
 sys.props.getOrElse("spark.test.docker.keepContainer", "false").toBoolean
   val removePulledImage =





(spark) branch master updated: [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 09b7aa67ce64 [SPARK-49803][SQL][TESTS] Increase 
`spark.test.docker.connectionTimeout` to 10min
09b7aa67ce64 is described below

commit 09b7aa67ce64d7d4ecc803215eaf85464df181c5
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 26 17:32:29 2024 -0700

[SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 
10min

### What changes were proposed in this pull request?

This PR aims to increase `spark.test.docker.connectionTimeout` to 10min.

### Why are the changes needed?

Recently, various DB images fail at the `connection` stage on multiple branches.

**MASTER** branch
https://github.com/apache/spark/actions/runs/11045311764/job/30682732260

```
[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 
minutes, 17 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
298 times over 5.004500551155 minutes. Last failure message: ORA-12541: 
Cannot connect. No listener at host 10.1.0.41 port 41079. 
(CONNECTION_ID=n9ZWIh+nQn+G9fkwKyoBQA==)
```

**branch-3.5** branch
https://github.com/apache/spark/actions/runs/10939696926/job/30370552237

```
[info] MsSqlServerNamespaceSuite:
[info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED 
*** (5 minutes, 42 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
11 times over 5.48763128241 minutes. Last failure message: The TCP/IP 
connection to the host 10.1.0.56, port 35345 has failed. Error: "Connection 
refused (Connection refused). Verify the connection properties. Make sure that 
an instance of SQL Server is running on the host and accepting TCP/IP 
connections at the port. Make sure that TCP connections to the port are not 
blocked by a firewall.".. (DockerJDBCInt [...]
```

**branch-3.4** branch
https://github.com/apache/spark/actions/runs/10937842509/job/30364658576

```
[info] MsSqlServerNamespaceSuite:
[info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED 
*** (5 minutes, 42 seconds)
[info]   The code passed to eventually never returned normally. Attempted 
11 times over 5.48755564563 minutes. Last failure message: The TCP/IP 
connection to the host 10.1.0.153, port 46153 has failed. Error: "Connection 
refused (Connection refused). Verify the connection properties. Make sure that 
an instance of SQL Server is running on the host and accepting TCP/IP 
connections at the port. Make sure that TCP connections to the port are not 
blocked by a firewall.".. (DockerJDBCIn [...]
```

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #48272 from dongjoon-hyun/SPARK-49803.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 8d17e0b4e36e..1df01bd3bfb6 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -115,7 +115,7 @@ abstract class DockerJDBCIntegrationSuite
   protected val startContainerTimeout: Long =
 
timeStringAsSeconds(sys.props.getOrElse("spark.test.docker.startContainerTimeout",
 "5min"))
   protected val connectionTimeout: PatienceConfiguration.Timeout = {
-val timeoutStr = 
sys.props.getOrElse("spark.test.docker.connectionTimeout", "5min")
+val timeoutStr = 
sys.props.getOrElse("spark.test.docker.connectionTimeout", "10min")
 timeout(timeStringAsSeconds(timeoutStr).seconds)
   }
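
For reference, here is a minimal, self-contained sketch (not the suite's actual code) of the pattern in the hunk above: read the timeout from a JVM system property with a `10min` default and turn it into a ScalaTest timeout. The parsing below assumes a plain `min` suffix instead of Spark's `timeStringAsSeconds` helper.

```scala
import org.scalatest.concurrent.Eventually
import org.scalatest.time.SpanSugar._

// Run the docker integration tests with -Dspark.test.docker.connectionTimeout=15min
// (for example) to override the default.
val timeoutStr = sys.props.getOrElse("spark.test.docker.connectionTimeout", "10min")
val minutes = timeoutStr.stripSuffix("min").trim.toLong
val connectionTimeout = Eventually.timeout(minutes.minutes)
```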
 





(spark) branch branch-3.5 updated: [SPARK-49791][SQL][FOLLOWUP][3.5] Fix `import` statement

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new b51db8bcf80c [SPARK-49791][SQL][FOLLOWUP][3.5] Fix `import` statement
b51db8bcf80c is described below

commit b51db8bcf80cf070f93a05345640ca594301899d
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 26 14:51:57 2024 -0700

[SPARK-49791][SQL][FOLLOWUP][3.5] Fix `import` statement

### What changes were proposed in this pull request?

This PR is a follow-up for `branch-3.5`, needed due to a difference in the `import` statement.
- #48257

### Why are the changes needed?

To fix the compilation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48271 from dongjoon-hyun/SPARK-49791.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index f1664f66b7f8..4c0c750246f8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.catalyst.plans.logical.{AppendData, 
CreateTableAsSelect, InsertIntoStatement, LogicalPlan, OptionList, 
OverwriteByExpression, OverwritePartitionsDynamic, ReplaceTableAsSelect, 
UnresolvedTableSpec}
 import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
-import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, 
CatalogV2Implicits, CatalogV2Util, DelegatingCatalogExtension, Identifier, 
SupportsCatalogOptions, Table, TableCatalog, TableProvider, V1Table}
+import org.apache.spark.sql.connector.catalog.{CatalogExtension, 
CatalogManager, CatalogPlugin, CatalogV2Implicits, CatalogV2Util, Identifier, 
SupportsCatalogOptions, Table, TableCatalog, TableProvider, V1Table}
 import org.apache.spark.sql.connector.catalog.TableCapability._
 import org.apache.spark.sql.connector.catalog.TableWritePrivilege
 import org.apache.spark.sql.connector.catalog.TableWritePrivilege._





(spark) branch branch-3.5 updated: [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new f1c69a5a687f [SPARK-49791][SQL] Make DelegatingCatalogExtension more 
extendable
f1c69a5a687f is described below

commit f1c69a5a687fdb4e5a613fe43bbf6f6366f63fda
Author: Wenchen Fan 
AuthorDate: Thu Sep 26 13:39:02 2024 -0700

[SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable

### What changes were proposed in this pull request?

This PR updates `DelegatingCatalogExtension` so that it's more extendable
- `initialize` is no longer `final`, so that subclasses can override it
- `delegate` becomes `protected`, so that subclasses can access it

In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it is actually the `CatalogExtension` interface that indicates that a catalog implementation delegates requests to the Spark session catalog. https://github.com/apache/spark/pull/47724 should use `CatalogExtension` instead.

### Why are the changes needed?

Unblock the Iceberg extension.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48257 from cloud-fan/catalog.

Lead-authored-by: Wenchen Fan 
Co-authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 339dd5b93316fecd0455b53b2cedee2b5333a184)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/connector/catalog/DelegatingCatalogExtension.java   | 4 ++--
 sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala| 2 +-
 .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala| 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
index f6686d2e4d3b..786821514822 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
@@ -38,7 +38,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap;
 @Evolving
 public abstract class DelegatingCatalogExtension implements CatalogExtension {
 
-  private CatalogPlugin delegate;
+  protected CatalogPlugin delegate;
 
   @Override
   public final void setDelegateCatalog(CatalogPlugin delegate) {
@@ -51,7 +51,7 @@ public abstract class DelegatingCatalogExtension implements 
CatalogExtension {
   }
 
   @Override
-  public final void initialize(String name, CaseInsensitiveStringMap options) 
{}
+  public void initialize(String name, CaseInsensitiveStringMap options) {}
 
   @Override
   public Set capabilities() {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 2506cb736f18..f1664f66b7f8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -568,7 +568,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
 val canUseV2 = lookupV2Provider().isDefined || 
(df.sparkSession.sessionState.conf.getConf(
 SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION).isDefined &&
 
!df.sparkSession.sessionState.catalogManager.catalog(CatalogManager.SESSION_CATALOG_NAME)
-  .isInstanceOf[DelegatingCatalogExtension])
+  .isInstanceOf[CatalogExtension])
 
 session.sessionState.sqlParser.parseMultipartIdentifier(tableName) match {
   case nameParts @ NonSessionCatalogAndIdentifier(catalog, ident) =>
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index 7500f32ac2b9..0a86a043985e 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.util.{quoteIfNeeded, toPrettySQL, 
ResolveDefaultColumns => DefaultCols}
 import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns._
-import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, 
CatalogV2Util, DelegatingCatalog

(spark) branch master updated: [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 339dd5b93316 [SPARK-49791][SQL] Make DelegatingCatalogExtension more 
extendable
339dd5b93316 is described below

commit 339dd5b93316fecd0455b53b2cedee2b5333a184
Author: Wenchen Fan 
AuthorDate: Thu Sep 26 13:39:02 2024 -0700

[SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable

### What changes were proposed in this pull request?

This PR updates `DelegatingCatalogExtension` so that it's more extendable
- `initialize` is no longer `final`, so that subclasses can override it
- `delegate` becomes `protected`, so that subclasses can access it

In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it is actually the `CatalogExtension` interface that indicates that a catalog implementation delegates requests to the Spark session catalog. https://github.com/apache/spark/pull/47724 should use `CatalogExtension` instead.
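
As an illustration, a hypothetical subclass that this change enables (a sketch only; the class and option names are invented, and this is not Iceberg's actual extension):

```scala
import org.apache.spark.sql.connector.catalog.{CatalogPlugin, DelegatingCatalogExtension}
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class MySessionCatalog extends DelegatingCatalogExtension {
  private var warehouse: String = ""

  // Possible now that `initialize` is no longer final.
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    warehouse = options.getOrDefault("warehouse", "/tmp/warehouse")
  }

  // Possible now that `delegate` is protected instead of private.
  def sessionCatalog: CatalogPlugin = delegate
}
```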

### Why are the changes needed?

Unblock the Iceberg extension.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48257 from cloud-fan/catalog.

Lead-authored-by: Wenchen Fan 
Co-authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/connector/catalog/DelegatingCatalogExtension.java   | 4 ++--
 .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala| 4 ++--
 .../scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
index f6686d2e4d3b..786821514822 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
@@ -38,7 +38,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap;
 @Evolving
 public abstract class DelegatingCatalogExtension implements CatalogExtension {
 
-  private CatalogPlugin delegate;
+  protected CatalogPlugin delegate;
 
   @Override
   public final void setDelegateCatalog(CatalogPlugin delegate) {
@@ -51,7 +51,7 @@ public abstract class DelegatingCatalogExtension implements 
CatalogExtension {
   }
 
   @Override
-  public final void initialize(String name, CaseInsensitiveStringMap options) 
{}
+  public void initialize(String name, CaseInsensitiveStringMap options) {}
 
   @Override
   public Set capabilities() {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index 02ad2e79a564..a9ad7523c8fb 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.util.{quoteIfNeeded, toPrettySQL, 
ResolveDefaultColumns => DefaultCols}
 import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns._
-import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, 
CatalogV2Util, DelegatingCatalogExtension, LookupCatalog, SupportsNamespaces, 
V1Table}
+import org.apache.spark.sql.connector.catalog.{CatalogExtension, 
CatalogManager, CatalogPlugin, CatalogV2Util, LookupCatalog, 
SupportsNamespaces, V1Table}
 import org.apache.spark.sql.connector.expressions.Transform
 import org.apache.spark.sql.errors.QueryCompilationErrors
 import org.apache.spark.sql.execution.command._
@@ -706,6 +706,6 @@ class ResolveSessionCatalog(val catalogManager: 
CatalogManager)
   private def supportsV1Command(catalog: CatalogPlugin): Boolean = {
 isSessionCatalog(catalog) && (
   SQLConf.get.getConf(SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION).isEmpty ||
-catalog.isInstanceOf[DelegatingCatalogExtension])
+catalog.isInstanceOf[CatalogExtension])
   }
 }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala
index f0eef9ae1cbb..8164d33f46fe 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriter

(spark) branch master updated: [SPARK-49800][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.4

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 54e62a158ead [SPARK-49800][BUILD][K8S] Upgrade `kubernetes-client` to 
6.13.4
54e62a158ead is described below

commit 54e62a158ead91d832d477a76aace40ef5b54121
Author: Bjørn Jørgensen 
AuthorDate: Thu Sep 26 13:37:39 2024 -0700

[SPARK-49800][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.4

### What changes were proposed in this pull request?
Upgrade `kubernetes-client` from 6.13.3 to 6.13.4

### Why are the changes needed?
The new version contains 5 fixes; see the [Release log 6.13.4](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.13.4).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48268 from bjornjorgensen/k8sclient6.13.4.

Authored-by: Bjørn Jørgensen 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +--
 pom.xml   |  2 +-
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 19b8a237d30a..c9a32757554b 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -159,31 +159,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/2.0.16//jul-to-slf4j-2.0.16.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
-kubernetes-client-api/6.13.3//kubernetes-client-api-6.13.3.jar
-kubernetes-client/6.13.3//kubernetes-client-6.13.3.jar
-kubernetes-httpclient-okhttp/6.13.3//kubernetes-httpclient-okhttp-6.13.3.jar
-kubernetes-model-admissionregistration/6.13.3//kubernetes-model-admissionregistration-6.13.3.jar
-kubernetes-model-apiextensions/6.13.3//kubernetes-model-apiextensions-6.13.3.jar
-kubernetes-model-apps/6.13.3//kubernetes-model-apps-6.13.3.jar
-kubernetes-model-autoscaling/6.13.3//kubernetes-model-autoscaling-6.13.3.jar
-kubernetes-model-batch/6.13.3//kubernetes-model-batch-6.13.3.jar
-kubernetes-model-certificates/6.13.3//kubernetes-model-certificates-6.13.3.jar
-kubernetes-model-common/6.13.3//kubernetes-model-common-6.13.3.jar
-kubernetes-model-coordination/6.13.3//kubernetes-model-coordination-6.13.3.jar
-kubernetes-model-core/6.13.3//kubernetes-model-core-6.13.3.jar
-kubernetes-model-discovery/6.13.3//kubernetes-model-discovery-6.13.3.jar
-kubernetes-model-events/6.13.3//kubernetes-model-events-6.13.3.jar
-kubernetes-model-extensions/6.13.3//kubernetes-model-extensions-6.13.3.jar
-kubernetes-model-flowcontrol/6.13.3//kubernetes-model-flowcontrol-6.13.3.jar
-kubernetes-model-gatewayapi/6.13.3//kubernetes-model-gatewayapi-6.13.3.jar
-kubernetes-model-metrics/6.13.3//kubernetes-model-metrics-6.13.3.jar
-kubernetes-model-networking/6.13.3//kubernetes-model-networking-6.13.3.jar
-kubernetes-model-node/6.13.3//kubernetes-model-node-6.13.3.jar
-kubernetes-model-policy/6.13.3//kubernetes-model-policy-6.13.3.jar
-kubernetes-model-rbac/6.13.3//kubernetes-model-rbac-6.13.3.jar
-kubernetes-model-resource/6.13.3//kubernetes-model-resource-6.13.3.jar
-kubernetes-model-scheduling/6.13.3//kubernetes-model-scheduling-6.13.3.jar
-kubernetes-model-storageclass/6.13.3//kubernetes-model-storageclass-6.13.3.jar
+kubernetes-client-api/6.13.4//kubernetes-client-api-6.13.4.jar
+kubernetes-client/6.13.4//kubernetes-client-6.13.4.jar
+kubernetes-httpclient-okhttp/6.13.4//kubernetes-httpclient-okhttp-6.13.4.jar
+kubernetes-model-admissionregistration/6.13.4//kubernetes-model-admissionregistration-6.13.4.jar
+kubernetes-model-apiextensions/6.13.4//kubernetes-model-apiextensions-6.13.4.jar
+kubernetes-model-apps/6.13.4//kubernetes-model-apps-6.13.4.jar
+kubernetes-model-autoscaling/6.13.4//kubernetes-model-autoscaling-6.13.4.jar
+kubernetes-model-batch/6.13.4//kubernetes-model-batch-6.13.4.jar
+kubernetes-model-certificates/6.13.4//kubernetes-model-certificates-6.13.4.jar
+kubernetes-model-common/6.13.4//kubernetes-model-common-6.13.4.jar
+kubernetes-model-coordination/6.13.4//kubernetes-model-coordination-6.13.4.jar
+kubernetes-model-core/6.13.4//kubernetes-model-core-6.13.4.jar
+kubernetes-model-discovery/6.13.4//kubernetes-model-discovery-6.13.4.jar
+kubernetes-model-events/6.13.4//kubernetes-model-events-6.13.4.jar
+kubernetes-model-extensions/6.13.4//kubernetes-model-extensions-6.13.4.jar
+kubernetes-model-flowcontrol/6.13.4//kubernetes-model-flowcontrol-6.13.4.jar
+kubernetes-model-gatewayapi/6.13.4//kubernetes-model-gatewayapi-6.13.4.jar
+kubernetes-model-metrics/6.13.4//kubernetes-model-metrics-6.13.4.jar
+kubernetes-model-networking/6.13.4//kubernetes-model-networking-6.13.4.jar
+kubernetes-model-node/6.13.4

(spark) branch master updated (218051a566c7 -> 87b5ffb22082)

2024-09-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 218051a566c7 [MINOR][SQL][TESTS] Use `formatString.format(value)` 
instead of `value.formatted(formatString)`
 add 87b5ffb22082 [SPARK-49797][INFRA] Align the running OS image of 
`maven_test.yml` to `ubuntu-latest`

No new revisions were added by this update.

Summary of changes:
 .github/workflows/maven_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated: [SPARK-49786][K8S] Lower `KubernetesClusterSchedulerBackend.onDisconnected` log level to debug

2024-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 562977928772 [SPARK-49786][K8S] Lower 
`KubernetesClusterSchedulerBackend.onDisconnected` log level to debug
562977928772 is described below

commit 5629779287724a891c81b16f982f9529bd379c39
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 25 22:34:35 2024 -0700

[SPARK-49786][K8S] Lower `KubernetesClusterSchedulerBackend.onDisconnected` 
log level to debug

### What changes were proposed in this pull request?

This PR aims to lower `KubernetesClusterSchedulerBackend.onDisconnected` 
log level to debug.

### Why are the changes needed?

This INFO-level message was added in the PR below. We already propagate the disconnection reason to the UI, and `No executor found` has been used when an unknown peer connects or disconnects.

- https://github.com/apache/spark/pull/37821

The driver can be accessed by non-executors by design, and all other resource managers do not complain at the INFO level.
```
INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: No 
executor found for x.x.x.0:x
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review because this is a log level change.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48249 from dongjoon-hyun/SPARK-49786.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
index 4e4634504a0f..09faa2a7fb1b 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
@@ -32,7 +32,7 @@ import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.deploy.k8s.Constants._
 import org.apache.spark.deploy.k8s.submit.KubernetesClientUtils
 import org.apache.spark.deploy.security.HadoopDelegationTokenManager
-import org.apache.spark.internal.LogKeys.{COUNT, HOST_PORT, TOTAL}
+import org.apache.spark.internal.LogKeys.{COUNT, TOTAL}
 import org.apache.spark.internal.MDC
 import 
org.apache.spark.internal.config.SCHEDULER_MIN_REGISTERED_RESOURCES_RATIO
 import org.apache.spark.resource.ResourceProfile
@@ -356,7 +356,7 @@ private[spark] class KubernetesClusterSchedulerBackend(
   execIDRequester -= rpcAddress
   // Expected, executors re-establish a connection with an ID
 case _ =>
-  logInfo(log"No executor found for ${MDC(HOST_PORT, rpcAddress)}")
+  logDebug(s"No executor found for ${rpcAddress}")
   }
   }
 }





(spark-kubernetes-operator) branch main updated: [SPARK-49790] Support `HPA` template for `SparkCluster`

2024-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new b2cee84  [SPARK-49790] Support `HPA` template for `SparkCluster`
b2cee84 is described below

commit b2cee8443e7760b82e63bc9b343a5b9279c0ae6a
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 25 14:58:30 2024 -0700

[SPARK-49790] Support `HPA` template for `SparkCluster`

### What changes were proposed in this pull request?

This PR aims to support `HPA` template for `SparkCluster`.

### Why are the changes needed?

Although `SparkCluster` needs generated values for the following `HPA` fields,
```
  maxReplicas:
  minReplicas:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name:
```

we can still allow users to tune the HPA for their cluster usage pattern, like the following.
```yaml
horizontalPodAutoscalerSpec:
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 10
    scaleDown:
      policies:
      - type: Pods
        value: 1
        periodSeconds: 1200
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. And, do the manual review.

- Delete the existing CRD because it's changed.
```
$ kubectl delete crd sparkclusters.spark.apache.org
```

- Build and Install
```
$ gradle build buildDockerImage spark-operator-api:relocateGeneratedCRD

$ helm install spark-kubernetes-operator -f 
build-tools/helm/spark-kubernetes-operator/values.yaml 
build-tools/helm/spark-kubernetes-operator/
```

- Create a `SparkCluster` with HPA template via the given example.
```
$ kubectl apply -f examples/cluster-with-hpa-template.yaml

$ kubectl get hpa cluster-with-hpa-template-worker-hpa -oyaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: "2024-09-25T21:11:40Z"
  labels:
    spark.operator/name: spark-kubernetes-operator
    spark.operator/spark-cluster-name: cluster-with-hpa-template
  name: cluster-with-hpa-template-worker-hpa
  namespace: default
...
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 1200
        type: Pods
        value: 1
      selectPolicy: Max
    scaleUp:
      policies:
      - periodSeconds: 10
        type: Pods
        value: 1
      selectPolicy: Max
      stabilizationWindowSeconds: 0
  maxReplicas: 2
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 10
        type: Utilization
    type: Resource
  minReplicas: 1
...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #137 from dongjoon-hyun/SPARK-49790.

    Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 examples/cluster-with-hpa-template.yaml| 64 +
 .../apache/spark/k8s/operator/spec/WorkerSpec.java |  2 +
 .../k8s/operator/SparkClusterResourceSpec.java | 66 +-
 3 files changed, 104 insertions(+), 28 deletions(-)

diff --git a/examples/cluster-with-hpa-template.yaml 
b/examples/cluster-with-hpa-template.yaml
new file mode 100644
index 000..cee5b18
--- /dev/null
+++ b/examples/cluster-with-hpa-template.yaml
@@ -0,0 +1,64 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkCluster
+metadata:
+  name: cluster-with-hpa-template
+spec:
+  runtimeVersions:
+    sparkVersion: "4.0.0-preview2"

(spark) branch master updated: [SPARK-49775][SQL][FOLLOW-UP] Use SortedSet instead of Array with sorting

2024-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 09209f0ff503 [SPARK-49775][SQL][FOLLOW-UP] Use SortedSet instead of 
Array with sorting
09209f0ff503 is described below

commit 09209f0ff503b29f9da92ba7db8aa820c03b3c0f
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 25 07:57:08 2024 -0700

[SPARK-49775][SQL][FOLLOW-UP] Use SortedSet instead of Array with sorting

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/48235 that addresses the review comment at https://github.com/apache/spark/pull/48235#discussion_r1775020195.

### Why are the changes needed?

For better performance (in theory)
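
For instance, a standalone sketch (not Spark code) of the behavior the change relies on: an immutable `SortedSet` deduplicates its elements and keeps them ordered, so the explicit `.sorted` step on an `Array` is no longer needed.

```scala
import scala.collection.SortedSet

val charsets = SortedSet("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32")
assert(charsets.contains("utf-8"))  // tree-based lookup instead of a scan over a sorted Array
println(charsets.mkString(", "))    // iterates in sorted order
```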

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests should verify them

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48245 from HyukjinKwon/SPARK-49775-followup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala| 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala
index d85673f2ce81..f805d2ed87b5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala
@@ -18,13 +18,15 @@
  import java.nio.charset.{Charset, CharsetDecoder, CharsetEncoder, 
CodingErrorAction, IllegalCharsetNameException, UnsupportedCharsetException}
  import java.util.Locale
 
+ import scala.collection.SortedSet
+
  import org.apache.spark.sql.errors.QueryExecutionErrors
  import org.apache.spark.sql.internal.SQLConf
 
 private[sql] object CharsetProvider {
 
   final lazy val VALID_CHARSETS =
-Array("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", 
"utf-32").sorted
+SortedSet("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", 
"utf-16", "utf-32")
 
   def forName(
   charset: String,





(spark) branch master updated: [SPARK-49731][K8S] Support K8s volume `mount.subPathExpr` and `hostPath` volume `type`

2024-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1f2e7b87db76 [SPARK-49731][K8S] Support K8s volume `mount.subPathExpr` 
and `hostPath` volume `type`
1f2e7b87db76 is described below

commit 1f2e7b87db76ef60eded8a6db09f6690238471ce
Author: Enrico Minack 
AuthorDate: Wed Sep 25 07:53:12 2024 -0700

[SPARK-49731][K8S] Support K8s volume `mount.subPathExpr` and `hostPath` 
volume `type`

### What changes were proposed in this pull request?
Add the following config options:
- 
`spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPathExpr`
- `spark.kubernetes.executor.volumes.hostPath.[VolumeName].options.type`

### Why are the changes needed?

K8s Spec
- https://kubernetes.io/docs/concepts/storage/volumes/#hostpath-volume-types
- 
https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath-expanded-environment

These are natural extensions of the existing options
- 
`spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath`
- `spark.kubernetes.executor.volumes.hostPath.[VolumeName].options.path`
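
For illustration, a hypothetical `SparkConf` snippet that combines the new keys with the existing ones. The volume name `logs` and all values are invented for this example; `subPathExpr` expands container environment variables as described in the Kubernetes docs linked above.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.kubernetes.executor.volumes.hostPath.logs.mount.path", "/var/log/app")
  // New in this PR: environment-variable expansion in the mount sub-path
  // (assumes POD_NAME is set in the executor container's environment).
  .set("spark.kubernetes.executor.volumes.hostPath.logs.mount.subPathExpr", "$(POD_NAME)")
  .set("spark.kubernetes.executor.volumes.hostPath.logs.options.path", "/data/logs")
  // New in this PR: explicit hostPath volume type.
  .set("spark.kubernetes.executor.volumes.hostPath.logs.options.type", "DirectoryOrCreate")
```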

### Does this PR introduce _any_ user-facing change?
Above config options.

### How was this patch tested?
Unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48181 from EnricoMi/k8s-volume-options.

Authored-by: Enrico Minack 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/deploy/k8s/Config.scala |  2 +
 .../spark/deploy/k8s/KubernetesVolumeSpec.scala|  3 +-
 .../spark/deploy/k8s/KubernetesVolumeUtils.scala   | 18 +-
 .../k8s/features/MountVolumesFeatureStep.scala |  6 +-
 .../spark/deploy/k8s/KubernetesTestConf.scala  | 11 +++-
 .../deploy/k8s/KubernetesVolumeUtilsSuite.scala| 42 -
 .../k8s/features/LocalDirsFeatureStepSuite.scala   |  3 +-
 .../features/MountVolumesFeatureStepSuite.scala| 72 +-
 8 files changed, 144 insertions(+), 13 deletions(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index 3a4d68c19014..9c50f8ddb00c 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -769,8 +769,10 @@ private[spark] object Config extends Logging {
   val KUBERNETES_VOLUMES_NFS_TYPE = "nfs"
   val KUBERNETES_VOLUMES_MOUNT_PATH_KEY = "mount.path"
   val KUBERNETES_VOLUMES_MOUNT_SUBPATH_KEY = "mount.subPath"
+  val KUBERNETES_VOLUMES_MOUNT_SUBPATHEXPR_KEY = "mount.subPathExpr"
   val KUBERNETES_VOLUMES_MOUNT_READONLY_KEY = "mount.readOnly"
   val KUBERNETES_VOLUMES_OPTIONS_PATH_KEY = "options.path"
+  val KUBERNETES_VOLUMES_OPTIONS_TYPE_KEY = "options.type"
   val KUBERNETES_VOLUMES_OPTIONS_CLAIM_NAME_KEY = "options.claimName"
   val KUBERNETES_VOLUMES_OPTIONS_CLAIM_STORAGE_CLASS_KEY = 
"options.storageClass"
   val KUBERNETES_VOLUMES_OPTIONS_MEDIUM_KEY = "options.medium"
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala
index 9dfd40a773eb..b4fe414e3cde 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala
@@ -18,7 +18,7 @@ package org.apache.spark.deploy.k8s
 
 private[spark] sealed trait KubernetesVolumeSpecificConf
 
-private[spark] case class KubernetesHostPathVolumeConf(hostPath: String)
+private[spark] case class KubernetesHostPathVolumeConf(hostPath: String, 
volumeType: String)
   extends KubernetesVolumeSpecificConf
 
 private[spark] case class KubernetesPVCVolumeConf(
@@ -42,5 +42,6 @@ private[spark] case class KubernetesVolumeSpec(
 volumeName: String,
 mountPath: String,
 mountSubPath: String,
+mountSubPathExpr: String,
 mountReadOnly: Boolean,
 volumeConf: KubernetesVolumeSpecificConf)
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
index 6463512c0114..88bb998d88b7 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala
+++ 
b/resource-mana

(spark) branch master updated: [SPARK-49746][BUILD] Upgrade Scala to 2.13.15

2024-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 983f6f434af3 [SPARK-49746][BUILD] Upgrade Scala to 2.13.15
983f6f434af3 is described below

commit 983f6f434af335b9270a0748dc5b4b18c7dc4846
Author: panbingkun 
AuthorDate: Wed Sep 25 07:50:20 2024 -0700

[SPARK-49746][BUILD] Upgrade Scala to 2.13.15

### What changes were proposed in this pull request?
This PR aims to upgrade Scala from `2.13.14` to `2.13.15`.

### Why are the changes needed?
https://contributors.scala-lang.org/t/scala-2-13-15-release-planning/6649
https://github.com/user-attachments/assets/277cfdb4-8542-42fe-86e5-ad72ca2bba4c

**Note: since 2.13.15, `-Wconf:cat=deprecation:wv,any:e` no longer takes effect and needs to be changed to `-Wconf:any:e`, `-Wconf:cat=deprecation:wv`; see https://github.com/scala/scala/pull/10708 for details.**
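
In an sbt build, the equivalent settings would look roughly like this (a sketch, not Spark's exact `SparkBuild.scala` options):

```scala
// build.sbt
// Before 2.13.15 a single combined selector was used:
//   scalacOptions += "-Wconf:cat=deprecation:wv,any:e"
// From 2.13.15 on, pass the two rules as separate flags:
scalacOptions ++= Seq(
  "-Wconf:any:e",              // error on all remaining warnings
  "-Wconf:cat=deprecation:wv"  // keep deprecations as verbose warnings
)
```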

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48192 from panbingkun/SPARK-49746.

Lead-authored-by: panbingkun 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 
 docs/_config.yml  | 2 +-
 pom.xml   | 7 ---
 project/SparkBuild.scala  | 6 +-
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 88526995293f..19b8a237d30a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -144,7 +144,7 @@ jetty-util-ajax/11.0.23//jetty-util-ajax-11.0.23.jar
 jetty-util/11.0.23//jetty-util-11.0.23.jar
 jjwt-api/0.12.6//jjwt-api-0.12.6.jar
 jline/2.14.6//jline-2.14.6.jar
-jline/3.25.1//jline-3.25.1.jar
+jline/3.26.3//jline-3.26.3.jar
 jna/5.14.0//jna-5.14.0.jar
 joda-time/2.13.0//joda-time-2.13.0.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
@@ -252,11 +252,11 @@ py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
 rocksdbjni/9.5.2//rocksdbjni-9.5.2.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
-scala-compiler/2.13.14//scala-compiler-2.13.14.jar
-scala-library/2.13.14//scala-library-2.13.14.jar
+scala-compiler/2.13.15//scala-compiler-2.13.15.jar
+scala-library/2.13.15//scala-library-2.13.15.jar
 
scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar
 scala-parser-combinators_2.13/2.4.0//scala-parser-combinators_2.13-2.4.0.jar
-scala-reflect/2.13.14//scala-reflect-2.13.14.jar
+scala-reflect/2.13.15//scala-reflect-2.13.15.jar
 scala-xml_2.13/2.3.0//scala-xml_2.13-2.3.0.jar
 slf4j-api/2.0.16//slf4j-api-2.0.16.jar
 snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar
diff --git a/docs/_config.yml b/docs/_config.yml
index e74eda047041..089d6bf2097b 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -22,7 +22,7 @@ include:
 SPARK_VERSION: 4.0.0-SNAPSHOT
 SPARK_VERSION_SHORT: 4.0.0
 SCALA_BINARY_VERSION: "2.13"
-SCALA_VERSION: "2.13.14"
+SCALA_VERSION: "2.13.15"
 SPARK_ISSUE_TRACKER_URL: https://issues.apache.org/jira/browse/SPARK
 SPARK_GITHUB_URL: https://github.com/apache/spark
 # Before a new release, we should:
diff --git a/pom.xml b/pom.xml
index 131e754da815..f3dc92426ac4 100644
--- a/pom.xml
+++ b/pom.xml
@@ -169,7 +169,7 @@
 
 3.2.2
 4.4
-2.13.14
+2.13.15
 2.13
 2.2.0
 4.9.1
@@ -226,7 +226,7 @@
 and ./python/packaging/connect/setup.py too.
 -->
 17.0.0
-3.0.0-M2
+3.0.0
 0.12.6
 
 
@@ -3051,7 +3051,8 @@
   -explaintypes
   -release
   17
-  -Wconf:cat=deprecation:wv,any:e
+  -Wconf:any:e
+  -Wconf:cat=deprecation:wv
   -Wunused:imports
   -Wconf:cat=scaladoc:wv
   
-Wconf:msg=^(?=.*?method|value|type|object|trait|inheritance)(?=.*?deprecated)(?=.*?since
 2.13).+$:e
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 2f390cb70baa..82950fb30287 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -234,7 +234,11 @@ object SparkBuild extends PomBuild {
 // replace -Xfatal-warnings with fine-grained configuration, since 
2.13.2
 // verbose warning on deprecation, error on all others
 // see `scalac -Wconf:help` for details
-"-Wconf:cat=deprecation:wv,any:e",
+// since 2.13.15, "-Wconf:cat=deprecation:wv,any:e" no longer takes 
effect and needs to
+// be changed to "-Wconf:any:e"

(spark-kubernetes-operator) branch main updated: [SPARK-49778] Remove (master|worker) prefix from field names of `(Master|Worker)Spec`

2024-09-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 52b9aef  [SPARK-49778] Remove (master|worker) prefix from field names 
of `(Master|Worker)Spec`
52b9aef is described below

commit 52b9aef3706581c4a2b0def74e4f16b133a5463b
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 25 00:04:35 2024 -0700

[SPARK-49778] Remove (master|worker) prefix from field names of 
`(Master|Worker)Spec`

### What changes were proposed in this pull request?

This PR aims to remove the redundant `master` or `worker` prefixes from 
field names of `MasterSpec` and `WorkerSpec`. For example,
```
- workerSpec.workerStatefulSetSpec
+ workerSpec.statefulSetSpec
```

### Why are the changes needed?

To simplify `MasterSpec` and `WorkerSpec` by removing repetitions.

### Does this PR introduce _any_ user-facing change?

No, this is not released yet.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #136 from dongjoon-hyun/SPARK-49778.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 examples/cluster-with-hpa.yaml |  2 +-
 examples/cluster-with-template.yaml|  6 ++--
 .../apache/spark/k8s/operator/spec/MasterSpec.java |  8 +++---
 .../apache/spark/k8s/operator/spec/WorkerSpec.java |  8 +++---
 .../k8s/operator/SparkClusterResourceSpec.java | 18 
 .../k8s/operator/SparkClusterResourceSpecTest.java | 32 +++---
 6 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/examples/cluster-with-hpa.yaml b/examples/cluster-with-hpa.yaml
index a91384a..ca82088 100644
--- a/examples/cluster-with-hpa.yaml
+++ b/examples/cluster-with-hpa.yaml
@@ -25,7 +25,7 @@ spec:
   minWorkers: 1
   maxWorkers: 3
   workerSpec:
-workerStatefulSetSpec:
+statefulSetSpec:
   template:
 spec:
   containers:
diff --git a/examples/cluster-with-template.yaml 
b/examples/cluster-with-template.yaml
index d9e12a2..66c6516 100644
--- a/examples/cluster-with-template.yaml
+++ b/examples/cluster-with-template.yaml
@@ -56,10 +56,10 @@ spec:
   annotations:
 customAnnotation: "svc1"
   workerSpec:
-workerStatefulSetMetadata:
+statefulSetMetadata:
   annotations:
 customAnnotation: "annotation"
-workerStatefulSetSpec:
+statefulSetSpec:
   template:
 spec:
   priorityClassName: system-cluster-critical
@@ -83,7 +83,7 @@ spec:
   limits:
 cpu: "0.1"
 memory: "10Mi"
-workerServiceMetadata:
+serviceMetadata:
   annotations:
 customAnnotation: "annotation"
   sparkConf:
diff --git 
a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java
 
b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java
index 7becfc0..c04a2be 100644
--- 
a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java
+++ 
b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java
@@ -34,8 +34,8 @@ import lombok.NoArgsConstructor;
 @Builder
 @JsonInclude(JsonInclude.Include.NON_NULL)
 public class MasterSpec {
-  protected StatefulSetSpec masterStatefulSetSpec;
-  protected ObjectMeta masterStatefulSetMetadata;
-  protected ServiceSpec masterServiceSpec;
-  protected ObjectMeta masterServiceMetadata;
+  protected StatefulSetSpec statefulSetSpec;
+  protected ObjectMeta statefulSetMetadata;
+  protected ServiceSpec serviceSpec;
+  protected ObjectMeta serviceMetadata;
 }
diff --git 
a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java
 
b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java
index 2c5beb1..04f5abe 100644
--- 
a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java
+++ 
b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java
@@ -34,8 +34,8 @@ import lombok.NoArgsConstructor;
 @Builder
 @JsonInclude(JsonInclude.Include.NON_NULL)
 public class WorkerSpec {
-  protected StatefulSetSpec workerStatefulSetSpec;
-  protected ObjectMeta workerStatefulSetMetadata;
-  protected ServiceSpec workerServiceSpec;
-  protected ObjectMeta workerServiceMetadata;
+  protected StatefulSetSpec statefulSetSpec;
+  protected ObjectMeta statefulSetMetadata;
+  protected ServiceSpec serviceSpec;
+  protected ObjectMeta serviceMetadata;
 }
diff --git 
a/spark-submission-worker/src/main/java/org/apache/spark/k8s/operator/SparkClusterResourceSpec.java
 
b/spark-submission-

(spark-website) branch asf-site updated: Update` latest` to 3.5.3 (#559)

2024-09-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 84265de833 Update` latest` to 3.5.3 (#559)
84265de833 is described below

commit 84265de83327c94b4aef65b735cb023ea1542bab
Author: Haejoon Lee 
AuthorDate: Wed Sep 25 05:14:57 2024 +0900

Update` latest` to 3.5.3 (#559)
---
 site/docs/latest | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/docs/latest b/site/docs/latest
index 80d13b7d9b..678fd88a33 12
--- a/site/docs/latest
+++ b/site/docs/latest
@@ -1 +1 @@
-3.5.2
\ No newline at end of file
+3.5.3
\ No newline at end of file





(spark) branch master updated: [SPARK-49713][PYTHON][FOLLOWUP] Make function `count_min_sketch` accept long seed

2024-09-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 55d0233d19cc [SPARK-49713][PYTHON][FOLLOWUP] Make function 
`count_min_sketch` accept long seed
55d0233d19cc is described below

commit 55d0233d19cc52bee91a9619057d9b6f33165a0a
Author: Ruifeng Zheng 
AuthorDate: Tue Sep 24 07:48:23 2024 -0700

[SPARK-49713][PYTHON][FOLLOWUP] Make function `count_min_sketch` accept 
long seed

### What changes were proposed in this pull request?
Make function `count_min_sketch` accept long seed

### Why are the changes needed?
The existing implementation only accepts an int seed, which is inconsistent with 
other `ExpressionWithRandomSeed` expressions:

```py
In [3]: >>> from pyspark.sql import functions as sf
   ...: >>> spark.range(100).select(
   ...: ... sf.hex(sf.count_min_sketch("id", sf.lit(1.5), 0.6, 
111))
   ...: ... ).show(truncate=False)

...
AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
"count_min_sketch(id, 1.5, 0.6, 111)" due to data type 
mismatch: The 4th parameter requires the "INT" type, however 
"111" has the type "BIGINT". SQLSTATE: 42K09;
'Aggregate [unresolvedalias('hex(count_min_sketch(id#64L, 1.5, 0.6, 
111, 0, 0)))]
+- Range (0, 100, step=1, splits=Some(12))
...

```
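
For reference, a minimal sketch of the intended usage once long seeds are
accepted (this assumes an active `SparkSession` bound to `spark`; the seed and
the eps/confidence arguments below are only illustrative values):

```python
from pyspark.sql import functions as sf

# Any value that fits in a JVM long is now accepted as the seed;
# 1111111111111111111 is just an illustrative BIGINT-sized literal.
spark.range(100).select(
    sf.hex(sf.count_min_sketch("id", sf.lit(1.5), 0.2, 1111111111111111111))
).show(truncate=False)
```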

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
added doctest

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48223 from zhengruifeng/count_min_sk_long_seed.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/connect/functions/builtin.py|  3 +--
 python/pyspark/sql/functions/builtin.py| 14 +-
 .../src/main/scala/org/apache/spark/sql/functions.scala|  2 +-
 .../catalyst/expressions/aggregate/CountMinSketchAgg.scala |  8 ++--
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/connect/functions/builtin.py 
b/python/pyspark/sql/connect/functions/builtin.py
index 2a39bc6bfddd..6953230f5b42 100644
--- a/python/pyspark/sql/connect/functions/builtin.py
+++ b/python/pyspark/sql/connect/functions/builtin.py
@@ -70,7 +70,6 @@ from pyspark.sql.types import (
 StringType,
 )
 from pyspark.sql.utils import enum_to_value as _enum_to_value
-from pyspark.util import JVM_INT_MAX
 
 # The implementation of pandas_udf is embedded in 
pyspark.sql.function.pandas_udf
 # for code reuse.
@@ -1130,7 +1129,7 @@ def count_min_sketch(
 confidence: Union[Column, float],
 seed: Optional[Union[Column, int]] = None,
 ) -> Column:
-_seed = lit(random.randint(0, JVM_INT_MAX)) if seed is None else lit(seed)
+_seed = lit(random.randint(0, sys.maxsize)) if seed is None else lit(seed)
 return _invoke_function_over_columns("count_min_sketch", col, lit(eps), 
lit(confidence), _seed)
 
 
diff --git a/python/pyspark/sql/functions/builtin.py 
b/python/pyspark/sql/functions/builtin.py
index 2688f9daa23a..09a286fe7c94 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -6080,7 +6080,19 @@ def count_min_sketch(
 
|00010064000100025D96391C00320032|
 
++
 
-Example 3: Using a random seed
+Example 3: Using a long seed
+
+>>> from pyspark.sql import functions as sf
+>>> spark.range(100).select(
+... sf.hex(sf.count_min_sketch("id", sf.lit(1.5), 0.2, 
111))
+... ).show(truncate=False)
+
++
+|hex(count_min_sketch(id, 1.5, 0.2, 111))  
  |
+
++
+
|000100640001000244078BA100320032|
+
++
+
+Example 4: Using a random seed
 
 >>> from pyspark.sql import functions as sf
 >>> spark.range(100).select(
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/functions.scala
index d9bceabe88f8..ab69789c75f5 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/api/src/main/scala/org/apache/

(spark) branch branch-3.4 updated: [SPARK-49750][DOC] Mention delegation token support in K8s mode

2024-09-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 2256141cbb45 [SPARK-49750][DOC] Mention delegation token support in 
K8s mode
2256141cbb45 is described below

commit 2256141cbb455bd38b8676104f51207cc30a
Author: Cheng Pan 
AuthorDate: Tue Sep 24 07:40:58 2024 -0700

[SPARK-49750][DOC] Mention delegation token support in K8s mode

Update docs to mention delegation token support in K8s mode.

The delegation token support in K8s mode has been implemented since 3.0.0 
via SPARK-23257.

Yes, docs are updated.

Review.

No.

Closes #48199 from pan3793/SPARK-49750.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit dedf5aa91827f32736ce5dae2eb123ba4e244c3b)
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit b513297f661bf314bcb47033f408810b14ea39b8)
Signed-off-by: Dongjoon Hyun 
---
 docs/security.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/security.md b/docs/security.md
index 2c61a64c36a6..1e694d53ff5a 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -814,7 +814,7 @@ mechanism (see `java.util.ServiceLoader`). Implementations 
of
 `org.apache.spark.security.HadoopDelegationTokenProvider` can be made 
available to Spark
 by listing their names in the corresponding file in the jar's 
`META-INF/services` directory.
 
-Delegation token support is currently only supported in YARN and Mesos modes. 
Consult the
+Delegation token support is currently only supported in YARN, Kubernetes and 
Mesos modes. Consult the
 deployment-specific page for more information.
 
 The following options provides finer-grained control for this feature:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-49750][DOC] Mention delegation token support in K8s mode

2024-09-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new b513297f661b [SPARK-49750][DOC] Mention delegation token support in 
K8s mode
b513297f661b is described below

commit b513297f661bf314bcb47033f408810b14ea39b8
Author: Cheng Pan 
AuthorDate: Tue Sep 24 07:40:58 2024 -0700

[SPARK-49750][DOC] Mention delegation token support in K8s mode

Update docs to mention delegation token support in K8s mode.

The delegation token support in K8s mode has been implemented since 3.0.0 
via SPARK-23257.

Yes, docs are updated.

Review.

No.

Closes #48199 from pan3793/SPARK-49750.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit dedf5aa91827f32736ce5dae2eb123ba4e244c3b)
Signed-off-by: Dongjoon Hyun 
---
 docs/security.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/security.md b/docs/security.md
index 10201e6ed540..e6ef9ea584a1 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -840,7 +840,7 @@ mechanism (see `java.util.ServiceLoader`). Implementations 
of
 `org.apache.spark.security.HadoopDelegationTokenProvider` can be made 
available to Spark
 by listing their names in the corresponding file in the jar's 
`META-INF/services` directory.
 
-Delegation token support is currently only supported in YARN and Mesos modes. 
Consult the
+Delegation token support is currently only supported in YARN, Kubernetes and 
Mesos modes. Consult the
 deployment-specific page for more information.
 
 The following options provides finer-grained control for this feature:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49750][DOC] Mention delegation token support in K8s mode

2024-09-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dedf5aa91827 [SPARK-49750][DOC] Mention delegation token support in 
K8s mode
dedf5aa91827 is described below

commit dedf5aa91827f32736ce5dae2eb123ba4e244c3b
Author: Cheng Pan 
AuthorDate: Tue Sep 24 07:40:58 2024 -0700

[SPARK-49750][DOC] Mention delegation token support in K8s mode

### What changes were proposed in this pull request?

Update docs to mention delegation token support in K8s mode.

### Why are the changes needed?

The delegation token support in K8s mode has been implemented since 3.0.0 
via SPARK-23257.

### Does this PR introduce _any_ user-facing change?

Yes, docs are updated.

### How was this patch tested?

Review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48199 from pan3793/SPARK-49750.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
---
 docs/security.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/security.md b/docs/security.md
index b97abfeacf24..c7d3fd5f8c36 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -947,7 +947,7 @@ mechanism (see `java.util.ServiceLoader`). Implementations 
of
 `org.apache.spark.security.HadoopDelegationTokenProvider` can be made 
available to Spark
 by listing their names in the corresponding file in the jar's 
`META-INF/services` directory.
 
-Delegation token support is currently only supported in YARN mode. Consult the
+Delegation token support is currently only supported in YARN and Kubernetes 
mode. Consult the
 deployment-specific page for more information.
 
 The following options provides finer-grained control for this feature:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49753][BUILD] Upgrade ZSTD-JNI to 1.5.6-6

2024-09-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 438a6e7782ec [SPARK-49753][BUILD] Upgrade ZSTD-JNI to 1.5.6-6
438a6e7782ec is described below

commit 438a6e7782ece23492928cfbb2d01e14104dfd9a
Author: yangjie01 
AuthorDate: Mon Sep 23 21:39:27 2024 -0700

[SPARK-49753][BUILD] Upgrade ZSTD-JNI to 1.5.6-6

### What changes were proposed in this pull request?
The PR aims to upgrade `zstd-jni` from `1.5.6-5` to `1.5.6-6`.

### Why are the changes needed?
The new version allows including the compression level when training a 
dictionary:
https://github.com/luben/zstd-jni/commit/3ca26eed6c84fb09c382854ead527188e643e206#diff-bd5c0f62db7cb85cac88c7b6cfad1c0e5e2f433ba45097761654829627b7a31c

All changes in the new version are as follows:
- https://github.com/luben/zstd-jni/compare/v1.5.6-5...v1.5.6-6

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48204 from LuciferYang/zstd-jni-1.5.6-6.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 .../ZStandardBenchmark-jdk21-results.txt   | 56 +++---
 core/benchmarks/ZStandardBenchmark-results.txt | 56 +++---
 dev/deps/spark-deps-hadoop-3-hive-2.3  |  2 +-
 pom.xml|  2 +-
 4 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/core/benchmarks/ZStandardBenchmark-jdk21-results.txt 
b/core/benchmarks/ZStandardBenchmark-jdk21-results.txt
index b3bffea826e5..f6bd681451d5 100644
--- a/core/benchmarks/ZStandardBenchmark-jdk21-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk21-results.txt
@@ -2,48 +2,48 @@
 Benchmark ZStandardCompressionCodec
 

 
-OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.8.0-1014-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec:Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
--
-Compression 1 times at level 1 without buffer pool657  
  670  14  0.0   65699.2   1.0X
-Compression 1 times at level 2 without buffer pool697  
  697   1  0.0   69673.4   0.9X
-Compression 1 times at level 3 without buffer pool799  
  802   3  0.0   79855.2   0.8X
-Compression 1 times at level 1 with buffer pool   593  
  595   1  0.0   59326.9   1.1X
-Compression 1 times at level 2 with buffer pool   622  
  624   3  0.0   62194.1   1.1X
-Compression 1 times at level 3 with buffer pool   732  
  733   1  0.0   73178.6   0.9X
+Compression 1 times at level 1 without buffer pool659  
  676  16  0.0   65860.7   1.0X
+Compression 1 times at level 2 without buffer pool721  
  723   2  0.0   72135.5   0.9X
+Compression 1 times at level 3 without buffer pool815  
  816   1  0.0   81500.6   0.8X
+Compression 1 times at level 1 with buffer pool   608  
  609   0  0.0   60846.6   1.1X
+Compression 1 times at level 2 with buffer pool   645  
  647   3  0.0   64476.3   1.0X
+Compression 1 times at level 3 with buffer pool   746  
  746   1  0.0   74584.0   0.9X
 
-OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.8.0-1014-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec:Best Time(ms)   
Avg Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
--
-Decompression 1 times from level 1 without buffer pool813  
  820  11  0.0   81273.2   1.0X
-Decompression 1 times from level 2 without buffer pool810  
  813   3  0.0   80986.2

(spark) branch branch-3.4 updated: [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

2024-09-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new e825ac3c272e [SPARK-49760][YARN] Correct handling of `SPARK_USER` env 
variable override in app master
e825ac3c272e is described below

commit e825ac3c272e65083aa4d06648e7cda16c04aa5e
Author: Chris Nauroth 
AuthorDate: Mon Sep 23 21:36:48 2024 -0700

[SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override 
in app master

### What changes were proposed in this pull request?

This patch corrects handling of a user-supplied `SPARK_USER` environment 
variable in the YARN app master. Currently, the user-supplied value gets 
appended to the default, like a classpath entry. The patch fixes it by using 
only the user-supplied value.

### Why are the changes needed?

Overriding the `SPARK_USER` environment variable in the YARN app master 
with configuration property `spark.yarn.appMasterEnv.SPARK_USER` currently 
results in an incorrect value. `Client#setupLaunchEnv` first sets a default in 
the environment map using the Hadoop user. After that, 
`YarnSparkHadoopUtil.addPathToEnvironment` sees the existing value in the map 
and interprets the user-supplied value as needing to be appended like a 
classpath entry. The end result is the Hadoop user appende [...]
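
For illustration only, a minimal Python sketch of the two merge strategies
described above (the helpers `add_path_to_environment` and `set_environment`
are hypothetical stand-ins; the real logic lives in the Scala `Client` and
`YarnSparkHadoopUtil` classes):

```python
import os

def add_path_to_environment(env: dict, key: str, value: str) -> None:
    # Classpath-style merge: append with the path separator when the key
    # already exists. This mimics how the user-supplied SPARK_USER ended up
    # appended to the default before the fix.
    env[key] = env[key] + os.pathsep + value if key in env else value

def set_environment(env: dict, key: str, value: str) -> None:
    # Plain override: the user-supplied value replaces the default,
    # matching the fixed behavior for SPARK_USER.
    env[key] = value

env = {"SPARK_USER": "cnauroth"}  # default derived from the Hadoop user
add_path_to_environment(env, "SPARK_USER", "sparkuser_appMaster")
print(env["SPARK_USER"])  # cnauroth:sparkuser_appMaster (buggy, appended value)

env = {"SPARK_USER": "cnauroth"}
set_environment(env, "SPARK_USER", "sparkuser_appMaster")
print(env["SPARK_USER"])  # sparkuser_appMaster (the intended override)
```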

### Does this PR introduce _any_ user-facing change?

Yes, the app master now uses the user-supplied `SPARK_USER` if specified. 
(The default is still the Hadoop user.)

### How was this patch tested?

* Existing unit tests pass.
* Added new unit tests covering default and overridden `SPARK_USER` for the 
app master. The override test fails without this patch, and then passes after 
the patch is applied.
* Manually tested in a live YARN cluster as shown below.

Manual testing used the `DFSReadWriteTest` job with overrides of 
`SPARK_USER`:

```
spark-submit \
--deploy-mode cluster \
--files all-lines.txt \
--class org.apache.spark.examples.DFSReadWriteTest \
--conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
--conf spark.driverEnv.SPARK_USER=sparkuser_driver \
--conf spark.executorEnv.SPARK_USER=sparkuser_executor \
/usr/lib/spark/examples/jars/spark-examples.jar \
all-lines.txt /tmp/DFSReadWriteTest
```

Before the patch, we can see the app master's `SPARK_USER` mishandled by 
looking at the `_SUCCESS` file in HDFS:

```
hdfs dfs -ls -R /tmp/DFSReadWriteTest

drwxr-xr-x   - cnauroth:sparkuser_appMaster hadoop  0 2024-09-20 
23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 cnauroth:sparkuser_appMaster hadoop  0 2024-09-20 
23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor  hadoop2295080 
2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-0
-rw-r--r--   1 sparkuser_executor  hadoop2288718 
2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-1
```

After the patch, we can see it working correctly:

```
hdfs dfs -ls -R /tmp/DFSReadWriteTest
drwxr-xr-x   - sparkuser_appMaster hadoop  0 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 sparkuser_appMaster hadoop  0 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor  hadoop2295080 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test/part-0
-rw-r--r--   1 sparkuser_executor  hadoop2288718 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test/part-1
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48216 from cnauroth/SPARK-49760-branch-3.5.

Authored-by: Chris Nauroth 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit dd76a82734564afeed4225d19331243f7b926ae8)
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala |  7 +--
 .../scala/org/apache/spark/deploy/yarn/ClientSuite.scala | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index d8e9cd8b47d8..09fc5b7a0caa 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -904,14 +904,13 @@ private[spark] class Client(
   /**
* Set up the environment for launching our ApplicationMaster container.
*/
-  pr

(spark) branch branch-3.5 updated (e7ca790ed4f0 -> dd76a8273456)

2024-09-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from e7ca790ed4f0 [SPARK-49699][SS] Disable PruneFilters for streaming 
workloads
 add dd76a8273456 [SPARK-49760][YARN] Correct handling of `SPARK_USER` env 
variable override in app master

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala |  7 +--
 .../scala/org/apache/spark/deploy/yarn/ClientSuite.scala | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

2024-09-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 35e5d290deee [SPARK-49760][YARN] Correct handling of `SPARK_USER` env 
variable override in app master
35e5d290deee is described below

commit 35e5d290deee9cf2a913571407e2257217e0e9e2
Author: Chris Nauroth 
AuthorDate: Mon Sep 23 21:35:32 2024 -0700

[SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override 
in app master

### What changes were proposed in this pull request?

This patch corrects handling of a user-supplied `SPARK_USER` environment 
variable in the YARN app master. Currently, the user-supplied value gets 
appended to the default, like a classpath entry. The patch fixes it by using 
only the user-supplied value.

### Why are the changes needed?

Overriding the `SPARK_USER` environment variable in the YARN app master 
with configuration property `spark.yarn.appMasterEnv.SPARK_USER` currently 
results in an incorrect value. `Client#setupLaunchEnv` first sets a default in 
the environment map using the Hadoop user. After that, 
`YarnSparkHadoopUtil.addPathToEnvironment` sees the existing value in the map 
and interprets the user-supplied value as needing to be appended like a 
classpath entry. The end result is the Hadoop user appende [...]

### Does this PR introduce _any_ user-facing change?

Yes, the app master now uses the user-supplied `SPARK_USER` if specified. 
(The default is still the Hadoop user.)

### How was this patch tested?

* Existing unit tests pass.
* Added new unit tests covering default and overridden `SPARK_USER` for the 
app master. The override test fails without this patch, and then passes after 
the patch is applied.
* Manually tested in a live YARN cluster as shown below.

Manual testing used the `DFSReadWriteTest` job with overrides of 
`SPARK_USER`:

```
spark-submit \
--deploy-mode cluster \
--files all-lines.txt \
--class org.apache.spark.examples.DFSReadWriteTest \
--conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
--conf spark.driverEnv.SPARK_USER=sparkuser_driver \
--conf spark.executorEnv.SPARK_USER=sparkuser_executor \
/usr/lib/spark/examples/jars/spark-examples.jar \
all-lines.txt /tmp/DFSReadWriteTest
```

Before the patch, we can see the app master's `SPARK_USER` mishandled by 
looking at the `_SUCCESS` file in HDFS:

```
hdfs dfs -ls -R /tmp/DFSReadWriteTest

drwxr-xr-x   - cnauroth:sparkuser_appMaster hadoop  0 2024-09-20 
23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 cnauroth:sparkuser_appMaster hadoop  0 2024-09-20 
23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor  hadoop2295080 
2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-0
-rw-r--r--   1 sparkuser_executor  hadoop2288718 
2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-1
```

After the patch, we can see it working correctly:

```
hdfs dfs -ls -R /tmp/DFSReadWriteTest
drwxr-xr-x   - sparkuser_appMaster hadoop  0 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test
-rw-r--r--   1 sparkuser_appMaster hadoop  0 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
-rw-r--r--   1 sparkuser_executor  hadoop2295080 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test/part-0
-rw-r--r--   1 sparkuser_executor  hadoop2288718 2024-09-23 17:13 
/tmp/DFSReadWriteTest/dfs_read_write_test/part-1
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48214 from cnauroth/SPARK-49760.

Authored-by: Chris Nauroth 
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala |  7 +--
 .../scala/org/apache/spark/deploy/yarn/ClientSuite.scala | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index b2c4d97bc7b0..8b621e82afe2 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -960,14 +960,13 @@ private[spark] class Client(
   /**
* Set up the environment for launching our ApplicationMaster container.
*/
-  private def setupLaunchEnv(
+  private[yarn] def setupLaunchEnv(
   stagingDirPath: Path,
   pySparkArchives: Seq[S

(spark-kubernetes-operator) branch main updated: [SPARK-49754] Support HPA for `SparkCluster`

2024-09-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new a48c0a3  [SPARK-49754] Support HPA for `SparkCluster`
a48c0a3 is described below

commit a48c0a311c29b7d2a76d11134b89c72f39fbe38d
Author: Dongjoon Hyun 
AuthorDate: Mon Sep 23 09:58:07 2024 -0700

[SPARK-49754] Support HPA for `SparkCluster`

### What changes were proposed in this pull request?

This PR aims to support K8s [Horizontal Pod 
Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale)
 for `SparkCluster`.

### Why are the changes needed?

To allow users more flexible installation on top of static SparkClusters.

### Does this PR introduce _any_ user-facing change?

No. This is a new feature and HPA is not created if the minimum number of 
workers is equal to the maximum number of workers.

### How was this patch tested?

Pass the CIs. And, manually create an example cluster and wait until it 
scales down to the minimum number of workers.
```
$ gradle build buildDockerImage spark-operator-api:relocateGeneratedCRD
$ kubectl apply -f examples/cluster-with-hpa.yaml
```

```
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type    Reason             Age    From                       Message
  ----    ------             ----   ----                       -------
  Normal  SuccessfulRescale  2m31s  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  91s    horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #135 from dongjoon-hyun/hpa.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../templates/operator-rbac.yaml   |  6 +++
 examples/cluster-with-hpa.yaml | 45 +
 .../k8s/operator/context/SparkClusterContext.java  |  7 +++
 .../SparkClusterResourceSpecFactory.java   |  3 ++
 .../reconciler/reconcilesteps/ClusterInitStep.java | 10 
 .../k8s/operator/SparkClusterResourceSpec.java | 57 ++
 .../k8s/operator/SparkClusterResourceSpecTest.java | 31 
 .../operator/SparkClusterSubmissionWorkerTest.java |  1 +
 8 files changed, 160 insertions(+)

diff --git 
a/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml 
b/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml
index e9fc7f0..eebbf55 100644
--- a/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml
+++ b/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml
@@ -35,6 +35,12 @@ rules:
   - statefulsets
 verbs:
   - '*'
+  - apiGroups:
+  - "autoscaling"
+resources:
+  - horizontalpodautoscalers
+verbs:
+  - '*'
   - apiGroups:
   - "spark.apache.org"
 resources:
diff --git a/examples/cluster-with-hpa.yaml b/examples/cluster-with-hpa.yaml
new file mode 100644
index 000..a91384a
--- /dev/null
+++ b/examples/cluster-with-hpa.yaml
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file dist

(spark-kubernetes-operator) branch main updated: [SPARK-49742] Upgrade `README`, examples, tests to use `preview2`

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 4aaae71  [SPARK-49742] Upgrade `README`, examples, tests to use 
`preview2`
4aaae71 is described below

commit 4aaae715ff784f64c2f35379254255ffbb8dc384
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 20 16:54:21 2024 -0700

[SPARK-49742] Upgrade `README`, examples, tests to use `preview2`

### What changes were proposed in this pull request?

This PR aims to update the README, examples, and tests to use `Apache Spark 
4.0.0-preview2`.

### Why are the changes needed?

We can use the launched `SparkApp`s and `SparkCluster`s.
- Spark K8s Operator is built with `4.0.0-preview2` already via #133
- Apache Spark 4.0.0-preview2 images are ready via
- https://github.com/apache/spark-docker/pull/70
- https://github.com/apache/spark-docker/pull/71

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #134 from dongjoon-hyun/SPARK-49742.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 README.md | 6 +++---
 examples/cluster-java21.yaml  | 4 ++--
 examples/cluster-on-yunikorn.yaml | 4 ++--
 examples/cluster-with-template.yaml   | 4 ++--
 examples/pi-java21.yaml   | 4 ++--
 examples/pi-on-yunikorn.yaml  | 4 ++--
 examples/pi-scala.yaml| 4 ++--
 examples/pi-with-one-pod.yaml | 4 ++--
 examples/pi.yaml  | 4 ++--
 examples/prod-cluster-with-three-workers.yaml | 4 ++--
 examples/pyspark-pi.yaml  | 4 ++--
 examples/qa-cluster-with-one-worker.yaml  | 4 ++--
 examples/sql.yaml | 4 ++--
 .../org/apache/spark/k8s/operator/SparkClusterResourceSpec.java   | 2 +-
 tests/e2e/spark-versions/chainsaw-test.yaml   | 8 
 tests/e2e/state-transition/spark-cluster-example-succeeded.yaml   | 6 +++---
 tests/e2e/state-transition/spark-example-succeeded.yaml   | 6 +++---
 tests/e2e/watched-namespaces/spark-example.yaml   | 6 +++---
 18 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index e306889..8db0ab1 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ $ ./examples/submit-pi-to-prod.sh
 {
   "action" : "CreateSubmissionResponse",
   "message" : "Driver successfully submitted as driver-20240821181327-",
-  "serverSparkVersion" : "4.0.0-preview1",
+  "serverSparkVersion" : "4.0.0-preview2",
   "submissionId" : "driver-20240821181327-",
   "success" : true
 }
@@ -73,7 +73,7 @@ $ curl 
http://localhost:6066/v1/submissions/status/driver-20240821181327-/
 {
   "action" : "SubmissionStatusResponse",
   "driverState" : "FINISHED",
-  "serverSparkVersion" : "4.0.0-preview1",
+  "serverSparkVersion" : "4.0.0-preview2",
   "submissionId" : "driver-20240821181327-",
   "success" : true,
   "workerHostPort" : "10.1.5.188:42099",
@@ -100,7 +100,7 @@ Events:
   Normal  Scheduled  14s   yunikorn  Successfully assigned 
default/pi-on-yunikorn-0-driver to node docker-desktop
   Normal  PodBindSuccessful  14s   yunikorn  Pod 
default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
   Normal  TaskCompleted  6syunikorn  Task 
default/pi-on-yunikorn-0-driver is completed
-  Normal  Pulled 13s   kubelet   Container image 
"apache/spark:4.0.0-preview1" already present on machine
+  Normal  Pulled 13s   kubelet   Container image 
"apache/spark:4.0.0-preview2" already present on machine
   Normal  Created13s   kubelet   Created container 
spark-kubernetes-driver
   Normal  Started13s   kubelet   Started container 
spark-kubernetes-driver
 
diff --git a/examples/cluster-java21.yaml b/examples/cluster-java21.yaml
index fefe90c..abc4826 100644
--- a/examples/cluster-java21.yaml
+++ b/examples/cluster-java21.yaml
@@ -18,14 +18,14 @@ metadata:
   name: cluster-java21
 spec:
   runtimeVersions:
-spa

(spark-docker) branch master updated: [SPARK-49740] Update `publish-java17.yaml` and `publish-java21.yaml` to use `preview2` by default

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 63c0ce3  [SPARK-49740] Update `publish-java17.yaml` and 
`publish-java21.yaml` to use `preview2` by default
63c0ce3 is described below

commit 63c0ce3c0abfc94b591da7bc8cd5f90e02f16464
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 20 11:15:32 2024 -0700

[SPARK-49740] Update `publish-java17.yaml` and `publish-java21.yaml` to use 
`preview2` by default

### What changes were proposed in this pull request?

This PR aims to update `publish-java17.yaml` and `publish-java21.yaml` to 
use `preview2` by default.

### Why are the changes needed?

To publish the latest images.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

Closes #71 from dongjoon-hyun/SPARK-49740.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/publish-java17.yaml | 3 ++-
 .github/workflows/publish-java21.yaml | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/publish-java17.yaml 
b/.github/workflows/publish-java17.yaml
index 610f839..2ca08e6 100644
--- a/.github/workflows/publish-java17.yaml
+++ b/.github/workflows/publish-java17.yaml
@@ -25,10 +25,11 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '4.0.0-preview1'
+default: '4.0.0-preview2'
 type: choice
 options:
 - 4.0.0-preview1
+- 4.0.0-preview2
   publish:
 description: 'Publish the image or not.'
 default: false
diff --git a/.github/workflows/publish-java21.yaml 
b/.github/workflows/publish-java21.yaml
index 1a2078a..b98718f 100644
--- a/.github/workflows/publish-java21.yaml
+++ b/.github/workflows/publish-java21.yaml
@@ -25,10 +25,11 @@ on:
   spark:
 description: 'The Spark version of Spark image.'
 required: true
-default: '4.0.0-preview1'
+default: '4.0.0-preview2'
 type: choice
 options:
 - 4.0.0-preview1
+- 4.0.0-preview2
   publish:
 description: 'Publish the image or not.'
 default: false


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-docker) branch master updated: [SPARK-49736] Add Apache Spark `4.0.0-preview2` Dockerfiles

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 217a942  [SPARK-49736] Add Apache Spark `4.0.0-preview2` Dockerfiles
217a942 is described below

commit 217a9422bedb7fc3aab47c0bebed32acf0e1a737
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 20 10:54:45 2024 -0700

[SPARK-49736] Add Apache Spark `4.0.0-preview2` Dockerfiles

### What changes were proposed in this pull request?

This PR aims to add `4.0.0-preview2` Dockerfiles.

### Why are the changes needed?

New release.

### Does this PR introduce _any_ user-facing change?

New release.

### How was this patch tested?

New release.

Closes #70 from dongjoon-hyun/SPARK-49736.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_4.0.0-preview2.yaml|  43 +++
 .github/workflows/test.yml |   1 +
 .../scala2.13-java17-python3-r-ubuntu/Dockerfile   |  29 +
 .../scala2.13-java17-python3-ubuntu/Dockerfile |  26 +
 .../scala2.13-java17-r-ubuntu/Dockerfile   |  28 +
 4.0.0-preview2/scala2.13-java17-ubuntu/Dockerfile  |  81 +
 .../scala2.13-java17-ubuntu/entrypoint.sh  | 130 +
 .../scala2.13-java21-python3-r-ubuntu/Dockerfile   |  29 +
 .../scala2.13-java21-python3-ubuntu/Dockerfile |  26 +
 .../scala2.13-java21-r-ubuntu/Dockerfile   |  28 +
 4.0.0-preview2/scala2.13-java21-ubuntu/Dockerfile  |  81 +
 .../scala2.13-java21-ubuntu/entrypoint.sh  | 130 +
 tools/template.py  |   4 +-
 versions.json  |  56 +
 14 files changed, 691 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_4.0.0-preview2.yaml 
b/.github/workflows/build_4.0.0-preview2.yaml
new file mode 100644
index 000..7e7dbea
--- /dev/null
+++ b/.github/workflows/build_4.0.0-preview2.yaml
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (4.0.0-preview2)"
+
+on:
+  pull_request:
+branches:
+  - 'master'
+paths:
+  - '4.0.0-preview2/**'
+
+jobs:
+  run-build:
+strategy:
+  matrix:
+image-type: ["all", "python", "scala", "r"]
+java: [17, 21]
+name: Run
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: 4.0.0-preview2
+  scala: 2.13
+  java: ${{ matrix.java }}
+  image-type: ${{ matrix.image-type }}
+
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index f627405..e8f941c 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -28,6 +28,7 @@ on:
 default: '3.5.2'
 type: choice
 options:
+- 4.0.0-preview2
 - 4.0.0-preview1
 - 3.5.2
 - 3.5.1
diff --git a/4.0.0-preview2/scala2.13-java17-python3-r-ubuntu/Dockerfile 
b/4.0.0-preview2/scala2.13-java17-python3-r-ubuntu/Dockerfile
new file mode 100644
index 000..7c575a8
--- /dev/null
+++ b/4.0.0-preview2/scala2.13-java17-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,29 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing per

(spark-kubernetes-operator) branch main updated: [SPARK-49735] Upgrade Spark to `4.0.0-preview2`

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 1c1dbb6  [SPARK-49735] Upgrade Spark to `4.0.0-preview2`
1c1dbb6 is described below

commit 1c1dbb61a65d65e97226f67ccb04be03f076f105
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 20 10:53:43 2024 -0700

[SPARK-49735] Upgrade Spark to `4.0.0-preview2`

### What changes were proposed in this pull request?

This PR aims to upgrade `Spark` dependency to `4.0.0-preview2`.

### Why are the changes needed?

To use the latest updates.
- https://github.com/apache/spark/releases/tag/v4.0.0-preview2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #133 from dongjoon-hyun/SPARK-49735.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 gradle/libs.versions.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index 2ebed13..339b34f 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -20,7 +20,7 @@ lombok = "1.18.32"
 operator-sdk = "4.9.0"
 okhttp = "4.12.0"
 dropwizard-metrics = "4.2.25"
-spark = "4.0.0-preview1"
+spark = "4.0.0-preview2"
 log4j = "2.22.1"
 
 # Test


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 22a7edce0a7c [SPARK-49531][PYTHON][CONNECT] Support line plot with 
plotly backend
22a7edce0a7c is described below

commit 22a7edce0a7c70d6c1a5dcf995c6c723f0c3352b
Author: Xinrong Meng 
AuthorDate: Fri Sep 20 08:53:52 2024 -0700

[SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend

### What changes were proposed in this pull request?
Support line plot with plotly backend on both Spark Connect and Spark 
classic.

### Why are the changes needed?
While Pandas on Spark supports plotting, PySpark currently lacks this 
feature. The proposed API will enable users to generate visualizations, such as 
line plots, by leveraging libraries like Plotly. This will provide users with 
an intuitive, interactive way to explore and understand large datasets directly 
from PySpark DataFrames, streamlining the data analysis workflow in distributed 
environments.

See more at [PySpark Plotting API 
Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing)
 in progress.

Part of https://issues.apache.org/jira/browse/SPARK-49530.

### Does this PR introduce _any_ user-facing change?
Yes.

```python
>>> data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
>>> columns = ["category", "int_val", "float_val"]
>>> sdf = spark.createDataFrame(data, columns)
>>> sdf.show()
+--------+-------+---------+
|category|int_val|float_val|
+--------+-------+---------+
|       A|     10|      1.5|
|       B|     30|      2.5|
|       C|     20|      3.5|
+--------+-------+---------+

>>> f = sdf.plot(kind="line", x="category", y="int_val")
>>> f.show()  # see below
>>> g = sdf.plot.line(x="category", y=["int_val", "float_val"])
>>> g.show()  # see below
```
`f.show()`:

![newplot](https://github.com/user-attachments/assets/ebd50bbc-0dd1-437f-ae0c-0b4de8f3c722)

`g.show()`:
![newplot 
(1)](https://github.com/user-attachments/assets/46d28840-a147-428f-8d88-d424aa76ad06)

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48139 from xinrong-meng/plot_line_w_dep.

Authored-by: Xinrong Meng 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_python_connect.yml |   2 +-
 dev/requirements.txt   |   2 +-
 dev/sparktestsupport/modules.py|   4 +
 python/docs/source/getting_started/install.rst |   1 +
 python/packaging/classic/setup.py  |   1 +
 python/packaging/connect/setup.py  |   2 +
 python/pyspark/errors/error-conditions.json|   5 +
 python/pyspark/sql/classic/dataframe.py|   9 ++
 python/pyspark/sql/connect/dataframe.py|   8 ++
 python/pyspark/sql/dataframe.py|  28 +
 python/pyspark/sql/plot/__init__.py|  21 
 python/pyspark/sql/plot/core.py| 135 +
 python/pyspark/sql/plot/plotly.py  |  30 +
 .../sql/tests/connect/test_parity_frame_plot.py|  36 ++
 .../tests/connect/test_parity_frame_plot_plotly.py |  36 ++
 python/pyspark/sql/tests/plot/__init__.py  |  16 +++
 python/pyspark/sql/tests/plot/test_frame_plot.py   |  80 
 .../sql/tests/plot/test_frame_plot_plotly.py   |  64 ++
 python/pyspark/sql/utils.py|  17 +++
 python/pyspark/testing/sqlutils.py |   7 ++
 .../org/apache/spark/sql/internal/SQLConf.scala|  27 +
 21 files changed, 529 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_python_connect.yml 
b/.github/workflows/build_python_connect.yml
index 3ac1a0117e41..f668d813ef26 100644
--- a/.github/workflows/build_python_connect.yml
+++ b/.github/workflows/build_python_connect.yml
@@ -71,7 +71,7 @@ jobs:
   python packaging/connect/setup.py sdist
   cd dist
   pip install pyspark*connect-*.tar.gz
-  pip install 'six==1.16.0' 'pandas<=2.2.2' scipy 'plotly>=4.8' 
'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 
'scikit-learn>=1.3.2' 'graphviz==0.20.3' torch torchvision torcheval deepspeed 
unittest-xml-reporting
+  pip install 'six==1.16.0' 'pandas<=2.2.2' scipy 

(spark) branch master updated: [SPARK-49704][BUILD] Upgrade `commons-io` to 2.17.0

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3d8c078ddefe [SPARK-49704][BUILD] Upgrade `commons-io` to 2.17.0
3d8c078ddefe is described below

commit 3d8c078ddefe3bb74fc78ffc9391a067156c8499
Author: panbingkun 
AuthorDate: Fri Sep 20 08:44:14 2024 -0700

[SPARK-49704][BUILD] Upgrade `commons-io` to 2.17.0

### What changes were proposed in this pull request?
This PR aims to upgrade `commons-io` from `2.16.1` to `2.17.0`.

### Why are the changes needed?
The full release notes: 
https://commons.apache.org/proper/commons-io/changes-report.html#a2.17.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48154 from panbingkun/SPARK-49704.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 9871cc0bca04..419625f48fa1 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -44,7 +44,7 @@ commons-compiler/3.1.9//commons-compiler-3.1.9.jar
 commons-compress/1.27.1//commons-compress-1.27.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
-commons-io/2.16.1//commons-io-2.16.1.jar
+commons-io/2.17.0//commons-io-2.17.0.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.17.0//commons-lang3-3.17.0.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
diff --git a/pom.xml b/pom.xml
index ddabc82d2ad1..b7c87beec0f9 100644
--- a/pom.xml
+++ b/pom.xml
@@ -187,7 +187,7 @@
 3.0.3
 1.17.1
 1.27.1
-2.16.1
+2.17.0
 
 2.6
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated (20bdb70 -> fcb8a8f)

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


 discard 20bdb70  [SPARK-45923] Add java21 to the e2e tests
 new fcb8a8f  [SPARK-49724] Add java21 to the e2e tests

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (20bdb70)
\
 N -- N -- N   refs/heads/main (fcb8a8f)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) 01/01: [SPARK-49724] Add java21 to the e2e tests

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git

commit fcb8a8f3bf3afa75a0fb55a2bfc986af556541bc
Author: Qi Tan 
AuthorDate: Fri Sep 20 04:51:50 2024 -0700

[SPARK-49724] Add java21 to the e2e tests

Add Java 21 e2e tests

E2E coverage on java version

no

e2e workflow should cover this

no

Closes #132 from TQJADE/java-21.

Authored-by: Qi Tan 
Signed-off-by: Dongjoon Hyun 
---
 tests/e2e/spark-versions/chainsaw-test.yaml | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tests/e2e/spark-versions/chainsaw-test.yaml 
b/tests/e2e/spark-versions/chainsaw-test.yaml
index f4a07f1..8f6c263 100644
--- a/tests/e2e/spark-versions/chainsaw-test.yaml
+++ b/tests/e2e/spark-versions/chainsaw-test.yaml
@@ -38,7 +38,7 @@ spec:
   - name: "JAVA_VERSION"
 value: "17"
   - name: "IMAGE"
-value: 'spark:3.5.2-scala2.12-java17-ubuntu'
+value: 'apache/spark:3.5.2-scala2.12-java17-ubuntu'
   - bindings:
   - name: "SPARK_VERSION"
 value: "3.4.3"
@@ -47,7 +47,16 @@ spec:
   - name: "JAVA_VERSION"
 value: "11"
   - name: "IMAGE"
-value: 'spark:3.4.3-scala2.12-java11-ubuntu'
+value: 'apache/spark:3.4.3-scala2.12-java11-ubuntu'
+  - bindings:
+  - name: "SPARK_VERSION"
+value: "4.0.0-preview1"
+  - name: "SCALA_VERSION"
+value: "2.13"
+  - name: "JAVA_VERSION"
+value: "21"
+  - name: "IMAGE"
+value: 'apache/spark:4.0.0-preview1-java21-scala'
   steps:
 - name: install-spark-application
   try:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: [SPARK-45923] Add java21 to the e2e tests

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 20bdb70  [SPARK-45923] Add java21 to the e2e tests
20bdb70 is described below

commit 20bdb7095a45d57c0e2caca692cc72658072d258
Author: Qi Tan 
AuthorDate: Fri Sep 20 04:50:15 2024 -0700

[SPARK-45923] Add java21 to the e2e tests

### What changes were proposed in this pull request?
Add Java 21 e2e tests

### Why are the changes needed?
E2E coverage on java version

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
e2e workflow should cover this

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #132 from TQJADE/java-21.

Authored-by: Qi Tan 
Signed-off-by: Dongjoon Hyun 
---
 tests/e2e/spark-versions/chainsaw-test.yaml | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tests/e2e/spark-versions/chainsaw-test.yaml 
b/tests/e2e/spark-versions/chainsaw-test.yaml
index f4a07f1..8f6c263 100644
--- a/tests/e2e/spark-versions/chainsaw-test.yaml
+++ b/tests/e2e/spark-versions/chainsaw-test.yaml
@@ -38,7 +38,7 @@ spec:
   - name: "JAVA_VERSION"
 value: "17"
   - name: "IMAGE"
-value: 'spark:3.5.2-scala2.12-java17-ubuntu'
+value: 'apache/spark:3.5.2-scala2.12-java17-ubuntu'
   - bindings:
   - name: "SPARK_VERSION"
 value: "3.4.3"
@@ -47,7 +47,16 @@ spec:
   - name: "JAVA_VERSION"
 value: "11"
   - name: "IMAGE"
-value: 'spark:3.4.3-scala2.12-java11-ubuntu'
+value: 'apache/spark:3.4.3-scala2.12-java11-ubuntu'
+  - bindings:
+  - name: "SPARK_VERSION"
+value: "4.0.0-preview1"
+  - name: "SCALA_VERSION"
+value: "2.13"
+  - name: "JAVA_VERSION"
+value: "21"
+  - name: "IMAGE"
+value: 'apache/spark:4.0.0-preview1-java21-scala'
   steps:
 - name: install-spark-application
   try:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) tag v4.0.0-preview2 created (now f0d465e09b8d)

2024-09-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v4.0.0-preview2
in repository https://gitbox.apache.org/repos/asf/spark.git


  at f0d465e09b8d (commit)
No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r71774 - /dev/spark/v4.0.0-preview2-rc1-bin/ /release/spark/spark-4.0.0-preview2/

2024-09-20 Thread dongjoon
Author: dongjoon
Date: Fri Sep 20 10:00:47 2024
New Revision: 71774

Log:
Release Apache Spark 4.0.0-preview2

Added:
release/spark/spark-4.0.0-preview2/
  - copied from r71773, dev/spark/v4.0.0-preview2-rc1-bin/
Removed:
dev/spark/v4.0.0-preview2-rc1-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5

2024-09-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ca726c10925a [SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5
ca726c10925a is described below

commit ca726c10925a3677bf057f65ecf415e608c63cd5
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 19 17:16:25 2024 -0700

[SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5

### What changes were proposed in this pull request?

This PR aims to upgrade `protobuf-java` to 3.25.5.

### Why are the changes needed?

To bring the latest bug fixes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48170

Closes #48171 from dongjoon-hyun/SPARK-49721.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml  | 2 +-
 project/SparkBuild.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pom.xml b/pom.xml
index 694ea31e6f37..ddabc82d2ad1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -124,7 +124,7 @@
 
 3.4.0
 
-3.25.4
+3.25.5
 3.11.4
 ${hadoop.version}
 3.9.2
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index d93a52985b77..2f390cb70baa 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -89,7 +89,7 @@ object BuildCommons {
 
   // Google Protobuf version used for generating the protobuf.
   // SPARK-41247: needs to be consistent with `protobuf.version` in `pom.xml`.
-  val protoVersion = "3.25.4"
+  val protoVersion = "3.25.5"
   // GRPC version used for Spark Connect.
   val grpcVersion = "1.62.2"
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files

2024-09-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 04455797bfb3 [SPARK-49720][PYTHON][INFRA] Add a script to clean up 
PySpark temp files
04455797bfb3 is described below

commit 04455797bfb3631b13b41cfa5d2604db3bf8acc2
Author: Ruifeng Zheng 
AuthorDate: Thu Sep 19 12:32:30 2024 -0700

[SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files

### What changes were proposed in this pull request?
Add a script to clean up PySpark temp files

### Why are the changes needed?
Sometimes I encounter weird issues due to an outdated `pyspark.zip` file, 
and removing it restores the expected behavior, so it is worth adding such 
a script.

### Does this PR introduce _any_ user-facing change?
no, dev-only

### How was this patch tested?
manually test

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48167 from zhengruifeng/py_infra_cleanup.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 dev/py-cleanup | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/dev/py-cleanup b/dev/py-cleanup
new file mode 100755
index ..6a2edd104017
--- /dev/null
+++ b/dev/py-cleanup
@@ -0,0 +1,31 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Utility for temporary files cleanup in 'python'.
+#   usage: ./dev/py-cleanup
+
+set -ex
+
+SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
+cd "$SPARK_HOME"
+
+rm -rf python/target
+rm -rf python/lib/pyspark.zip
+rm -rf python/docs/build
+rm -rf python/docs/source/reference/*/api
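
For illustration, a hedged Python equivalent of the same cleanup (the paths come from the script above; run it from `SPARK_HOME`):

```
import glob
import os
import shutil

# Remove the same build artifacts that dev/py-cleanup deletes.
targets = [
    "python/target",
    "python/lib/pyspark.zip",
    "python/docs/build",
    *glob.glob("python/docs/source/reference/*/api"),
]
for path in targets:
    if os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)
```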


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49718][PS] Switch `Scatter` plot to sampled data

2024-09-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6d1815eceea2 [SPARK-49718][PS] Switch `Scatter` plot to sampled data
6d1815eceea2 is described below

commit 6d1815eceea2003de2e3602f0f64e8188e8288d8
Author: Ruifeng Zheng 
AuthorDate: Thu Sep 19 12:31:48 2024 -0700

[SPARK-49718][PS] Switch `Scatter` plot to sampled data

### What changes were proposed in this pull request?
Switch `Scatter` plot to sampled data

### Why are the changes needed?
When the data distribution is correlated with the row order, the first n
rows will not be representative of the whole dataset.

for example:
```
import pandas as pd
import numpy as np
import pyspark.pandas as ps

# ps.set_option("plotting.max_rows", 1)
np.random.seed(123)

pdf = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD')).sort_values("A")
psdf = ps.DataFrame(pdf)

psdf.plot.scatter(x='B', y='A')
```

all 10k datapoints:

![image](https://github.com/user-attachments/assets/72cf7e97-ad10-41e0-a8a6-351747d5285f)

before (first 1k datapoints):

![image](https://github.com/user-attachments/assets/1ed50d2c-7772-4579-a84c-6062542d9367)

after (sampled 1k datapoints):

![image](https://github.com/user-attachments/assets/6c684cba-4119-4c38-8228-2bedcdeb9e59)

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
ci and manually test

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48164 from zhengruifeng/ps_scatter_sampling.

    Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/pandas/plot/core.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/plot/core.py 
b/python/pyspark/pandas/plot/core.py
index 429e97ecf07b..6f036b766924 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -479,7 +479,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
 "pie": TopNPlotBase().get_top_n,
 "bar": TopNPlotBase().get_top_n,
 "barh": TopNPlotBase().get_top_n,
-"scatter": TopNPlotBase().get_top_n,
+"scatter": SampledPlotBase().get_sampled,
 "area": SampledPlotBase().get_sampled,
 "line": SampledPlotBase().get_sampled,
 }
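
For illustration only, a minimal sketch (not the pyspark.pandas internals) contrasting the two selection strategies on the `psdf` frame from the example above:

```
# Top-N: take the first 1000 rows -- biased when the frame is sorted by a column.
topn = psdf.head(1000)

# Sampled: draw roughly 1000 rows uniformly at random -- order-insensitive.
fraction = 1000 / len(psdf)
sampled = psdf.sample(frac=fraction, random_state=0)
```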


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats

2024-09-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new cb89d18a4d75 [SPARK-46535][SQL][3.4] Fix NPE when describe extended a 
column without col stats
cb89d18a4d75 is described below

commit cb89d18a4d750fc88e5d747601352488223e97b5
Author: saitharun15 
AuthorDate: Thu Sep 19 12:19:10 2024 -0700

[SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col 
stats

### What changes were proposed in this pull request?

Backport  [#44524 ] to 3.4 for 
[[SPARK-46535]](https://issues.apache.org/jira/browse/SPARK-46535)[SQL] Fix NPE 
when describe extended a column without col stats

### Why are the changes needed?

Currently, executing DESCRIBE TABLE EXTENDED on a column without col stats with a
v2 table throws a NullPointerException.

```
Cannot invoke 
"org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because 
the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke 
"org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because 
the return value of "scala.Option.get()" is null
at 
org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
```
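
As a rough illustration (a Python stand-in, not the Scala code), the difference between wrapping a possibly-null lookup in `Some` versus `Option`:

```
col_stats = {}                                # hypothetical: no statistics for "key"
raw = col_stats.get("key")                    # None, like a Java Map#get returning null

some_like = (raw,)                            # Some(null): looks present, holds nothing
option_like = () if raw is None else (raw,)   # Option(null): collapses to "absent"

# Downstream code touches the value only when it really exists, so the
# min() access that triggered the NPE is never reached.
for stats in option_like:
    print(stats.min())
```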

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Added a new test: describe extended (formatted) a column without col stats.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48160 from saitharun15/SPARK-46535-branch-3.4.

Lead-authored-by: saitharun15 
Co-authored-by: Sai Tharun 
Signed-off-by: Dongjoon Hyun 
---
 .../datasources/v2/DescribeColumnExec.scala |  2 +-
 .../execution/command/v2/DescribeTableSuite.scala   | 21 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala
index 61ccda3fc954..2683d8d547f0 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala
@@ -53,7 +53,7 @@ case class DescribeColumnExec(
   read.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match {
 case s: SupportsReportStatistics =>
   val stats = s.estimateStatistics()
-  Some(stats.columnStats().get(FieldReference.column(column.name)))
+  
Option(stats.columnStats().get(FieldReference.column(column.name)))
 case _ => None
   }
 case _ => None
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala
index 25363dcea699..a12bb92072bc 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala
@@ -175,4 +175,25 @@ class DescribeTableSuite extends 
command.DescribeTableSuiteBase
   Row("max_col_len", "NULL")))
 }
   }
+
+  test("SPARK-46535: describe extended (formatted) a column without col 
stats") {
+withNamespaceAndTable("ns", "tbl") { tbl =>
+  sql(
+s"""
+   |CREATE TABLE $tbl
+   |(key INT COMMENT 'column_comment', col STRING)
+   |$defaultUsing""".stripMargin)
+
+  val descriptionDf = sql(s"DESCRIBE TABLE EXTENDED $tbl key")
+  assert(descriptionDf.schema.map(field => (field.name, field.dataTy

(spark) branch master updated (f0fb0c89ec29 -> 92cad2abd54e)

2024-09-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f0fb0c89ec29 [SPARK-49719][SQL] Make `UUID` and `SHUFFLE` accept 
integer `seed`
 add 92cad2abd54e [SPARK-49716][PS][DOCS][TESTS] Fix documentation and add 
test of barh plot

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/plot/core.py | 13 ++---
 python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py |  5 +++--
 2 files changed, 13 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (94dca78c128f -> f0fb0c89ec29)

2024-09-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 94dca78c128f [SPARK-49693][PYTHON][CONNECT] Refine the string 
representation of `timedelta`
 add f0fb0c89ec29 [SPARK-49719][SQL] Make `UUID` and `SHUFFLE` accept 
integer `seed`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/expressions/randomExpressions.scala | 1 +
 .../sql/catalyst/expressions/CollectionExpressionsSuite.scala | 8 
 .../spark/sql/catalyst/expressions/MiscExpressionsSuite.scala | 7 +++
 3 files changed, 16 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: [SPARK-49715] Add `Java 21`-based `SparkCluster` example

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new b187764  [SPARK-49715] Add `Java 21`-based `SparkCluster` example
b187764 is described below

commit b18776458b8f26db8e43f6ddb9275f3cbdbafffe
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 22:30:11 2024 -0700

[SPARK-49715] Add `Java 21`-based `SparkCluster` example

### What changes were proposed in this pull request?

This PR aims to add `Java 21`-based `SparkCluster` example.

### Why are the changes needed?

Apache Spark starts publishing a Java 21-based image today (2024-09-19).
This new example illustrates how to use it.
- https://github.com/apache/spark-docker/pull/69

### Does this PR introduce _any_ user-facing change?

No, this is a new example.

### How was this patch tested?

Manual review.

```
$ kubectl apply -f cluster-java21.yaml
```

```
$ kubectl get sparkcluster
NAME             CURRENT STATE    AGE
cluster-java21   RunningHealthy   9s
```

```
$ kubectl describe sparkcluster cluster-java21
Name: cluster-java21
Namespace:default
Labels:   
Annotations:  
API Version:  spark.apache.org/v1alpha1
Kind: SparkCluster
Metadata:
  Creation Timestamp:  2024-09-19T04:25:30Z
  Finalizers:
sparkclusters.spark.apache.org/finalizer
  Generation:2
  Resource Version:  96663
  UID:   9421c957-380d-4a26-a266-379304bf83ee
Spec:
  Cluster Tolerations:
Instance Config:
  Init Workers:  3
  Max Workers:   3
  Min Workers:   3
  Master Spec:
  Runtime Versions:
Spark Version:  4.0.0-preview1
  Spark Conf:
spark.kubernetes.container.image:  apache/spark:4.0.0-preview1-java21
spark.master.rest.enabled: true
spark.master.rest.host:0.0.0.0
spark.master.ui.title: Prod Spark Cluster (Java 21)
spark.ui.reverseProxy: true
  Worker Spec:
Status:
  Current Attempt Summary:
Attempt Info:
  Id:  0
  Current State:
Current State Summary:  RunningHealthy
Last Transition Time:   2024-09-19T04:25:30.665095088Z
Message:Cluster has reached ready state.
  State Transition History:
0:
  Current State Summary:  Submitted
  Last Transition Time:   2024-09-19T04:25:30.640072963Z
  Message:Spark cluster has been submitted to 
Kubernetes Cluster.
1:
  Current State Summary:  RunningHealthy
  Last Transition Time:   2024-09-19T04:25:30.665095088Z
  Message:Cluster has reached ready state.
Events:   
```

```
$ k get pod
NAME                                         READY   STATUS    RESTARTS   AGE
cluster-java21-master-0                      1/1     Running   0          3m20s
cluster-java21-worker-0                      1/1     Running   0          3m20s
cluster-java21-worker-1                      1/1     Running   0          3m20s
cluster-java21-worker-2                      1/1     Running   0          3m20s
spark-kubernetes-operator-778b9bbdc6-fqks9   1/1     Running   0          20m
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #130 from dongjoon-hyun/SPARK-49715.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 examples/cluster-java21.yaml | 32 
 1 file changed, 32 insertions(+)

diff --git a/examples/cluster-java21.yaml b/examples/cluster-java21.yaml
new file mode 100644
index 000..fefe90c
--- /dev/null
+++ b/examples/cluster-java21.yaml
@@ -0,0 +1,32 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkCluster
+metadata:
+  

(spark-kubernetes-operator) branch main updated: [SPARK-49714] Add `Java 21`-based `SparkPi` example

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 815cefe  [SPARK-49714] Add `Java 21`-based `SparkPi` example
815cefe is described below

commit 815cefe136c196cf987ac73300e55c84d3216816
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 22:29:13 2024 -0700

[SPARK-49714] Add `Java 21`-based `SparkPi` example

### What changes were proposed in this pull request?

This PR aims to add `Java 21`-based SparkPi example.

### Why are the changes needed?

Apache Spark starts publishing a Java 21-based image today (2024-09-19).
This new example illustrates how to use it.
- https://github.com/apache/spark-docker/pull/69

### Does this PR introduce _any_ user-facing change?

No, this is a new example.

### How was this patch tested?

Manual review.

```
$ kubectl apply -f examples/pi-java21.yaml
```

```
$ kubectl get sparkapp
NAME        CURRENT STATE      AGE
pi-java21   ResourceReleased   28s
```

```
$ kubectl describe sparkapp pi-java21
Name: pi-java21
Namespace:default
Labels:   
Annotations:  
API Version:  spark.apache.org/v1alpha1
Kind: SparkApplication
Metadata:
  Creation Timestamp:  2024-09-19T04:08:16Z
  Finalizers:
sparkapplications.spark.apache.org/finalizer
  Generation:2
  Resource Version:  95294
  UID:   2bc46e8d-6339-4867-9a28-6552c6c471e5
Spec:
  Application Tolerations:
Application Timeout Config:
  Driver Ready Timeout Millis:30
  Driver Start Timeout Millis:30
  Executor Start Timeout Millis:  30
  Force Termination Grace Period Millis:  30
  Termination Requeue Period Millis:  2000
Instance Config:
  Init Executors:0
  Max Executors: 0
  Min Executors: 0
Resource Retain Policy:  OnFailure
Restart Config:
  Max Restart Attempts:3
  Restart Backoff Millis:  3
  Restart Policy:  Never
  Deployment Mode: ClusterMode
  Driver Args:
  Jars:local:///opt/spark/examples/jars/spark-examples.jar
  Main Class:  org.apache.spark.examples.SparkPi
  Runtime Versions:
Scala Version:  2.13
Spark Version:  4.0.0-preview1
  Spark Conf:
spark.dynamicAllocation.enabled:  true
spark.dynamicAllocation.maxExecutors: 3
spark.dynamicAllocation.shuffleTracking.enabled:  true
spark.kubernetes.authenticate.driver.serviceAccountName:  spark
spark.kubernetes.container.image: 
apache/spark:4.0.0-preview1-java21-scala
spark.log.structuredLogging.enabled:  false
Status:
  Current Attempt Summary:
Attempt Info:
  Id:  0
  Current State:
Current State Summary:  ResourceReleased
Last Transition Time:   2024-09-19T04:08:33.316041381Z
  State Transition History:
0:
  Current State Summary:  Submitted
  Last Transition Time:   2024-09-19T04:08:16.584629470Z
  Message:Spark application has been created on 
Kubernetes Cluster.
1:
  Current State Summary:  DriverRequested
  Last Transition Time:   2024-09-19T04:08:17.269457304Z
  Message:Requested driver from resource scheduler.
2:
  Current State Summary:  DriverStarted
  Last Transition Time:   2024-09-19T04:08:17.809898304Z
  Message:Driver has started running.
3:
  Current State Summary:  DriverReady
  Last Transition Time:   2024-09-19T04:08:17.810393971Z
  Message:Driver has reached ready state.
4:
  Current State Summary:  RunningHealthy
  Last Transition Time:   2024-09-19T04:08:17.828526471Z
  Message:Application is running healthy.
5:
  Current State Summary:  Succeeded
  Last Transition Time:   2024-09-19T04:08:33.241514089Z
  Message:Spark application completed successfully.
6:
  Current State Summary:  ResourceReleased
  Last Transition Time:   2024-09-19T04:08:33.316041381Z
Events:   
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #129 from dongjoon-hyun/SPARK-49714.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 examples/pi-java21.yaml | 33

(spark-kubernetes-operator) branch main updated: [SPARK-49705] Use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar`

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 491e7db  [SPARK-49705] Use `spark-examples.jar` instead of 
`spark-examples_2.13-4.0.0-preview1.jar`
491e7db is described below

commit 491e7db1d62b9f0e4f641f2a59eee083140dc819
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 22:28:36 2024 -0700

[SPARK-49705] Use `spark-examples.jar` instead of 
`spark-examples_2.13-4.0.0-preview1.jar`

### What changes were proposed in this pull request?

This PR aims to use `spark-examples.jar` instead of 
`spark-examples_2.13-4.0.0-preview1.jar`.

### Why are the changes needed?

To simplify the examples for Apache Spark 4+ via SPARK-45497.
- https://github.com/apache/spark/pull/43324

### Does this PR introduce _any_ user-facing change?

Yes, but only example images.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #127 from dongjoon-hyun/SPARK-49705.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 examples/pi-on-yunikorn.yaml| 2 +-
 examples/pi-scala.yaml  | 2 +-
 examples/pi-with-one-pod.yaml   | 2 +-
 examples/pi.yaml| 2 +-
 examples/sql.yaml   | 2 +-
 examples/submit-pi-to-prod.sh   | 2 +-
 tests/e2e/state-transition/spark-example-succeeded.yaml | 2 +-
 tests/e2e/watched-namespaces/spark-example.yaml | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/examples/pi-on-yunikorn.yaml b/examples/pi-on-yunikorn.yaml
index 029c9f3..d8f6ccf 100644
--- a/examples/pi-on-yunikorn.yaml
+++ b/examples/pi-on-yunikorn.yaml
@@ -18,7 +18,7 @@ metadata:
   name: pi-on-yunikorn
 spec:
   mainClass: "org.apache.spark.examples.SparkPi"
-  jars: 
"local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
   driverArgs: [ "2" ]
   sparkConf:
 spark.dynamicAllocation.enabled: "true"
diff --git a/examples/pi-scala.yaml b/examples/pi-scala.yaml
index 3744ae1..29b5018 100644
--- a/examples/pi-scala.yaml
+++ b/examples/pi-scala.yaml
@@ -18,7 +18,7 @@ metadata:
   name: pi-scala
 spec:
   mainClass: "org.apache.spark.examples.SparkPi"
-  jars: 
"local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
   sparkConf:
 spark.dynamicAllocation.enabled: "true"
 spark.dynamicAllocation.shuffleTracking.enabled: "true"
diff --git a/examples/pi-with-one-pod.yaml b/examples/pi-with-one-pod.yaml
index f46d977..739058d 100644
--- a/examples/pi-with-one-pod.yaml
+++ b/examples/pi-with-one-pod.yaml
@@ -18,7 +18,7 @@ metadata:
   name: pi-with-one-pod
 spec:
   mainClass: "org.apache.spark.examples.SparkPi"
-  jars: 
"local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
   sparkConf:
 spark.kubernetes.driver.master: "local[10]"
 spark.kubernetes.driver.request.cores: "5"
diff --git a/examples/pi.yaml b/examples/pi.yaml
index f99499d..8b20dcd 100644
--- a/examples/pi.yaml
+++ b/examples/pi.yaml
@@ -18,7 +18,7 @@ metadata:
   name: pi
 spec:
   mainClass: "org.apache.spark.examples.SparkPi"
-  jars: 
"local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
   sparkConf:
 spark.dynamicAllocation.enabled: "true"
 spark.dynamicAllocation.shuffleTracking.enabled: "true"
diff --git a/examples/sql.yaml b/examples/sql.yaml
index 9639723..1be9779 100644
--- a/examples/sql.yaml
+++ b/examples/sql.yaml
@@ -18,7 +18,7 @@ metadata:
   name: sql
 spec:
   mainClass: "org.apache.spark.examples.sql.JavaSparkSQLCli"
-  jars: 
"local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
   driverArgs: [ "SHOW DATABASES", "SHOW TABLES", "SELECT VERSION()"  ]
   sparkConf:
 spark.dynamicAllocation.enabled: "true"
diff --git a/examples/submit-pi-to-prod.sh b/examples/submit-pi-to-prod.sh
index 915706a..b3b4707 100755
--- a/examples/submit-pi-to-prod.sh
+++ b/examples/submit-pi-to-prod.sh
@@ -30,7 +30,7 @@ curl -XPOST http://localhost:6066/v1/submissions/create 

(spark-docker) branch master updated: [SPARK-49703] Publish Java 21 Docker image for preview1

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 0402e13  [SPARK-49703] Publish Java 21 Docker image for preview1
0402e13 is described below

commit 0402e13bb797363f6b99d6aa56c4185317deeaf4
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 20:37:56 2024 -0700

[SPARK-49703] Publish Java 21 Docker image for preview1

### What changes were proposed in this pull request?

This PR aims to publish a Java 21 Docker image for `preview1` and will be
extended for `preview2`.

### Why are the changes needed?

Apache Spark supports Java 21 via SPARK-43831.

### Does this PR introduce _any_ user-facing change?

No, this is a new image.

### How was this patch tested?

Pass the CIs.

Closes #69 from dongjoon-hyun/SPARK-49703.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_4.0.0-preview1.yaml|   2 +-
 .github/workflows/publish-java21.yaml  |  88 ++
 .../scala2.13-java21-python3-r-ubuntu/Dockerfile   |  29 +
 .../scala2.13-java21-python3-ubuntu/Dockerfile |  26 +
 .../scala2.13-java21-r-ubuntu/Dockerfile   |  28 +
 4.0.0-preview1/scala2.13-java21-ubuntu/Dockerfile  |  81 +
 .../scala2.13-java21-ubuntu/entrypoint.sh  | 130 +
 README.md  |   2 +-
 add-dockerfiles.sh |  10 +-
 versions.json  |  28 +
 10 files changed, 420 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/build_4.0.0-preview1.yaml 
b/.github/workflows/build_4.0.0-preview1.yaml
index aa683f7..31df15a 100644
--- a/.github/workflows/build_4.0.0-preview1.yaml
+++ b/.github/workflows/build_4.0.0-preview1.yaml
@@ -31,7 +31,7 @@ jobs:
 strategy:
   matrix:
 image-type: ["all", "python", "scala", "r"]
-java: [17]
+java: [17, 21]
 name: Run
 secrets: inherit
 uses: ./.github/workflows/main.yml
diff --git a/.github/workflows/publish-java21.yaml 
b/.github/workflows/publish-java21.yaml
new file mode 100644
index 000..1a2078a
--- /dev/null
+++ b/.github/workflows/publish-java21.yaml
@@ -0,0 +1,88 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Publish (Java 21 only)"
+
+on:
+  workflow_dispatch:
+inputs:
+  spark:
+description: 'The Spark version of Spark image.'
+required: true
+default: '4.0.0-preview1'
+type: choice
+options:
+- 4.0.0-preview1
+  publish:
+description: 'Publish the image or not.'
+default: false
+type: boolean
+required: true
+  repository:
+description: The registry to be published (Available only when publish 
is true).
+required: false
+default: ghcr.io/apache/spark-docker
+type: choice
+options:
+# GHCR: This required the write permission of apache/spark-docker 
(Spark Committer)
+- ghcr.io/apache/spark-docker
+# Dockerhub: This required the DOCKERHUB_TOKEN and DOCKERHUB_USER 
(Spark Committer)
+- apache
+
+jobs:
+  # We first build and publish the base image
+  run-base-build:
+strategy:
+  matrix:
+scala: [2.13]
+java: [21]
+image-type: ["scala"]
+permissions:
+  packages: write
+name: Run Base
+secrets: inherit
+uses: ./.github/workflows/main.yml
+with:
+  spark: ${{ inputs.spark }}
+  scala: ${{ matrix.scala }}
+  java: ${{ matrix.java }}
+  publish: ${{ inputs.publish }}
+  repository: ${{ inputs.repository }}
+  image-type: ${{ matrix.image-type }}
+
+  # Then publish the all / python / r images
+  run-build:
+needs: run-base-build
+strategy:
+  matrix:
+scala: [2.13]
+java: [21]
+image-type: ["all&

(spark-kubernetes-operator) branch main updated: [SPARK-49706] Use `apache/spark` images instead of `spark`

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 41e7c3b  [SPARK-49706] Use `apache/spark` images instead of `spark`
41e7c3b is described below

commit 41e7c3bba5857f6a678570c64e1c360375395611
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 18:16:35 2024 -0700

[SPARK-49706] Use `apache/spark` images instead of `spark`

### What changes were proposed in this pull request?

This PR proposes to use `apache/spark` images instead of `spark`
because `apache/spark` images are published first. For example, the following 
are only available in `apache/spark` as of now.
- https://github.com/apache/spark-docker/pull/66
- https://github.com/apache/spark-docker/pull/67
- https://github.com/apache/spark-docker/pull/68

### Why are the changes needed?

To apply the latest bits earlier.

### Does this PR introduce _any_ user-facing change?

There is no change from `Apache Spark K8s Operator`.
Only the underlying images are changed.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #128 from dongjoon-hyun/SPARK-49706.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 README.md   | 2 +-
 examples/cluster-on-yunikorn.yaml   | 2 +-
 examples/cluster-with-template.yaml | 2 +-
 examples/pi-on-yunikorn.yaml| 2 +-
 examples/pi-scala.yaml  | 2 +-
 examples/pi-with-one-pod.yaml   | 2 +-
 examples/pi.yaml| 2 +-
 examples/prod-cluster-with-three-workers.yaml   | 2 +-
 examples/pyspark-pi.yaml| 2 +-
 examples/qa-cluster-with-one-worker.yaml| 2 +-
 examples/sql.yaml   | 2 +-
 tests/e2e/python/chainsaw-test.yaml | 4 ++--
 tests/e2e/spark-versions/chainsaw-test.yaml | 2 +-
 tests/e2e/state-transition/spark-cluster-example-succeeded.yaml | 2 +-
 tests/e2e/state-transition/spark-example-succeeded.yaml | 2 +-
 tests/e2e/watched-namespaces/spark-example.yaml | 2 +-
 16 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index e9cdee7..e306889 100644
--- a/README.md
+++ b/README.md
@@ -100,7 +100,7 @@ Events:
   Normal  Scheduled  14s   yunikorn  Successfully assigned 
default/pi-on-yunikorn-0-driver to node docker-desktop
   Normal  PodBindSuccessful  14s   yunikorn  Pod 
default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
   Normal  TaskCompleted  6syunikorn  Task 
default/pi-on-yunikorn-0-driver is completed
-  Normal  Pulled 13s   kubelet   Container image 
"spark:4.0.0-preview1" already present on machine
+  Normal  Pulled 13s   kubelet   Container image 
"apache/spark:4.0.0-preview1" already present on machine
   Normal  Created13s   kubelet   Created container 
spark-kubernetes-driver
   Normal  Started13s   kubelet   Started container 
spark-kubernetes-driver
 
diff --git a/examples/cluster-on-yunikorn.yaml 
b/examples/cluster-on-yunikorn.yaml
index 0032c84..4c1d142 100644
--- a/examples/cluster-on-yunikorn.yaml
+++ b/examples/cluster-on-yunikorn.yaml
@@ -25,7 +25,7 @@ spec:
   minWorkers: 1
   maxWorkers: 1
   sparkConf:
-spark.kubernetes.container.image: "spark:4.0.0-preview1"
+spark.kubernetes.container.image: "apache/spark:4.0.0-preview1"
 spark.kubernetes.scheduler.name: "yunikorn"
 spark.master.ui.title: "Spark Cluster on YuniKorn Scheduler"
 spark.master.rest.enabled: "true"
diff --git a/examples/cluster-with-template.yaml 
b/examples/cluster-with-template.yaml
index c0d17b8..69add4d 100644
--- a/examples/cluster-with-template.yaml
+++ b/examples/cluster-with-template.yaml
@@ -87,7 +87,7 @@ spec:
   annotations:
 customAnnotation: "annotation"
   sparkConf:
-spark.kubernetes.container.image: "spark:4.0.0-preview1"
+spark.kubernetes.container.image: "apache/spark:4.0.0-preview1"
 spark.master.ui.title: "Spark Cluster with Template"
 spark.master.rest.enabled: "true"
 spark.master.rest.host: "0.0.0.0"
diff --git a/examples/pi-on-yunikorn.yaml b/examples/pi-on-yunikorn.yaml
index 9e115b4..029c9f3 100644
--- a/exa

(spark-docker) branch master updated: [SPARK-44935] Fix `RELEASE` file to have the correct information in Docker images if exists

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new a5f0168  [SPARK-44935] Fix `RELEASE` file to have the correct 
information in Docker images if exists
a5f0168 is described below

commit a5f016858ca0ce6809a06d3c159bfdb221df68e0
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 16:12:41 2024 -0700

[SPARK-44935] Fix `RELEASE` file to have the correct information in Docker 
images if exists

### What changes were proposed in this pull request?

This PR aims to fix `RELEASE` file to have the correct information in 
Docker images if exists.

Apache Spark repository already fixed this.
- https://github.com/apache/spark/pull/42636

### Why are the changes needed?

To provide correct information for Spark 3.4+.

### Does this PR introduce _any_ user-facing change?

No behavior change. Only `RELEASE` file.

### How was this patch tested?

Pass the CIs.

Closes #68 from dongjoon-hyun/SPARK-44935.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 3.4.0/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.4.1/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.4.2/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.4.3/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.5.0/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.5.0/scala2.12-java17-ubuntu/Dockerfile  | 1 +
 3.5.1/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.5.1/scala2.12-java17-ubuntu/Dockerfile  | 1 +
 3.5.2/scala2.12-java11-ubuntu/Dockerfile  | 1 +
 3.5.2/scala2.12-java17-ubuntu/Dockerfile  | 1 +
 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 1 +
 Dockerfile.template   | 1 +
 12 files changed, 12 insertions(+)

diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
index a4b081e..c756d8a 100644
--- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile
@@ -55,6 +55,7 @@ RUN set -ex; \
 tar -xf spark.tgz --strip-components=1; \
 chown -R spark:spark .; \
 mv jars /opt/spark/; \
+mv RELEASE /opt/spark/; \
 mv bin /opt/spark/; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
index d8bba7e..c18afb0 100644
--- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile
@@ -55,6 +55,7 @@ RUN set -ex; \
 tar -xf spark.tgz --strip-components=1; \
 chown -R spark:spark .; \
 mv jars /opt/spark/; \
+mv RELEASE /opt/spark/; \
 mv bin /opt/spark/; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
diff --git a/3.4.2/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.2/scala2.12-java11-ubuntu/Dockerfile
index 2a472f9..4c28cc0 100644
--- a/3.4.2/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.2/scala2.12-java11-ubuntu/Dockerfile
@@ -55,6 +55,7 @@ RUN set -ex; \
 tar -xf spark.tgz --strip-components=1; \
 chown -R spark:spark .; \
 mv jars /opt/spark/; \
+mv RELEASE /opt/spark/; \
 mv bin /opt/spark/; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
diff --git a/3.4.3/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.3/scala2.12-java11-ubuntu/Dockerfile
index b749f07..4432dea 100644
--- a/3.4.3/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.4.3/scala2.12-java11-ubuntu/Dockerfile
@@ -55,6 +55,7 @@ RUN set -ex; \
 tar -xf spark.tgz --strip-components=1; \
 chown -R spark:spark .; \
 mv jars /opt/spark/; \
+mv RELEASE /opt/spark/; \
 mv bin /opt/spark/; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
diff --git a/3.5.0/scala2.12-java11-ubuntu/Dockerfile 
b/3.5.0/scala2.12-java11-ubuntu/Dockerfile
index 15f4b31..afc7e9a 100644
--- a/3.5.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.5.0/scala2.12-java11-ubuntu/Dockerfile
@@ -55,6 +55,7 @@ RUN set -ex; \
 tar -xf spark.tgz --strip-components=1; \
 chown -R spark:spark .; \
 mv jars /opt/spark/; \
+mv RELEASE /opt/spark/; \
 mv bin /opt/spark/; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
diff --git a/3.5.0/scala2.12-java17-ubuntu/Dockerfile 
b/3.5.0/scala2.12-java17-ubuntu/Dockerfile
index a2749bb..61c6687 100644
--- a/3.5.0/scala2.12-java17-ubuntu/Dockerfile
+++ b/3.5.0/scala2.12-java17-ubuntu/Dockerfile
@@ -55,6 +55,7 @@ RUN set -ex; \
 tar -xf spark.tgz --strip-components=1; \
 chown -R spark:spark .; \
 mv jars /opt/spark/; \
+mv RELEASE /opt/spark/; \
 mv bin /opt/spark/; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt

(spark-docker) branch master updated: [SPARK-45497] Add a symbolic link file `spark-examples.jar` in K8s Docker images

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new b69d21d  [SPARK-45497] Add a symbolic link file `spark-examples.jar` 
in K8s Docker images
b69d21d is described below

commit b69d21da5da1f5d35ea0125188a700c4ca9897bb
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 16:09:30 2024 -0700

[SPARK-45497] Add a symbolic link file `spark-examples.jar` in K8s Docker 
images

### What changes were proposed in this pull request?

This PR aims to add a symbolic link file, `spark-examples.jar`, in the 
example jar directory.

Apache Spark repository is updated already via
- https://github.com/apache/spark/pull/43324

```
$ docker run -it --rm spark:latest ls -al /opt/spark/examples/jars  | tail 
-n6
total 1620
drwxr-xr-x 1 root root4096 Oct 11 04:37 .
drwxr-xr-x 1 root root4096 Sep  9 02:08 ..
-rw-r--r-- 1 root root   78803 Sep  9 02:08 scopt_2.12-3.7.1.jar
-rw-r--r-- 1 root root 1564255 Sep  9 02:08 spark-examples_2.12-3.5.0.jar
lrwxrwxrwx 1 root root  29 Oct 11 04:37 spark-examples.jar -> 
spark-examples_2.12-3.5.0.jar
```

### Why are the changes needed?

Like the PySpark example (`pi.py`), we can submit the examples without
specifying version numbers, which was painful before.
```
bin/spark-submit \
--master k8s://$K8S_MASTER \
--deploy-mode cluster \
...
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples.jar 1
```

The following is the driver pod log.
```
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit ...
--deploy-mode client
--properties-file /opt/spark/conf/spark.properties
--class org.apache.spark.examples.SparkPi
local:///opt/spark/examples/jars/spark-examples.jar 1
Files  local:///opt/spark/examples/jars/spark-examples.jar from 
/opt/spark/examples/jars/spark-examples.jar to 
/opt/spark/work-dir/./spark-examples.jar
```

### Does this PR introduce _any_ user-facing change?

No, this is an additional file.

### How was this patch tested?

Manually build the docker image and do `ls`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #67 from dongjoon-hyun/SPARK-45497.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 1 +
 Dockerfile.template   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile 
b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile
index 0a487dd..2685a87 100644
--- a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile
+++ b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile
@@ -59,6 +59,7 @@ RUN set -ex; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
 mv examples /opt/spark/; \
+ln -s "$(basename $(ls /opt/spark/examples/jars/spark-examples_*.jar))" 
/opt/spark/examples/jars/spark-examples.jar; \
 mv kubernetes/tests /opt/spark/; \
 mv data /opt/spark/; \
 mv python/pyspark /opt/spark/python/pyspark/; \
diff --git a/Dockerfile.template b/Dockerfile.template
index 3d0aacf..c19e961 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -59,6 +59,7 @@ RUN set -ex; \
 mv sbin /opt/spark/; \
 mv kubernetes/dockerfiles/spark/decom.sh /opt/; \
 mv examples /opt/spark/; \
+ln -s "$(basename $(ls /opt/spark/examples/jars/spark-examples_*.jar))" 
/opt/spark/examples/jars/spark-examples.jar; \
 mv kubernetes/tests /opt/spark/; \
 mv data /opt/spark/; \
 mv python/pyspark /opt/spark/python/pyspark/; \


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-docker) branch master updated: [SPARK-49701] Use JDK for Spark 3.5+ Docker image

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new daa6f94  [SPARK-49701] Use JDK for Spark 3.5+ Docker image
daa6f94 is described below

commit daa6f940a133d63a4e813bff811b98b1f05f1c4a
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 18 16:08:32 2024 -0700

[SPARK-49701] Use JDK for Spark 3.5+ Docker image

### What changes were proposed in this pull request?

This PR aims to use JDK for Spark 3.5+ Docker image. Apache Spark 
Dockerfile are updated already.
- https://github.com/apache/spark/pull/45762
- https://github.com/apache/spark/pull/45761

### Why are the changes needed?

Since Apache Spark 3.5.0, SPARK-44153 uses `jmap`, as shown in the
following.

- https://github.com/apache/spark/pull/41709


https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030

### Does this PR introduce _any_ user-facing change?

Yes, the user can use the `Heap Histogram` feature.

### How was this patch tested?

Pass the CIs.

Closes #66 from dongjoon-hyun/SPARK-49701.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 3.5.0/scala2.12-java17-ubuntu/Dockerfile  | 2 +-
 3.5.1/scala2.12-java17-ubuntu/Dockerfile  | 2 +-
 3.5.2/scala2.12-java17-ubuntu/Dockerfile  | 2 +-
 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 2 +-
 add-dockerfiles.sh| 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/3.5.0/scala2.12-java17-ubuntu/Dockerfile 
b/3.5.0/scala2.12-java17-ubuntu/Dockerfile
index ed29cba..a2749bb 100644
--- a/3.5.0/scala2.12-java17-ubuntu/Dockerfile
+++ b/3.5.0/scala2.12-java17-ubuntu/Dockerfile
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-FROM eclipse-temurin:17-jre-jammy
+FROM eclipse-temurin:17-jammy
 
 ARG spark_uid=185
 
diff --git a/3.5.1/scala2.12-java17-ubuntu/Dockerfile 
b/3.5.1/scala2.12-java17-ubuntu/Dockerfile
index 562d938..1682e72 100644
--- a/3.5.1/scala2.12-java17-ubuntu/Dockerfile
+++ b/3.5.1/scala2.12-java17-ubuntu/Dockerfile
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-FROM eclipse-temurin:17-jre-jammy
+FROM eclipse-temurin:17-jammy
 
 ARG spark_uid=185
 
diff --git a/3.5.2/scala2.12-java17-ubuntu/Dockerfile 
b/3.5.2/scala2.12-java17-ubuntu/Dockerfile
index 280bd0c..34b1214 100644
--- a/3.5.2/scala2.12-java17-ubuntu/Dockerfile
+++ b/3.5.2/scala2.12-java17-ubuntu/Dockerfile
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-FROM eclipse-temurin:17-jre-jammy
+FROM eclipse-temurin:17-jammy
 
 ARG spark_uid=185
 
diff --git a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile 
b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile
index 1102caf..0a487dd 100644
--- a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile
+++ b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-FROM eclipse-temurin:17-jre-jammy
+FROM eclipse-temurin:17-jammy
 
 ARG spark_uid=185
 
diff --git a/add-dockerfiles.sh b/add-dockerfiles.sh
index 63f610c..ccc4ac1 100755
--- a/add-dockerfiles.sh
+++ b/add-dockerfiles.sh
@@ -72,7 +72,7 @@ for TAG in $TAGS; do
 fi
 
 if echo $TAG | grep -q "java17"; then
-OPTS+=" --java-version 17 --image eclipse-temurin:17-jre-jammy"
+OPTS+=" --java-version 17 --image eclipse-temurin:17-jammy"
 elif echo $TAG | grep -q "java11"; then
 OPTS+=" --java-version 11 --image eclipse-temurin:11-jre-focal"
 fi


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49691][PYTHON][CONNECT] Function `substring` should accept column names

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ed3a9b1aa929 [SPARK-49691][PYTHON][CONNECT] Function `substring` 
should accept column names
ed3a9b1aa929 is described below

commit ed3a9b1aa92957015592b399167a960b68b73beb
Author: Ruifeng Zheng 
AuthorDate: Wed Sep 18 09:28:09 2024 -0700

[SPARK-49691][PYTHON][CONNECT] Function `substring` should accept column 
names

### What changes were proposed in this pull request?
Function `substring` should accept column names

### Why are the changes needed?
Bug fix:

```
In [1]: >>> import pyspark.sql.functions as sf
   ...: >>> df = spark.createDataFrame([('Spark', 2, 3)], ['s', 'p', 
'l'])
   ...: >>> df.select('*', sf.substring('s', 'p', 'l')).show()
```

works in PySpark Classic, but fails in Connect with:
```
NumberFormatException Traceback (most recent call last)
Cell In[2], line 1
> 1 df.select('*', sf.substring('s', 'p', 'l')).show()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1170, in 
DataFrame.show(self, n, truncate, vertical)
   1169 def show(self, n: int = 20, truncate: Union[bool, int] = True, 
vertical: bool = False) -> None:
-> 1170 print(self._show_string(n, truncate, vertical))

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:927, in 
DataFrame._show_string(self, n, truncate, vertical)
910 except ValueError:
911 raise PySparkTypeError(
912 errorClass="NOT_BOOL",
913 messageParameters={
   (...)
916 },
917 )
919 table, _ = DataFrame(
920 plan.ShowString(
921 child=self._plan,
922 num_rows=n,
923 truncate=_truncate,
924 vertical=vertical,
925 ),
926 session=self._session,
--> 927 )._to_table()
928 return table[0][0].as_py()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1844, in 
DataFrame._to_table(self)
   1842 def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]:
   1843 query = self._plan.to_proto(self._session.client)
-> 1844 table, schema, self._execution_info = 
self._session.client.to_table(
   1845 query, self._plan.observations
   1846 )
   1847 assert table is not None
   1848 return (table, schema)

File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:892, in 
SparkConnectClient.to_table(self, plan, observations)
890 req = self._execute_plan_request_with_metadata()
891 req.plan.CopyFrom(plan)
--> 892 table, schema, metrics, observed_metrics, _ = 
self._execute_and_fetch(req, observations)
894 # Create a query execution object.
895 ei = ExecutionInfo(metrics, observed_metrics)

File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1517, in 
SparkConnectClient._execute_and_fetch(self, req, observations, self_destruct)
   1514 properties: Dict[str, Any] = {}
   1516 with Progress(handlers=self._progress_handlers, 
operation_id=req.operation_id) as progress:
-> 1517 for response in self._execute_and_fetch_as_iterator(
   1518 req, observations, progress=progress
   1519 ):
   1520 if isinstance(response, StructType):
   1521 schema = response

File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1494, in 
SparkConnectClient._execute_and_fetch_as_iterator(self, req, observations, 
progress)
   1492 raise kb
   1493 except Exception as error:
-> 1494 self._handle_error(error)

File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1764, in 
SparkConnectClient._handle_error(self, error)
   1762 self.thread_local.inside_error_handling = True
   1763 if isinstance(error, grpc.RpcError):
-> 1764 self._handle_rpc_error(error)
   1765 elif isinstance(error, ValueError):
   1766 if "Cannot invoke RPC" in str(error) and "closed" in str(error):

File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1840, in 
SparkConnectClient._handle_rpc_error(self, rpc_error)
   1837 if info.metadata["errorClass"] == 
"INVALID_HANDLE.SESSION_CHANGED":
   1838 self._closed = True
-> 1840 raise convert_exception(
   1841 info,
   1842 status.message,
   1843 
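```

For reference, a minimal sketch of the intended behavior once column names are accepted in Connect as well (assumes an active `spark` session; the rendered output below is illustrative):

```
import pyspark.sql.functions as sf

df = spark.createDataFrame([('Spark', 2, 3)], ['s', 'p', 'l'])
df.select('*', sf.substring('s', 'p', 'l')).show()
# +-----+---+---+------------------+
# |    s|  p|  l|substring(s, p, l)|
# +-----+---+---+------------------+
# |Spark|  2|  3|               par|
# +-----+---+---+------------------+
```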

(spark) branch master updated: [SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages settings via .asf.yml

2024-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b86e5d2ab1fb [SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages 
settings via .asf.yml
b86e5d2ab1fb is described below

commit b86e5d2ab1fb17f8dcbb5b4d50f3361494270438
Author: Kent Yao 
AuthorDate: Wed Sep 18 07:44:42 2024 -0700

[SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages settings via .asf.yml

### What changes were proposed in this pull request?

A followup of SPARK-49495 to enable GitHub Pages settings via 
[.asf.yaml](https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=INFRA&title=git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubPages)

### Why are the changes needed?
Meet the requirement of the `actions/configure-pages@v5` action

```
Run actions/configure-pages@v5
  with:
token: ***
enablement: false
  env:
SPARK_TESTING: 1
RELEASE_VERSION: In-Progress
JAVA_HOME: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.12-7/x64
JAVA_HOME_17_X64: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.12-7/x64
pythonLocation: /opt/hostedtoolcache/Python/3.9.19/x64
PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.9.19/x64/lib/pkgconfig
Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.19/x64
Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.19/x64
Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.19/x64
LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.19/x64/lib
Error: Get Pages site failed. Please verify that the repository has Pages 
enabled and configured to build using GitHub Actions, or consider exploring the 
`enablement` parameter for this action. Error: Not Found - 
https://docs.github.com/rest/pages/pages#get-a-apiname-pages-site
Error: HttpError: Not Found - 
https://docs.github.com/rest/pages/pages#get-a-apiname-pages-site
```

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
NA

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48141 from yaooqinn/SPARK-49495-FF.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .asf.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
index 22042b355b2f..91a5f9b2bb1a 100644
--- a/.asf.yaml
+++ b/.asf.yaml
@@ -31,6 +31,8 @@ github:
 merge: false
 squash: true
 rebase: true
+  ghp_branch: master
+  ghp_path: /docs/_site
 
 notifications:
   pullrequests: revi...@spark.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r71714 - /dev/spark/v3.5.3-rc3-bin/ /release/spark/spark-3.5.3/

2024-09-17 Thread dongjoon
Author: dongjoon
Date: Wed Sep 18 04:20:19 2024
New Revision: 71714

Log:
Release Apache Spark 3.5.3

Added:
release/spark/spark-3.5.3/
  - copied from r71713, dev/spark/v3.5.3-rc3-bin/
Removed:
dev/spark/v3.5.3-rc3-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49682][BUILD] Upgrade joda-time to 2.13.0

2024-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4590538df095 [SPARK-49682][BUILD] Upgrade joda-time to 2.13.0
4590538df095 is described below

commit 4590538df095b20c0736ecc992ed9c0dfb926c0e
Author: panbingkun 
AuthorDate: Tue Sep 17 21:14:52 2024 -0700

[SPARK-49682][BUILD] Upgrade joda-time to 2.13.0

### What changes were proposed in this pull request?
The pr aims to upgrade joda-time from `2.12.7` to `2.13.0`.

### Why are the changes needed?
The `DateTimeZone` data is updated to version `2024bgtz`.
The full release notes: 
https://www.joda.org/joda-time/changes-report.html#a2.13.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48130 from panbingkun/SPARK-49682.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index e1ac039f2546..9871cc0bca04 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -146,7 +146,7 @@ jjwt-api/0.12.6//jjwt-api-0.12.6.jar
 jline/2.14.6//jline-2.14.6.jar
 jline/3.25.1//jline-3.25.1.jar
 jna/5.14.0//jna-5.14.0.jar
-joda-time/2.12.7//joda-time-2.12.7.jar
+joda-time/2.13.0//joda-time-2.13.0.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/pom.xml b/pom.xml
index b9f28eb61925..694ea31e6f37 100644
--- a/pom.xml
+++ b/pom.xml
@@ -199,7 +199,7 @@
 2.11.0
 3.1.9
 3.0.12
-    <joda.version>2.12.7</joda.version>
+    <joda.version>2.13.0</joda.version>
 3.5.2
 3.0.0
 2.2.11


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49687][SQL] Delay sorting in `validateAndMaybeEvolveStateSchema`

2024-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c38844c9ecc6 [SPARK-49687][SQL] Delay sorting in 
`validateAndMaybeEvolveStateSchema`
c38844c9ecc6 is described below

commit c38844c9ecc6dd648500b2ef6ff01acbe46255f4
Author: Zhihong Yu 
AuthorDate: Tue Sep 17 10:58:05 2024 -0700

[SPARK-49687][SQL] Delay sorting in `validateAndMaybeEvolveStateSchema`

### What changes were proposed in this pull request?
In `validateAndMaybeEvolveStateSchema`, existing schema and new schema are 
sorted by column family name.
The sorting can be delayed until `createSchemaFile` is called.
When computing `colFamiliesAddedOrRemoved`, we can use `toSet` to compare 
column families.
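
For illustration, a minimal Python sketch of the order-insensitive comparison (the column family names are hypothetical):

```
existing = ["default", "sessions"]
new = ["sessions", "default", "timers"]

# What the old code effectively did: sort both lists before comparing.
changed_sorted = sorted(new) != sorted(existing)

# Set comparison detects added/removed column families without sorting.
changed_set = set(new) != set(existing)

assert changed_sorted == changed_set   # both flag the newly added "timers" family
```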

### Why are the changes needed?
This would make `validateAndMaybeEvolveStateSchema` faster.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48116 from tedyu/ty-comp-chk.

Authored-by: Zhihong Yu 
Signed-off-by: Dongjoon Hyun 
---
 .../streaming/state/StateSchemaCompatibilityChecker.scala  | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
index 3a1793f71794..721d72b6a099 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
@@ -168,12 +168,12 @@ class StateSchemaCompatibilityChecker(
   newStateSchema: List[StateStoreColFamilySchema],
   ignoreValueSchema: Boolean,
   stateSchemaVersion: Int): Boolean = {
-val existingStateSchemaList = 
getExistingKeyAndValueSchema().sortBy(_.colFamilyName)
-val newStateSchemaList = newStateSchema.sortBy(_.colFamilyName)
+val existingStateSchemaList = getExistingKeyAndValueSchema()
+val newStateSchemaList = newStateSchema
 
 if (existingStateSchemaList.isEmpty) {
   // write the schema file if it doesn't exist
-  createSchemaFile(newStateSchemaList, stateSchemaVersion)
+  createSchemaFile(newStateSchemaList.sortBy(_.colFamilyName), 
stateSchemaVersion)
   true
 } else {
   // validate if the new schema is compatible with the existing schema
@@ -188,9 +188,9 @@ class StateSchemaCompatibilityChecker(
 }
   }
   val colFamiliesAddedOrRemoved =
-newStateSchemaList.map(_.colFamilyName) != 
existingStateSchemaList.map(_.colFamilyName)
+(newStateSchemaList.map(_.colFamilyName).toSet != 
existingSchemaMap.keySet)
   if (stateSchemaVersion == SCHEMA_FORMAT_V3 && colFamiliesAddedOrRemoved) 
{
-createSchemaFile(newStateSchemaList, stateSchemaVersion)
+createSchemaFile(newStateSchemaList.sortBy(_.colFamilyName), 
stateSchemaVersion)
   }
   // TODO: [SPARK-49535] Write Schema files after schema has changed for 
StateSchemaV3
   colFamiliesAddedOrRemoved


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: [SPARK-49657] Add multi instances e2e

2024-09-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new e161205  [SPARK-49657] Add multi instances e2e
e161205 is described below

commit e161205c781338114ddb86acafa2d0b2e19e05af
Author: Qi Tan 
AuthorDate: Mon Sep 16 21:27:01 2024 -0700

[SPARK-49657] Add multi instances e2e

### What changes were proposed in this pull request?
Add an e2e test for two instances of the Spark Operator running at the same time.

### Why are the changes needed?
There is a scenario where, in a single cluster, a user deploys multiple instances of the
operator and has each of them watch several namespaces. For example,
operator A runs in the default namespace and watches namespace spark-1, while
operator B runs in the default-2 namespace and watches namespace spark-2.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested locally.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #126 from TQJADE/multi-instance.

Authored-by: Qi Tan 
Signed-off-by: Dongjoon Hyun 
---
 .../dynamic-config-values-2.yaml}  | 43 +++
 tests/e2e/watched-namespaces/chainsaw-test.yaml| 50 +-
 tests/e2e/watched-namespaces/spark-example.yaml|  2 +
 3 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/tests/e2e/watched-namespaces/spark-example.yaml 
b/tests/e2e/helm/dynamic-config-values-2.yaml
similarity index 54%
copy from tests/e2e/watched-namespaces/spark-example.yaml
copy to tests/e2e/helm/dynamic-config-values-2.yaml
index dba59ab..aacc0f1 100644
--- a/tests/e2e/watched-namespaces/spark-example.yaml
+++ b/tests/e2e/helm/dynamic-config-values-2.yaml
@@ -1,4 +1,3 @@
-#
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
@@ -6,27 +5,35 @@
 # (the "License"); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
-#http://www.apache.org/licenses/LICENSE-2.0
+# http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-#
 
-apiVersion: spark.apache.org/v1alpha1
-kind: SparkApplication
-metadata:
-  name: spark-job-succeeded-test
-  namespace: ($SPARK_APP_NAMESPACE)
-spec:
-  mainClass: "org.apache.spark.examples.SparkPi"
-  jars: 
"local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar"
-  sparkConf:
-spark.executor.instances: "1"
-spark.kubernetes.container.image: 
"spark:4.0.0-preview1-scala2.13-java17-ubuntu"
-spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
-  runtimeVersions:
-sparkVersion: 4.0.0-preview1
-scalaVersion: "2.13"
\ No newline at end of file
+workloadResources:
+  namespaces:
+overrideWatchedNamespaces: false
+data:
+  - "spark-3"
+  role:
+create: true
+  roleBinding:
+create: true
+  clusterRole:
+name: spark-workload-clusterrole-2
+
+operatorConfiguration:
+  dynamicConfig:
+enable: true
+create: true
+data:
+  spark.kubernetes.operator.watchedNamespaces: "spark-3"
+
+operatorRbac:
+  clusterRole:
+name: "spark-operator-clusterrole-2"
+  clusterRoleBinding:
+name: "spark-operator-clusterrolebinding-2"
\ No newline at end of file
diff --git a/tests/e2e/watched-namespaces/chainsaw-test.yaml 
b/tests/e2e/watched-namespaces/chainsaw-test.yaml
index 82ed409..fdffa0a 100644
--- a/tests/e2e/watched-namespaces/chainsaw-test.yaml
+++ b/tests/e2e/watched-namespaces/chainsaw-test.yaml
@@ -71,4 +71,52 @@ spec:
   content: |
 kubectl delete sparkapplication spark-job-succeeded-test -n 
spark-1 --ignore-not-found=true
 kubectl delete sparkapplication spark-job-succeeded-test -n 
spark-2 --ignore-not-found=true
-kubectl replace -f spark-operator-dynamic-config-2.yaml
\ No newline at end of file
+kubectl replace -f spark-operator-dynamic-config-2.yaml
+  - try:
+  - script:
+  content: |
+echo "Installing another spark operator in default-2 namespaces, 
watching on namespace: spark-3"
+helm install spark-kubernetes-operator -n default-2 
--create-namespace -f \
+../../../build-tools/helm/spark-kube

(spark-kubernetes-operator) branch main updated: [SPARK-49658] Refactor e2e tests pipelines

2024-09-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 2c6c507  [SPARK-49658] Refactor e2e tests pipelines
2c6c507 is described below

commit 2c6c50757f94d639ad97445477fb1a022812dc0b
Author: Qi Tan 
AuthorDate: Mon Sep 16 21:24:45 2024 -0700

[SPARK-49658] Refactor e2e tests pipelines

### What changes were proposed in this pull request?
e2e pipelines refactor

### Why are the changes needed?
* The current Helm installation does not require creating the additional clusterrolebinding, 
serviceaccount, etc. Remove them.
* The current e2e pipeline runs the dynamic operator installation and watched-namespaces tests 
3 times per k8s version (because of the test-group dimension). Reduce it to 1 run per k8s 
version.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #125 from TQJADE/workflow-fix.

Authored-by: Qi Tan 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 851a09d..bb30476 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -69,12 +69,23 @@ jobs:
 kubernetes-version:
   - "1.28.0"
   - "1.31.0"
+mode:
+  - dynamic
+  - static
 test-group:
   - spark-versions
   - python
   - state-transition
-dynamic-config-test-group:
   - watched-namespaces
+exclude:
+  - mode: dynamic
+test-group: spark-versions
+  - mode: dynamic
+test-group: python
+  - mode: dynamic
+test-group: state-transition
+  - mode: static
+test-group: watched-namespaces
 steps:
   - name: Checkout repository
 uses: actions/checkout@v4
@@ -101,8 +112,8 @@ jobs:
   kubectl get pods -A
   kubectl describe node
   - name: Run Spark K8s Operator on K8S with Dynamic Configuration Disabled
+if: matrix.mode == 'static'
 run: |
-  kubectl create clusterrolebinding serviceaccounts-cluster-admin 
--clusterrole=cluster-admin --group=system:serviceaccounts || true
   eval $(minikube docker-env)
   ./gradlew buildDockerImage
   ./gradlew spark-operator-api:relocateGeneratedCRD
@@ -111,20 +122,24 @@ jobs:
   # Use remote host' s docker image
   minikube docker-env --unset
   - name: Run E2E Test with Dynamic Configuration Disabled
-run: |
+if: matrix.mode == 'static'
+run: |
   chainsaw test --test-dir ./tests/e2e/${{ matrix.test-group }} 
--parallel 2
   - name: Run Spark K8s Operator on K8S with Dynamic Configuration Enabled
+if: matrix.mode == 'dynamic'
 run: |
-  helm uninstall spark-kubernetes-operator
   eval $(minikube docker-env)
+  ./gradlew buildDockerImage
+  ./gradlew spark-operator-api:relocateGeneratedCRD
   helm install spark-kubernetes-operator --create-namespace -f \
   build-tools/helm/spark-kubernetes-operator/values.yaml -f \
   tests/e2e/helm/dynamic-config-values.yaml \
   build-tools/helm/spark-kubernetes-operator/
   minikube docker-env --unset
   - name: Run E2E Test with Dynamic Configuration Enabled
+if: matrix.mode == 'dynamic'
 run: |
-  chainsaw test --test-dir ./tests/e2e/${{ 
matrix.dynamic-config-test-group }} --parallel 2
+  chainsaw test --test-dir ./tests/e2e/${{ matrix.test-group }} 
--parallel 2
 
   lint:
 name: "Linter and documentation"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`

2024-09-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 370453adba17 [SPARK-49678][CORE] Support `spark.test.master` in 
`SparkSubmitArguments`
370453adba17 is described below

commit 370453adba1730b5412750b34e87a35147d71aa2
Author: Dongjoon Hyun 
AuthorDate: Mon Sep 16 20:53:35 2024 -0700

[SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`

### What changes were proposed in this pull request?

This PR aims to support `spark.test.master` in `SparkSubmitArguments`.

### Why are the changes needed?

To allow users to control the default master setting during testing and 
documentation generation.

 First, currently we cannot build the `Python Documentation` on an M3 Max 
(or other high-core machines) without this. It only succeeds on GitHub Action 
runners (4 cores) or an equivalent low-core docker run. Please try the following 
on your Macs.

**BEFORE**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 
177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

**AFTER**
```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html
...
build succeeded.

The HTML pages are in build/html.
```

 Second, in general, we can control all `SparkSubmit` invocations (e.g. the Spark 
shells) like the following.

**BEFORE (`local[*]`)**
```
$ bin/pyspark
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: 
org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
  /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = 
local-1726519982935).
SparkSession available as 'spark'.
>>>
```

**AFTER (`local[1]`)**
```
$ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
Python 3.9.19 (main, Jun 17 2024, 15:39:29)
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1]
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: 
org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
  /_/

Using Python version 3.9.19 (main, Jun 17 2024 15:39:29)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[1], app id = 
local-1726519863363).
SparkSession available as 'spark'.
>>>
```
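
For illustration only, the sketch below shows one way a test-only `spark.test.master` system property could act as a fallback default master. It is a hedged, standalone example, not the actual `SparkSubmitArguments` patch (the diff is elided above); the object and method names are hypothetical.

```
// Hedged sketch: an explicit master always wins; otherwise the test-only
// system property (e.g. -Dspark.test.master=local[1]) overrides local[*].
object DefaultMasterSketch {
  def resolveMaster(cliMaster: Option[String]): String =
    cliMaster
      .orElse(sys.props.get("spark.test.master"))
      .getOrElse("local[*]")

  def main(args: Array[String]): Unit = {
    println(resolveMaster(Some("yarn"))) // an explicit --master always wins
    println(resolveMaster(None))         // local[*] unless spark.test.master is set
  }
}
```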

### Does this PR introduce _any_ user-facing change?

No. `spark.test.master` is a new parameter.

### How was this patch tested?

Manual tests.

    ### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48126 from dongjoon-hyun/SPARK-49678.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 ++-
 1 file changed

(spark) branch master updated: [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default

2024-09-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 294af6e31639 [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 
4 by default
294af6e31639 is described below

commit 294af6e31639d6f6ac51f961f319866f077b5302
Author: Dongjoon Hyun 
AuthorDate: Mon Sep 16 20:52:28 2024 -0700

[SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default

### What changes were proposed in this pull request?

This PR aims to limit `Sphinx` build parallelism to 4 by default, with the 
following goals.
- This will preserve the same speed in the GitHub Action environment.
- This will prevent excessive parallel `SparkSubmit` invocations on large 
machines like `c6i.24xlarge`.
- The user can still override this by providing `SPHINXOPTS`.

### Why are the changes needed?

`Sphinx` parallelism feature was added via the following on 2024-01-10.

- #44680

However, unfortunately, this breaks Python API doc generation on large 
machines because the Sphinx parallelism determines the number of parallel `SparkSubmit` 
invocations of PySpark. In addition, given that each `PySpark` instance is currently launched 
with `local[*]`, this ends up with `N * N` `pyspark.daemon`s.

In other words, as of today, the default setting, `auto`, only seems to work on 
low-core machines like `GitHub Action` runners (4 cores). For example, this 
breaks the `Python` documentation build even on an M3 Max environment, and it is 
worse on large EC2 machines (c7i.24xlarge). You can see the failure locally 
like this.

```
$ build/sbt package -Phive-thriftserver
$ cd python/docs
$ make html
...
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4040. Attempting port 4041.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4041. Attempting port 4042.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4042. Attempting port 4043.
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 
4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 
4040. Attempting port 4041.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 
4041. Attempting port 4042.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 
4042. Attempting port 4043.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 
4043. Attempting port 4044.
24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 
4044. Attempting port 4045.
...
java.lang.OutOfMemoryError: Java heap space
...
24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 
177) interrupted: Attempting to kill Python Worker
...
make: *** [html] Error 2
```

### Does this PR introduce _any_ user-facing change?

No, this is a dev-only change.

### How was this patch tested?

Pass the CIs and do manual tests.
    
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48129 from dongjoon-hyun/SPARK-49680.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 python/docs/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/docs/Makefile b/python/docs/Makefile
index 5058c1206171..428b0d24b568 100644
--- a/python/docs/Makefile
+++ b/python/docs/Makefile
@@ -16,7 +16,7 @@
 # Minimal makefile for Sphinx documentation
 
 # You can set these variables from the command line.
-SPHINXOPTS?= "-W" "-j" "auto"
+SPHINXOPTS?= "-W" "-j" "4"
 SPHINXBUILD   ?= sphinx-build
 SOURCEDIR ?= source
 BUILDDIR  ?= build


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r71610 - in /dev/spark/v4.0.0-preview2-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/articles/sparkr-vignettes_files/ _site/api/R/articles/sparkr-vignettes_

2024-09-15 Thread dongjoon
Author: dongjoon
Date: Mon Sep 16 06:20:30 2024
New Revision: 71610

Log:
Apache Spark v4.0.0-preview2-rc1 docs


[This commit notification would consist of 4901 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r71609 - /dev/spark/v4.0.0-preview2-rc1-bin/

2024-09-15 Thread dongjoon
Author: dongjoon
Date: Mon Sep 16 04:21:15 2024
New Revision: 71609

Log:
Apache Spark v4.0.0-preview2-rc1

Added:
dev/spark/v4.0.0-preview2-rc1-bin/
dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz   (with 
props)
dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc
dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz   (with props)
dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc
dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz   (with 
props)
dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.asc
dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz   
(with props)
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.asc

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.sha512

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz   
(with props)

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.asc

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz   (with props)
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.asc
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.sha512

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc
==
--- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc (added)
+++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc Mon Sep 
16 04:21:15 2024
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnrf8UHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fz4mxAAi4XIEwNrYyR9BZmVDYwugMureFuN
+B0+8b04SUdFO0DEIG1Lr5P3B2M1Ku8dRoKSZ0dyECWHqGzqIk+fW5Su2A6Jz7FPY
+RUwLQgd2CrP1He08gpS1vvjD0sOU2pJq4pGIh0BKwFKhTneUxvUO88jNTooPkRcJ
+GU8Zd68/2h5iAKl+qGn9wL8g3Vh856+TwgKoD2g/P4kB5LstoYo/l7cdooRFs/B+
+vGwNsaNWD0JCMNCKY7E1TWNqk5l8vjQqZZ3VkPhrUIFmmXYbb92af89ro4YeRGZx
+rLZeKWRUcp8DukOp4Qa0vMY67sPjW1uYPpx4Qy/vmnwjQ9+hsqEq29k0zlwhOn/B
+agHnsNPJM2LVabtr+H5/bZOh2Oovyb3rFenHS4sgIE7khycmtyAc9VMZynRt+hLo
+9IrK4GQ4/OtgXnE9U/hq/s3DdtgyWy3pqRhWi3cFlEkWAiUrKwTHmv7V4Drk4jVV
+vHawTH5RF1ZhjQsFfrX+tk4Rkws3qyj8LXdKOLh3f8eG4c2kIy8RwBETmUE6w6zN
+RQvA+gCFiBoBPVnsOZ9umgFHqdwCUet5vMEle9oRp5qPZwtBksECOr43mXWLkJmn
+odq4j7nFcY4o7D76lwO1OOKncIAbewbwJWXQWeAUQqkv0UhEKfgMQm0JDRkIruyr
+2V085CdzD5hmN3A=
+=Hbfw
+-END PGP SIGNATURE-

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512
==
--- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 
(added)
+++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 Mon 
Sep 16 04:21:15 2024
@@ -0,0 +1 @@
+c1ed617c4eea52dd30fd933e19211dd400aa3fb412e9504fe90d5a35383aac0ae2690d8aab5b623abc7d94b73c3c544ff538fa3f74c055f528962c863f823394
  SparkR_4.0.0-preview2.tar.gz

Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc
==
--- dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc (added)
+++ dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc Mon Sep 16 
04:21:15 2024
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnrgEUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fy4yg//ZtSKpiZeUCdZFfk4iupr6rbzIzJl
+p4b2tdquNm37zbh/kdR7jwFfXxkXR0AEHnRHy1H4htVFkL28VijVCzin7qkbXe0w
+wfA0B5XUaStCx3ri7M8OOhLsZZUVTdXFyab0qq21Dd09THTphsg8UpQ9XqGOYjlq
+fRbc2BykgqPxN5fqATXXTw1PtC6ej3Eup5VdDGvcgHN0Fbw3XgDRolpTvbMvk2Vx
+V0avnMd5OzugLlHujW7LErJz1ugYIJKDUjtYnu9iYdFcBLP6HFwiMEu0MptjjM9T
+Tojyj6qCZJ2mBd5BKuzLxY0PrlwS/EZWkao6gRuJ+TbiudSHza4UOonATt3HqeOw
+3WgIrf14fveQT6jX4KoP4aSBrWAWiZF+BhDermh0Dq3ksSyj4RK2gHWlcyIdm+6j

(spark) 01/01: Preparing Spark release v4.0.0-preview2-rc1

2024-09-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a
Author: Dongjoon Hyun 
AuthorDate: Mon Sep 16 03:31:26 2024 +

Preparing Spark release v4.0.0-preview2-rc1
---
 R/pkg/R/sparkR.R   | 4 ++--
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 common/variant/pom.xml | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/profiler/pom.xml | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/connect/common/pom.xml | 2 +-
 sql/connect/server/pom.xml | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 46 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index 29c05b0db7c2..2b57ca77a4ed 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -461,8 +461,8 @@ sparkR.session <- function(
 
   # Check if version number of SparkSession matches version number of SparkR 
package
   jvmVersion <- callJMethod(sparkSession, "version")
-  # Remove -SNAPSHOT from jvm versions
-  jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion, fixed = TRUE)
+  # Remove -preview2 from jvm versions
+  jvmVersionStrip <- gsub("-preview2", "", jvmVersion, fixed = TRUE)
   rPackageVersion <- paste0(packageVersion("SparkR"))
 
   if (jvmVersionStrip != rPackageVersion) {
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 01bd324efc11..b72cf3ef3f3b 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.13
-4.0.0-SNAPSHOT
+4.0.0-preview2
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 046648e9c2ae..322279d66e17 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.13
-4.0.0-SNAPSHOT
+4.0.0-preview2
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index cdb5bd72158a..93bca7177ce0 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.13
-4.0.0-SNAPSHOT
+4.0.0-preview2
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 0f7036ef

(spark) tag v4.0.0-preview2-rc1 created (now f0d465e09b8d)

2024-09-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at f0d465e09b8d (commit)
This tag includes the following new commits:

 new f0d465e09b8d Preparing Spark release v4.0.0-preview2-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r71607 - /dev/spark/v4.0.0-preview2-rc1-bin/

2024-09-15 Thread dongjoon
Author: dongjoon
Date: Mon Sep 16 01:35:01 2024
New Revision: 71607

Log:
Remove v4.0.0-preview2-rc1-bin


Removed:
dev/spark/v4.0.0-preview2-rc1-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) tag v4.0.0-preview2-rc1 deleted (was 383afc7aca30)

2024-09-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


*** WARNING: tag v4.0.0-preview2-rc1 was deleted! ***

 was 383afc7aca30 Preparing Spark release v4.0.0-preview2-rc1

This change permanently discards the following revisions:

 discard 383afc7aca30 Preparing Spark release v4.0.0-preview2-rc1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-49655][BUILD] Link `python3` to `python3.9` in `spark-rm` Docker image

2024-09-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b4f4d9b7a7d4 [SPARK-49655][BUILD] Link `python3` to `python3.9` in 
`spark-rm` Docker image
b4f4d9b7a7d4 is described below

commit b4f4d9b7a7d470158af39d75824bcc501e3506da
Author: Dongjoon Hyun 
AuthorDate: Sun Sep 15 18:17:34 2024 -0700

[SPARK-49655][BUILD] Link `python3` to `python3.9` in `spark-rm` Docker 
image

### What changes were proposed in this pull request?

This PR aims to link `python3` to `python3.9` in `spark-rm` docker image.

### Why are the changes needed?

We already link `python` to `python3.9`.

https://github.com/apache/spark/blob/931ab065df3952487028316ebd49c2895d947bf2/dev/create-release/spark-rm/Dockerfile#L139

We need to link `python3` to `python3.9` to fix the Spark Documentation 
generation failure in the release script.

```
$ dev/create-release/do-release-docker.sh -d /run/user/1000/spark -s docs
...
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
Command FAILED. Check full logs for details.
from 
/opt/spark-rm/output/spark/docs/.local_ruby_bundle/ruby/3.0.0/gems/jekyll-4.3.3/lib/jekyll/command.rb:91:in
 `process_with_graceful_fail'
```

The root cause is a `mkdocs` module import error during 
`error-conditions.html` generation.

### Does this PR introduce _any_ user-facing change?

No. This is a release-script-only change.

### How was this patch tested?

Manual review.

After this PR, `error docs` generation succeeds.
```

* Building error docs. *

Generated: docs/_generated/error-conditions.html
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48117 from dongjoon-hyun/SPARK-49655.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/spark-rm/Dockerfile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index e7f558b523d0..3cba72d042ed 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -137,6 +137,7 @@ RUN python3.9 -m pip list
 
 RUN gem install --no-document "bundler:2.4.22"
 RUN ln -s "$(which python3.9)" "/usr/local/bin/python"
+RUN ln -s "$(which python3.9)" "/usr/local/bin/python3"
 
 WORKDIR /opt/spark-rm/output
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r71606 - /dev/spark/v4.0.0-preview2-rc1-bin/

2024-09-15 Thread dongjoon
Author: dongjoon
Date: Sun Sep 15 23:24:31 2024
New Revision: 71606

Log:
Apache Spark v4.0.0-preview2-rc1

Added:
dev/spark/v4.0.0-preview2-rc1-bin/
dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz   (with 
props)
dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc
dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz   (with props)
dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc
dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz   (with 
props)
dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.asc
dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz   
(with props)
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.asc

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.sha512

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz   
(with props)

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.asc

dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.sha512
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz   (with props)
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.asc
dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.sha512

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc
==
--- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc (added)
+++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc Sun Sep 
15 23:24:31 2024
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnaEoUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fxasg//YTaWh8pe9fzQFS88xg7y57LYn7wZ
+H1Yp27Zp9bnFcVzfdbCHxVB+2vyCXRf9ssMXuepbluIaH7C6esljwU85RMd8xWK6
+AA3HyoQFfUGG+ItC4eVUeZmrq8C0sCd7f8NEt4x7WgSuvDBRmkt3cmRN42kqCPeo
+rbglfmpHH4Dkh43LCecPTIgraJ44+1k7mUd5RhjEQ/21rxa2SpBAqpfhT4lL5wvl
+cQzr690pmDb2+tSMhmbEfrLU3gmUDy9HnlvNUGK1/MHEUfjAcxMGeue7B+PX9eue
+pOyRWYeMMhXoM+CVij4gmnJEUelBDGzLlYOtzd6A3REy5XYkh+3Jpryv3Zc+6iVH
+YQbbhO7eSEf3XPQJkz/dcX80UG0mVpsHMnEyOufOdB4hrKjjnsUa4u45PJ3h+Kt+
+R6KnIv79QT/m9IGSTEdl3rCHf1WvXHwUNnOW/XltyHDCceVHemMfA/qScw7IhEDR
+ZeIlz1+qbFptdDznmbrJQRu33L0Td0brSggaLFPrOjN0UV0mpmZIk8MN0cFoImmp
+XE8hPKIHSd9YFLi4VD6n3cUSnBORHQIXNIWW59HCgRiBJJmMV+8Hh8Vm/LDF9UuN
+1/wjCwF1KzQOtp8eC/GmmPUJROk2mfmEp7jlGVqPmYuCnzfrpZy833628ZDpoYjI
+JoG3U7goRsAjq1g=
+=c4Pk
+-END PGP SIGNATURE-

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512
==
--- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 
(added)
+++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 Sun 
Sep 15 23:24:31 2024
@@ -0,0 +1 @@
+92ebfbcedd6e9f3f74b142815f0852c3f98935521863a5dca27ab76cdd90a81d7c26b5f5621b7dbb5bed1605c276b7677dec90391c13cb2095bc4095163dabe3
  SparkR_4.0.0-preview2.tar.gz

Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc
==
--- dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc (added)
+++ dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc Sun Sep 15 
23:24:31 2024
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnaEwUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FyjdA//f/G8gA1KHtiDgt7eH4jozIYPL2lU
+5xyo8VjI+/iatzYxemsTaUdPnv8PDpdDYE/c7PuXqwXPaXUBwQFE+ReAxnFrhvLM
+yLMjOGYnZoDyHOs38IMASQz5k8L4YceknCWWHmHwkeCl8N0BVMXFozWi4tlz2CWR
+jeB1VA9gUUNu8OXw2WcIhtKU6CvaoOhVc3TTb16Ma2m4cViATBZChvQi6E47sGEb
+mBkwIWCkX+d6NnL0LlqYo6af/CSVZMMfLLFcja4G5j1iAWsPhihvH7rRQZPHaavH
+oSPZj8+7Iy0y5YQbB9f+pt66AptUftNUvJgTAqyTn1iO0LiFHldXAFcxibh3cFYz
+XxZcy/mzY+48umCE9J4Wq7YvLC0RM/wXweQU7JXslAT5m1p74chf/Ax9RO9OxWu2

(spark) 01/01: Preparing Spark release v4.0.0-preview2-rc1

2024-09-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 383afc7aca302579e7b9094d5890e95bc045e49a
Author: Dongjoon Hyun 
AuthorDate: Sun Sep 15 22:25:30 2024 +

Preparing Spark release v4.0.0-preview2-rc1
---
 R/pkg/R/sparkR.R   | 4 ++--
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 common/variant/pom.xml | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/profiler/pom.xml | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/connect/common/pom.xml | 2 +-
 sql/connect/server/pom.xml | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 46 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index 29c05b0db7c2..2b57ca77a4ed 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -461,8 +461,8 @@ sparkR.session <- function(
 
   # Check if version number of SparkSession matches version number of SparkR 
package
   jvmVersion <- callJMethod(sparkSession, "version")
-  # Remove -SNAPSHOT from jvm versions
-  jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion, fixed = TRUE)
+  # Remove -preview2 from jvm versions
+  jvmVersionStrip <- gsub("-preview2", "", jvmVersion, fixed = TRUE)
   rPackageVersion <- paste0(packageVersion("SparkR"))
 
   if (jvmVersionStrip != rPackageVersion) {
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 01bd324efc11..b72cf3ef3f3b 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.13
-4.0.0-SNAPSHOT
+4.0.0-preview2
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 046648e9c2ae..322279d66e17 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.13
-4.0.0-SNAPSHOT
+4.0.0-preview2
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index cdb5bd72158a..93bca7177ce0 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.13
-4.0.0-SNAPSHOT
+4.0.0-preview2
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 0f7036ef

(spark) tag v4.0.0-preview2-rc1 created (now 383afc7aca30)

2024-09-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 383afc7aca30 (commit)
This tag includes the following new commits:

 new 383afc7aca30 Preparing Spark release v4.0.0-preview2-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48355][SQL][TESTS][FOLLOWUP] Disable a test case failing on non-ANSI mode

2024-09-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1346531ccc6e [SPARK-48355][SQL][TESTS][FOLLOWUP] Disable a test case 
failing on non-ANSI mode
1346531ccc6e is described below

commit 1346531ccc6ee814d5b357158a4c4aed2bf1d573
Author: Dongjoon Hyun 
AuthorDate: Sat Sep 14 21:10:29 2024 -0700

[SPARK-48355][SQL][TESTS][FOLLOWUP] Disable a test case failing on non-ANSI 
mode

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/47672 to 
disable a test case failing on non-ANSI mode.

### Why are the changes needed?

To recover non-ANSI CI.

- https://github.com/apache/spark/actions/workflows/build_non_ansi.yml

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Manual review.

```
$ SPARK_ANSI_SQL_MODE=false build/sbt "sql/testOnly 
*.SqlScriptingInterpreterSuite"
...
[info] - simple case mismatched types !!! IGNORED !!!
[info] All tests passed.
[success] Total time: 24 s, completed Sep 14, 2024, 7:51:15 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48115 from dongjoon-hyun/SPARK-48355.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala
index 3fad99eba509..bc2adec5be3d 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala
@@ -701,7 +701,8 @@ class SqlScriptingInterpreterSuite extends QueryTest with 
SharedSparkSession {
 verifySqlScriptResult(commands, expected)
   }
 
-  test("simple case mismatched types") {
+  // This is disabled because it fails in non-ANSI mode
+  ignore("simple case mismatched types") {
 val commands =
   """
 |BEGIN


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: Revert "[SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend"

2024-09-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fa6a0786bb4b Revert "[SPARK-49531][PYTHON][CONNECT] Support line plot 
with plotly backend"
fa6a0786bb4b is described below

commit fa6a0786bb4b23a895e68a721df9ee88684c4fab
Author: Dongjoon Hyun 
AuthorDate: Sat Sep 14 17:57:35 2024 -0700

Revert "[SPARK-49531][PYTHON][CONNECT] Support line plot with plotly 
backend"

This reverts commit 3b8dddac65bce6f88f51e23e777d521d65fa3373.
---
 dev/sparktestsupport/modules.py|   4 -
 python/pyspark/errors/error-conditions.json|   5 -
 python/pyspark/sql/classic/dataframe.py|   5 -
 python/pyspark/sql/connect/dataframe.py|   5 -
 python/pyspark/sql/dataframe.py|  27 -
 python/pyspark/sql/plot/__init__.py|  21 
 python/pyspark/sql/plot/core.py| 135 -
 python/pyspark/sql/plot/plotly.py  |  30 -
 .../sql/tests/connect/test_parity_frame_plot.py|  36 --
 .../tests/connect/test_parity_frame_plot_plotly.py |  36 --
 python/pyspark/sql/tests/plot/__init__.py  |  16 ---
 python/pyspark/sql/tests/plot/test_frame_plot.py   |  79 
 .../sql/tests/plot/test_frame_plot_plotly.py   |  64 --
 python/pyspark/sql/utils.py|  17 ---
 python/pyspark/testing/sqlutils.py |   7 --
 .../org/apache/spark/sql/internal/SQLConf.scala|  27 -
 16 files changed, 514 deletions(-)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index b9a4bed715f6..34fbb8450d54 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -548,8 +548,6 @@ pyspark_sql = Module(
 "pyspark.sql.tests.test_udtf",
 "pyspark.sql.tests.test_utils",
 "pyspark.sql.tests.test_resources",
-"pyspark.sql.tests.plot.test_frame_plot",
-"pyspark.sql.tests.plot.test_frame_plot_plotly",
 ],
 )
 
@@ -1053,8 +1051,6 @@ pyspark_connect = Module(
 "pyspark.sql.tests.connect.test_parity_arrow_cogrouped_map",
 "pyspark.sql.tests.connect.test_parity_python_datasource",
 "pyspark.sql.tests.connect.test_parity_python_streaming_datasource",
-"pyspark.sql.tests.connect.test_parity_frame_plot",
-"pyspark.sql.tests.connect.test_parity_frame_plot_plotly",
 "pyspark.sql.tests.connect.test_utils",
 "pyspark.sql.tests.connect.client.test_artifact",
 "pyspark.sql.tests.connect.client.test_artifact_localcluster",
diff --git a/python/pyspark/errors/error-conditions.json 
b/python/pyspark/errors/error-conditions.json
index 92aeb15e21d1..4061d024a83c 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -1088,11 +1088,6 @@
   "Function `` should use only POSITIONAL or POSITIONAL OR 
KEYWORD arguments."
 ]
   },
-  "UNSUPPORTED_PLOT_BACKEND": {
-"message": [
-  "`` is not supported, it should be one of the values from 
"
-]
-  },
   "UNSUPPORTED_SIGNATURE": {
 "message": [
   "Unsupported signature: ."
diff --git a/python/pyspark/sql/classic/dataframe.py 
b/python/pyspark/sql/classic/dataframe.py
index d174f7774cc5..91b959162590 100644
--- a/python/pyspark/sql/classic/dataframe.py
+++ b/python/pyspark/sql/classic/dataframe.py
@@ -58,7 +58,6 @@ from pyspark.sql.column import Column
 from pyspark.sql.classic.column import _to_seq, _to_list, _to_java_column
 from pyspark.sql.readwriter import DataFrameWriter, DataFrameWriterV2
 from pyspark.sql.merge import MergeIntoWriter
-from pyspark.sql.plot import PySparkPlotAccessor
 from pyspark.sql.streaming import DataStreamWriter
 from pyspark.sql.types import (
 StructType,
@@ -1863,10 +1862,6 @@ class DataFrame(ParentDataFrame, PandasMapOpsMixin, 
PandasConversionMixin):
 messageParameters={"member": "queryExecution"},
 )
 
-@property
-def plot(self) -> PySparkPlotAccessor:
-return PySparkPlotAccessor(self)
-
 
 class DataFrameNaFunctions(ParentDataFrameNaFunctions):
 def __init__(self, df: ParentDataFrame):
diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index e3b1d35b2d5d..768abd655d49 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -83,7 +83,6 @@ from pyspark.sql.connect.expressions import (
 UnresolvedStar,
 )
 from pyspark.sql.connect.functions import builtin as F
-from pyspark.

(spark) branch master updated: [SPARK-49649][DOCS] Make `docs/index.md` up-to-date for 4.0.0

2024-09-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2250b35be6a2 [SPARK-49649][DOCS] Make `docs/index.md` up-to-date for 
4.0.0
2250b35be6a2 is described below

commit 2250b35be6a24c777d6fa82b1c6a7a10a6854895
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 13 20:49:01 2024 -0700

[SPARK-49649][DOCS] Make `docs/index.md` up-to-date for 4.0.0

### What changes were proposed in this pull request?

This PR aims to update Spark documentation landing page (`docs/index.md`) 
for Apache Spark 4.0.0-preview2 release.

### Why are the changes needed?

- [SPARK-45314 Drop Scala 2.12 and make Scala 2.13 by 
default](https://issues.apache.org/jira/browse/SPARK-45314)
- #46228
- #47842
- [SPARK-45923 Spark Kubernetes 
Operator](https://issues.apache.org/jira/browse/SPARK-45923)

### Does this PR introduce _any_ user-facing change?

No because this is a documentation-only change.

### How was this patch tested?

Manual review.

https://github.com/user-attachments/assets/bdbd0e61-d71a-41ca-aa1b-1b0805813a45";>

https://github.com/user-attachments/assets/e13a6bba-2149-48fa-983d-c5399defdc70";>

https://github.com/user-attachments/assets/721c7760-bc2e-444c-9209-174e3119c2b4";>

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48113 from dongjoon-hyun/SPARK-49649.

    Authored-by: Dongjoon Hyun 
    Signed-off-by: Dongjoon Hyun 
---
 docs/index.md | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 7e57eddb6da8..fea62865e216 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -34,9 +34,8 @@ source, visit [Building Spark](building-spark.html).
 
 Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it 
should run on any platform that runs a supported version of Java. This should 
include JVMs on x86_64 and ARM64. It's easy to run locally on one machine --- 
all you need is to have `java` installed on your system `PATH`, or the 
`JAVA_HOME` environment variable pointing to a Java installation.
 
-Spark runs on Java 17/21, Scala 2.13, Python 3.8+, and R 3.5+.
-When using the Scala API, it is necessary for applications to use the same 
version of Scala that Spark was compiled for.
-For example, when using Scala 2.13, use Spark compiled for 2.13, and compile 
code/applications for Scala 2.13 as well.
+Spark runs on Java 17/21, Scala 2.13, Python 3.9+, and R 3.5+ (Deprecated).
+When using the Scala API, it is necessary for applications to use the same 
version of Scala that Spark was compiled for. Since Spark 4.0.0, it's Scala 
2.13.
 
 # Running the Examples and Shell
 
@@ -110,7 +109,7 @@ options for deployment:
 * [Spark Streaming](streaming-programming-guide.html): processing data streams 
using DStreams (old API)
 * [MLlib](ml-guide.html): applying machine learning algorithms
 * [GraphX](graphx-programming-guide.html): processing graphs
-* [SparkR](sparkr.html): processing data with Spark in R
+* [SparkR (Deprecated)](sparkr.html): processing data with Spark in R
 * [PySpark](api/python/getting_started/index.html): processing data with Spark 
in Python
 * [Spark SQL CLI](sql-distributed-sql-engine-spark-sql-cli.html): processing 
data with SQL on the command line
 
@@ -128,10 +127,13 @@ options for deployment:
 * [Cluster Overview](cluster-overview.html): overview of concepts and 
components when running on a cluster
 * [Submitting Applications](submitting-applications.html): packaging and 
deploying applications
 * Deployment modes:
-  * [Amazon EC2](https://github.com/amplab/spark-ec2): scripts that let you 
launch a cluster on EC2 in about 5 minutes
   * [Standalone Deploy Mode](spark-standalone.html): launch a standalone 
cluster quickly without a third-party cluster manager
   * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
-  * [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes
+  * [Kubernetes](running-on-kubernetes.html): deploy Spark apps on top of 
Kubernetes directly
+  * [Amazon EC2](https://github.com/amplab/spark-ec2): scripts that let you 
launch a cluster on EC2 in about 5 minutes
+* [Spark Kubernetes 
Operator](https://github.com/apache/spark-kubernetes-operator):
+  * 
[SparkApp](https://github.com/apache/spark-kubernetes-operator/blob/main/examples/pyspark-pi.yaml):
 deploy Spark apps on top of Kubernetes via [operator 
patterns](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
+  * 
[SparkCluster](https://github.com/apache/spark-kubernetes-operator/blob/main/examples/cluster-with-template.yaml):
 deploy Spark clusters on top of Kubernetes via [operator 
pa

(spark) branch master updated: [SPARK-49648][DOCS] Update `Configuring Ports for Network Security` section with JWS

2024-09-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new df0e34c5a1c3 [SPARK-49648][DOCS] Update `Configuring Ports for Network 
Security` section with JWS
df0e34c5a1c3 is described below

commit df0e34c5a1c30956cb16e8af5569ed72387b6fc3
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 13 18:09:48 2024 -0700

[SPARK-49648][DOCS] Update `Configuring Ports for Network Security` section 
with JWS

### What changes were proposed in this pull request?

This PR aims to update `Configuring Ports for Network Security` section of 
`Security` page with new JWS feature.

### Why are the changes needed?

In addition to the existing restriction, Spark 4 can take advantage of the new 
JWS feature. This PR documents it more clearly.


https://github.com/apache/spark/blob/08a26bb56cfb48f27c68a79be1e15bc4c9e466e0/docs/security.md?plain=1#L811-L814
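
For illustration only, a client could mint such an `Authorization` header with the bundled `jjwt` library roughly as sketched below. This is a hedged example, not part of this commit: the claim, the `Bearer` scheme, and the demo key are assumptions, and it assumes the jjwt 0.12.x API on the classpath.

```
import java.util.Base64

import io.jsonwebtoken.Jwts
import io.jsonwebtoken.security.Keys

// Hedged sketch: build a JWS token signed with the BASE64URL-encoded secret
// that the server side configures as ...JWSFilter.param.secretKey.
object JwsHeaderSketch {
  def authorizationHeader(base64UrlKey: String): String = {
    val key = Keys.hmacShaKeyFor(Base64.getUrlDecoder.decode(base64UrlKey))
    val token = Jwts.builder()
      .subject("spark-client") // hypothetical claim added only so the payload is non-empty
      .signWith(key)
      .compact()
    s"Bearer $token"           // assumed header scheme; verify against JWSFilter's expectations
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical 256-bit demo key; in practice use the secret shared with the server.
    val demoKey = Base64.getUrlEncoder.withoutPadding.encodeToString(Array.fill(32)(7.toByte))
    println(authorizationHeader(demoKey))
  }
}
```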

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

https://github.com/user-attachments/assets/2250e65b-cddd-4541-b42f-5284d5ce4b02";>

https://github.com/user-attachments/assets/0c853380-081a-41a3-b66b-7774ec62fd3e";>

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48112 from dongjoon-hyun/SPARK-49648.

    Authored-by: Dongjoon Hyun 
    Signed-off-by: Dongjoon Hyun 
---
 docs/security.md | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/security.md b/docs/security.md
index a8f4e4ec5389..b97abfeacf24 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -55,7 +55,8 @@ To enable authorization, Spark Master should have
 `spark.master.rest.filters=org.apache.spark.ui.JWSFilter` and
 `spark.org.apache.spark.ui.JWSFilter.param.secretKey=BASE64URL-ENCODED-KEY` 
configurations, and
 client should provide HTTP `Authorization` header which contains JSON Web 
Token signed by
-the shared secret key.
+the shared secret key. Please note that this feature requires a Spark 
distribution built with
+`jjwt` profile.
 
 ### YARN
 
@@ -813,6 +814,12 @@ They are generally private services, and should only be 
accessible within the ne
 organization that deploys Spark. Access to the hosts and ports used by Spark 
services should
 be limited to origin hosts that need to access the services.
 
+However, like the REST Submission port, Spark also supports HTTP 
`Authorization` header
+with a cryptographically signed JSON Web Token (JWT) for all UI ports.
+To use it, a user needs the Spark distribution built with `jjwt` profile and 
to configure
+`spark.ui.filters=org.apache.spark.ui.JWSFilter` and
+`spark.org.apache.spark.ui.JWSFilter.param.secretKey=BASE64URL-ENCODED-KEY`.
+
 Below are the primary ports that Spark uses for its communication and how to
 configure those ports.
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (08a26bb56cfb -> d3eb99f79e50)

2024-09-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 08a26bb56cfb [SPARK-48779][SQL][TESTS] Improve collation support 
testing - add golden files
 add d3eb99f79e50 [SPARK-49647][TESTS] Change SharedSparkContext so that 
its SparkConf loads defaults

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/SharedSparkContext.scala | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-kubernetes-operator) branch main updated: [SPARK-49645] Update `e2e/python/chainsaw-test.yaml` to use non-R image

2024-09-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new b2adda7  [SPARK-49645] Update `e2e/python/chainsaw-test.yaml` to use 
non-R image
b2adda7 is described below

commit b2adda7d18ca05ea4da1161d13a1fcf26d98f5d1
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 13 09:57:52 2024 -0700

[SPARK-49645] Update `e2e/python/chainsaw-test.yaml` to use non-R image

### What changes were proposed in this pull request?

This PR aims to update `e2e/python/chainsaw-test.yaml` to use non-R image.

This is the only instance we have.
```
$ git grep '\-r-'
tests/e2e/python/chainsaw-test.yaml:  value: 
'spark:3.5.2-scala2.12-java17-python3-r-ubuntu'
```

### Why are the changes needed?

The new image is 36% smaller.
```
$ docker images | grep 3.5.2
spark  3.5.2-scala2.12-java17-python3-r-ubuntu   16362acf4adb   4 weeks ago 
1.52GB
spark  3.5.2-scala2.12-java17-python3-ubuntu a79b6b6ef9a4   4 weeks ago 
985MB
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #123 from dongjoon-hyun/SPARK-49645.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 tests/e2e/python/chainsaw-test.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/e2e/python/chainsaw-test.yaml 
b/tests/e2e/python/chainsaw-test.yaml
index ede8d73..4147d2f 100644
--- a/tests/e2e/python/chainsaw-test.yaml
+++ b/tests/e2e/python/chainsaw-test.yaml
@@ -27,7 +27,7 @@ spec:
 - name: "SCALA_VERSION"
   value: "2.12"
 - name: "IMAGE"
-  value: 'spark:3.5.2-scala2.12-java17-python3-r-ubuntu'
+  value: 'spark:3.5.2-scala2.12-java17-python3-ubuntu'
   steps:
 - name: install-spark-application
   try:





(spark) branch master updated: [SPARK-49234][BUILD][FOLLOWUP] Add `LICENSE-xz.txt` to `licenses-binary` folder

2024-09-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f92e9489fb23 [SPARK-49234][BUILD][FOLLOWUP] Add `LICENSE-xz.txt` to 
`licenses-binary` folder
f92e9489fb23 is described below

commit f92e9489fb23a85195067cd0f0f5cd9e9d00b138
Author: Dongjoon Hyun 
AuthorDate: Fri Sep 13 09:45:46 2024 -0700

[SPARK-49234][BUILD][FOLLOWUP] Add `LICENSE-xz.txt` to `licenses-binary` 
folder

### What changes were proposed in this pull request?

This PR aims to add `LICENSE-xz.txt` to `licenses-binary` folder.

### Why are the changes needed?

To provide the license properly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48107 from dongjoon-hyun/SPARK-49234-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 licenses-binary/LICENSE-xz.txt | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/licenses-binary/LICENSE-xz.txt b/licenses-binary/LICENSE-xz.txt
new file mode 100644
index ..4322122aecf1
--- /dev/null
+++ b/licenses-binary/LICENSE-xz.txt
@@ -0,0 +1,11 @@
+Permission to use, copy, modify, and/or distribute this
+software for any purpose with or without fee is hereby granted.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
+WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
+CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
+NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.





(spark) branch master updated: [SPARK-49624][BUILD] Upgrade `aircompressor` to 2.0.2

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e7cf246fb763 [SPARK-49624][BUILD] Upgrade `aircompressor` to 2.0.2
e7cf246fb763 is described below

commit e7cf246fb7635ef7b95c18b7958bcadae00aa281
Author: panbingkun 
AuthorDate: Thu Sep 12 20:52:11 2024 -0700

[SPARK-49624][BUILD] Upgrade `aircompressor` to 2.0.2

### What changes were proposed in this pull request?
This PR aims to upgrade `aircompressor` from `0.27` to `2.0.2`.

### Why are the changes needed?
https://github.com/airlift/aircompressor/releases/tag/2.0
(Note: 2.0.2 was built against `JDK 1.8`.)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48098 from panbingkun/aircompressor_2.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2db86ed229a0..e1ac039f2546 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -4,7 +4,7 @@ JTransforms/3.1//JTransforms-3.1.jar
 RoaringBitmap/1.2.1//RoaringBitmap-1.2.1.jar
 ST4/4.0.4//ST4-4.0.4.jar
 activation/1.1.1//activation-1.1.1.jar
-aircompressor/0.27//aircompressor-0.27.jar
+aircompressor/2.0.2//aircompressor-2.0.2.jar
 algebra_2.13/2.8.0//algebra_2.13-2.8.0.jar
 aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar
 aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
diff --git a/pom.xml b/pom.xml
index b1497c782685..b9f28eb61925 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2634,7 +2634,7 @@
   
 io.airlift
 aircompressor
-0.27
+2.0.2
   
   
 org.apache.orc





(spark) branch master updated: [SPARK-43354][PYTHON][TESTS] Re-enable `test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 61814876b26c [SPARK-43354][PYTHON][TESTS] Re-enable 
`test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9
61814876b26c is described below

commit 61814876b26c6fef2dc8238b1aeb0594d9a24472
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 12 20:49:16 2024 -0700

[SPARK-43354][PYTHON][TESTS] Re-enable 
`test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9

### What changes were proposed in this pull request?

This PR aims to re-enable 
`test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9.

### Why are the changes needed?

This test was disabled for PyPy 3.8, but Python 3.8 support has since been dropped and the test passes with PyPy 3.9.
- #46228

**BEFORE: Skipped with `Fails in PyPy Python 3.8, should enable.` message**
```
$ python/run-tests.py --python-executables pypy3 --testnames 
pyspark.sql.tests.test_creation
Running PySpark tests. Output is in 
/Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['pypy3']
Will test the following Python tests: ['pyspark.sql.tests.test_creation']
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.9.19 (a2113ea87262, Apr 21 2024, 05:41:07)
[PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.1.0.2.5)]
Starting test(pypy3): pyspark.sql.tests.test_creation (temp output: 
/Users/dongjoon/APACHE/spark-merge/python/target/58e26724-5c3e-4451-80f8-cabdb36f0901/pypy3__pyspark.sql.tests.test_creation__n448ay57.log)
Finished test(pypy3): pyspark.sql.tests.test_creation (6s) ... 3 tests were 
skipped
Tests passed in 6 seconds

Skipped tests in pyspark.sql.tests.test_creation with pypy3:
test_create_dataframe_from_pandas_with_day_time_interval 
(pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Fails in 
PyPy Python 3.8, should enable.'
test_create_dataframe_required_pandas_not_found 
(pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Required 
Pandas was found.'
test_schema_inference_from_pandas_with_dict 
(pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 
'[PACKAGE_NOT_INSTALLED] PyArrow >= 10.0.0 must be installed; however, it was 
not found.'
```

**AFTER**
```
$ python/run-tests.py --python-executables pypy3 --testnames 
pyspark.sql.tests.test_creation
    Running PySpark tests. Output is in 
/Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['pypy3']
Will test the following Python tests: ['pyspark.sql.tests.test_creation']
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.9.19 (a2113ea87262, Apr 21 2024, 05:41:07)
[PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.1.0.2.5)]
Starting test(pypy3): pyspark.sql.tests.test_creation (temp output: 
/Users/dongjoon/APACHE/spark-merge/python/target/1f0db01f-0beb-4ee2-817f-363eb2f2804d/pypy3__pyspark.sql.tests.test_creation__2w4gy9u1.log)
Finished test(pypy3): pyspark.sql.tests.test_creation (13s) ... 2 tests 
were skipped
Tests passed in 13 seconds

Skipped tests in pyspark.sql.tests.test_creation with pypy3:
test_create_dataframe_required_pandas_not_found 
(pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Required 
Pandas was found.'
test_schema_inference_from_pandas_with_dict 
(pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 
'[PACKAGE_NOT_INSTALLED] PyArrow >= 10.0.0 must be installed; however, it was 
not found.'
```

### Does this PR introduce _any_ user-facing change?

No, this is a test only change.

### How was this patch tested?

Manual tests with PyPy3.9.

### Was this patch authored or co-authored using generative AI tooling?
    
No.

Closes #48097 from dongjoon-hyun/SPARK-43354.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/tests/test_creation.py | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/python/pyspark/sql/tests/test_creation.py 
b/python/pyspark/sql/tests/test_creation.py
index dfe66cdd3edf..c6917aa234b4 100644
--- a/python/pyspark/sql/tests/test_creation.py
+++ b/python/pyspark/sql/tests/test_creation.py
@@ -15,7 +15,6 @@
 # limitations under the License.
 #
 
-import platform
 from decimal import Decimal
 import os
 import time
@@ -111,11 +110,7 @@ class DataFrameCreationTestsMixin:
 os.environ["TZ"] = orig_env_tz
 time.tzset()
 
-# TO

(spark) branch master updated (f69b518446e2 -> 23e61f6b1845)

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f69b518446e2 [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker 
files to create `pypy3.9` links
 add 23e61f6b1845 [SPARK-49621][SQL][TESTS] Remove the flaky `EXEC 
IMMEDIATE STACK OVERFLOW` test case

No new revisions were added by this update.

Summary of changes:
 .../execution/ExecuteImmediateEndToEndSuite.scala  | 27 --
 1 file changed, 27 deletions(-)





(spark) branch master updated: [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create `pypy3.9` links

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f69b518446e2 [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker 
files to create `pypy3.9` links
f69b518446e2 is described below

commit f69b518446e2f18fccdad3e1c23792bbee20f3f5
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 12 20:43:33 2024 -0700

[SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create 
`pypy3.9` links

### What changes were proposed in this pull request?

This PR aims to fix two Dockerfiles to create `pypy3.9` symlinks instead of 
`pypy3.8`.


https://github.com/apache/spark/blob/d2d293e3fb57d6c9dea084b5fe6707d67c715af3/dev/create-release/spark-rm/Dockerfile#L97


https://github.com/apache/spark/blob/d2d293e3fb57d6c9dea084b5fe6707d67c715af3/dev/infra/Dockerfile#L91

### Why are the changes needed?

Apache Spark 4.0 dropped `Python 3.8` support. We should make sure that we don't use `pypy3.8` at all.
- #46228

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48095 from dongjoon-hyun/SPARK-49620.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/spark-rm/Dockerfile | 2 +-
 dev/infra/Dockerfile   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index e86b91968bf8..e7f558b523d0 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -94,7 +94,7 @@ ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library
 RUN add-apt-repository ppa:pypy/ppa
 RUN mkdir -p /usr/local/pypy/pypy3.9 && \
 curl -sqL 
https://downloads.python.org/pypy/pypy3.9-v7.3.16-linux64.tar.bz2 | tar xjf - 
-C /usr/local/pypy/pypy3.9 --strip-components=1 && \
-ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.8 && \
+ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.9 && \
 ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.2' scipy coverage 
matplotlib lxml
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index ce4736299928..5939e429b2f3 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -88,7 +88,7 @@ ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library
 RUN add-apt-repository ppa:pypy/ppa
 RUN mkdir -p /usr/local/pypy/pypy3.9 && \
 curl -sqL 
https://downloads.python.org/pypy/pypy3.9-v7.3.16-linux64.tar.bz2 | tar xjf - 
-C /usr/local/pypy/pypy3.9 --strip-components=1 && \
-ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.8 && \
+ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.9 && \
 ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 RUN pypy3 -m pip install 'numpy==1.26.4' 'six==1.16.0' 'pandas==2.2.2' scipy 
coverage matplotlib lxml





(spark-kubernetes-operator) branch main updated: [SPARK-49625] Add `SparkCluster` state transition test

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 07ff073  [SPARK-49625] Add `SparkCluster` state transition test
07ff073 is described below

commit 07ff073b358e4f3b70c66c629f17d79625e534d3
Author: Qi Tan 
AuthorDate: Thu Sep 12 20:42:23 2024 -0700

[SPARK-49625] Add `SparkCluster` state transition test

### What changes were proposed in this pull request?
Add an end-to-end test for the `SparkCluster` happy-path state transition.

### Why are the changes needed?
To simulate real user scenarios.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #122 from TQJADE/spark-cluster-state-transtition.

Authored-by: Qi Tan 
Signed-off-by: Dongjoon Hyun 
---
 .../spark-cluster-state-transition.yaml| 27 
 tests/e2e/state-transition/chainsaw-test.yaml  | 37 +++---
 .../spark-cluster-example-succeeded.yaml   | 33 +++
 3 files changed, 93 insertions(+), 4 deletions(-)

diff --git 
a/tests/e2e/assertions/spark-cluster/spark-cluster-state-transition.yaml 
b/tests/e2e/assertions/spark-cluster/spark-cluster-state-transition.yaml
new file mode 100644
index 000..4194583
--- /dev/null
+++ b/tests/e2e/assertions/spark-cluster/spark-cluster-state-transition.yaml
@@ -0,0 +1,27 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkCluster
+metadata:
+  name: spark-cluster-succeeded-test
+  namespace: ($SPARK_APP_NAMESPACE)
+status:
+  stateTransitionHistory:
+(*.currentStateSummary):
+  - "Submitted"
+  - "RunningHealthy"
diff --git a/tests/e2e/state-transition/chainsaw-test.yaml 
b/tests/e2e/state-transition/chainsaw-test.yaml
index 46f5c1f..6e95b69 100644
--- a/tests/e2e/state-transition/chainsaw-test.yaml
+++ b/tests/e2e/state-transition/chainsaw-test.yaml
@@ -18,22 +18,26 @@
 apiVersion: chainsaw.kyverno.io/v1alpha1
 kind: Test
 metadata:
-  name: spark-operator-spark-application-state-transition-validation
+  name: spark-operator-state-transition-validation
 spec:
   scenarios:
   - bindings:
   - name: TEST_NAME
 value: succeeded
-  - name: FILE_NAME
+  - name: APPLICATION_FILE_NAME
 value: spark-example-succeeded.yaml
+  - name: CLUSTER_FILE_NAME
+value: spark-cluster-example-succeeded.yaml
   - name: SPARK_APPLICATION_NAME
 value: spark-job-succeeded-test
+  - name: SPARK_CLUSTER_NAME
+value: spark-cluster-succeeded-test
   steps:
   - try:
 - script:
 env:
   - name: FILE_NAME
-value: ($FILE_NAME)
+value: ($APPLICATION_FILE_NAME)
 content: kubectl apply -f $FILE_NAME
 - assert:
 bindings:
@@ -53,4 +57,29 @@ spec:
 value: ($SPARK_APPLICATION_NAME)
 timeout: 30s
 content: |
-  kubectl delete sparkapplication $SPARK_APPLICATION_NAME
\ No newline at end of file
+  kubectl delete sparkapplication $SPARK_APPLICATION_NAME
+  - try:
+- script:
+env:
+  - name: FILE_NAME
+value: ($CLUSTER_FILE_NAME)
+content: kubectl apply -f $FILE_NAME
+- assert:
+bindings:
+  - name: SPARK_APP_NAMESPACE
+value: default
+timeout: 60s
+file: "../assertions/spark-cluster/spark-cluster-state-transition.yaml"
+catch:
+- describe:
+apiVersion: spark.apache.org/v1alpha1
+kind: SparkCluster
+namespace: default
+finally:
+  - script:
+  env:
+- name: SPARK_CLUSTER_NAME
+  value: ($SPARK_CLUSTER_NAME)
+  timeout: 30s
+  content: |
+kubectl delete sparkcluster $SPARK_CLUSTER_NAME
diff --git a/tests/e2e/state-transition/spark-cluster-example-succeeded.yaml 
b/tests/e2e/sta

(spark-kubernetes-operator) branch main updated: [SPARK-49623] Refactor prefix `appResources` to `workloadResources`

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 9400fe3  [SPARK-49623] Refactor prefix `appResources` to 
`workloadResources`
9400fe3 is described below

commit 9400fe3bdaac2207a930d8ec0d25e90f0486b030
Author: zhou-jiang 
AuthorDate: Thu Sep 12 17:11:38 2024 -0700

[SPARK-49623] Refactor prefix `appResources` to `workloadResources`

### What changes were proposed in this pull request?

This PR refactors the previous `appResources` prefix to `workloadResources` across the Helm chart, templates, and values to avoid confusion.

### Why are the changes needed?

In the operator Helm chart, the prefix / group `appResources` was introduced to indicate resources created for Spark workloads, whereas other resources serve the operator deployment itself.

As support is extended to cover `SparkCluster`, the naming becomes confusing because those resources in fact serve both `SparkApp` and `SparkCluster`. Thus, this PR refactors the naming prefix to cover the new resource in a more general fashion.

### Does this PR introduce _any_ user-facing change?

No, these changes have not been released yet.

### How was this patch tested?

CI + Helm chart test.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #121 from jiangzho/helm.

Authored-by: zhou-jiang 
Signed-off-by: Dongjoon Hyun 
---
 .../templates/_helpers.tpl |  12 +-
 .../templates/app-rbac.yaml| 176 -
 .../templates/operator-rbac.yaml   |  10 +-
 .../templates/tests/test-rbac.yaml |   5 +-
 .../templates/workload-rbac.yaml   | 176 +
 .../helm/spark-kubernetes-operator/values.yaml |  43 ++---
 tests/e2e/helm/dynamic-config-values.yaml  |   4 +-
 7 files changed, 214 insertions(+), 212 deletions(-)

diff --git a/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl 
b/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl
index e8ee901..587c833 100644
--- a/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl
+++ b/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl
@@ -94,11 +94,11 @@ Create the path of the operator image to use
 {{- end }}
 
 {{/*
-List of Spark app namespaces. If not provied in values, use the same namespace 
as operator
+List of Spark workload namespaces. If not provied in values, use the same 
namespace as operator
 */}}
-{{- define "spark-operator.appNamespacesStr" -}}
-{{- if index (.Values.appResources.namespaces) "data" }}
-{{- $ns_list := join "," .Values.appResources.namespaces.data }}
+{{- define "spark-operator.workloadNamespacesStr" -}}
+{{- if index (.Values.workloadResources.namespaces) "data" }}
+{{- $ns_list := join "," .Values.workloadResources.namespaces.data }}
 {{- printf "%s" $ns_list }}
 {{- else }}
 {{- printf "%s" .Release.Namespace }}
@@ -113,8 +113,8 @@ Default property overrides
 spark.kubernetes.operator.namespace={{ .Release.Namespace }}
 spark.kubernetes.operator.name={{- include "spark-operator.name" . }}
 spark.kubernetes.operator.dynamicConfig.enabled={{ 
.Values.operatorConfiguration.dynamicConfig.enable }}
-{{- if .Values.appResources.namespaces.overrideWatchedNamespaces }}
-spark.kubernetes.operator.watchedNamespaces={{ include 
"spark-operator.appNamespacesStr" . | trim }}
+{{- if .Values.workloadResources.namespaces.overrideWatchedNamespaces }}
+spark.kubernetes.operator.watchedNamespaces={{ include 
"spark-operator.workloadNamespacesStr" . | trim }}
 {{- end }}
 {{- end }}
 
diff --git a/build-tools/helm/spark-kubernetes-operator/templates/app-rbac.yaml 
b/build-tools/helm/spark-kubernetes-operator/templates/app-rbac.yaml
deleted file mode 100644
index 1ae62d9..000
--- a/build-tools/helm/spark-kubernetes-operator/templates/app-rbac.yaml
+++ /dev/null
@@ -1,176 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See th

(spark) branch master updated: [SPARK-49081][SQL][DOCS] Add data source options docs of `Protobuf`

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 875def391665 [SPARK-49081][SQL][DOCS] Add data source options docs of 
`Protobuf`
875def391665 is described below

commit 875def39166549f9de54b141f5397cb3f74a918e
Author: Wei Guo 
AuthorDate: Thu Sep 12 16:26:33 2024 -0700

[SPARK-49081][SQL][DOCS] Add data source options docs of `Protobuf`

### What changes were proposed in this pull request?

This PR aims to add data source option docs for the `Protobuf` data source. Other data sources, such as `csv` and `json`, already have corresponding option documents.

The document section appears as follows:
- https://github.com/user-attachments/assets/6f40a69b-1350-4b6b-9a1e-d780fcabb9f1
- https://github.com/user-attachments/assets/80402560-474b-4608-be51-0a98d9324109

### Why are the changes needed?

To help Spark users better understand and use the options of the `Protobuf` data source.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA and local manual check with `SKIP_API=1 bundle exec jekyll build 
--watch`.

### Was this patch authored or co-authored using generative AI tooling?

No.
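
As a quick illustration of how the documented options are passed, here is a hedged Scala sketch (an editor's example, not part of this patch); the descriptor file path, message name, and column name are hypothetical, and it assumes the `spark-protobuf` module is on the classpath.

```scala
import scala.jdk.CollectionConverters._

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.protobuf.functions.from_protobuf

val spark = SparkSession.builder().appName("protobuf-options-demo").getOrCreate()
import spark.implicits._

// Options from the new documentation page, passed as a Java map.
val options = Map(
  "recursive.fields.max.depth" -> "2",    // allow limited recursion instead of rejecting it
  "retain.empty.message.types" -> "true"  // keep empty message types as dummy structs
).asJava

// `payload` is a hypothetical binary column holding serialized `Person` messages,
// and `/tmp/person.desc` is a hypothetical descriptor file.
val df = Seq(Array.empty[Byte]).toDF("payload")
val parsed = df.select(
  from_protobuf($"payload", "Person", "/tmp/person.desc", options).as("person"))
```

The same `options` map can be supplied to `to_protobuf` when serializing rows back to Protobuf binaries.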

Closes #47570 from wayneguow/pb_docs.

Authored-by: Wei Guo 
    Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/protobuf/utils/ProtobufOptions.scala | 12 ++--
 docs/sql-data-sources-protobuf.md  | 67 +-
 2 files changed, 72 insertions(+), 7 deletions(-)

diff --git 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala
 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala
index 6644bce98293..e85097a272f2 100644
--- 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala
+++ 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala
@@ -43,8 +43,8 @@ private[sql] class ProtobufOptions(
 
   /**
* Adds support for recursive fields. If this option is is not specified, 
recursive fields are
-   * not permitted. Setting it to 0 drops the recursive fields, 1 allows it to 
be recursed once,
-   * and 2 allows it to be recursed twice and so on, up to 10. Values larger 
than 10 are not
+   * not permitted. Setting it to 1 drops the recursive fields, 0 allows it to 
be recursed once,
+   * and 3 allows it to be recursed twice and so on, up to 10. Values larger 
than 10 are not
* allowed in order avoid inadvertently creating very large schemas. If a 
Protobuf message
* has depth beyond this limit, the Spark struct returned is truncated after 
the recursion limit.
*
@@ -52,8 +52,8 @@ private[sql] class ProtobufOptions(
*   `message Person { string name = 1; Person friend = 2; }`
* The following lists the schema with different values for this setting.
*  1:  `struct`
-   *  2:  `struct>`
-   *  3:  `struct>>`
+   *  2:  `struct>`
+   *  3:  `struct>>`
* and so on.
*/
   val recursiveFieldMaxDepth: Int = 
parameters.getOrElse("recursive.fields.max.depth", "-1").toInt
@@ -181,7 +181,7 @@ private[sql] class ProtobufOptions(
   val upcastUnsignedInts: Boolean =
 parameters.getOrElse("upcast.unsigned.ints", false.toString).toBoolean
 
-  // Whether to unwrap the struct representation for well known primitve 
wrapper types when
+  // Whether to unwrap the struct representation for well known primitive 
wrapper types when
   // deserializing. By default, the wrapper types for primitives (i.e. 
google.protobuf.Int32Value,
   // google.protobuf.Int64Value, etc.) will get deserialized as structs. We 
allow the option to
   // deserialize them as their respective primitives.
@@ -221,7 +221,7 @@ private[sql] class ProtobufOptions(
   // By default, in the spark schema field a will be dropped, which result in 
schema
   // b struct
   // If retain.empty.message.types=true, field a will be retained by inserting 
a dummy column.
-  // b struct, name: string>
+  // b struct, name: string>
   val retainEmptyMessage: Boolean =
 parameters.getOrElse("retain.empty.message.types", 
false.toString).toBoolean
 }
diff --git a/docs/sql-data-sources-protobuf.md 
b/docs/sql-data-sources-protobuf.md
index 34cb1d4997d2..4dd6579f92cd 100644
--- a/docs/sql-data-sources-protobuf.md
+++ b/docs/sql-data-sources-protobuf.md
@@ -434,4 +434,69 @@ message Person {
 
 ```
 
-
\ No newline at end of file
+
+
+## Data Source Option
+
+Data source options of Protobuf can be set via:
+* the built-in functions below
+  * `from_protobuf`
+  * `to_protobuf`
+
+
+  Property 
NameDefaultMeaningScope
+  
+ 

(spark-kubernetes-operator) branch main updated: [SPARK-49619] Upgrade Gradle to 8.10.1

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new 27a033c  [SPARK-49619] Upgrade Gradle to 8.10.1
27a033c is described below

commit 27a033ca3191c450509d5c9d16e2596512495b63
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 12 13:57:54 2024 -0700

[SPARK-49619] Upgrade Gradle to 8.10.1

### What changes were proposed in this pull request?

This PR aims to upgrade `Gradle` to 8.10.1.

### Why are the changes needed?

To bring the following bug fixes.
- https://github.com/gradle/gradle/releases/tag/v8.10.1
  - https://github.com/gradle/gradle/issues/30239 Gradle 8.10 Significantly 
Slower Due to Dependency Resolution
  - https://github.com/gradle/gradle/issues/30272 Broken equals() contract 
for LifecycleAwareProject
  - https://github.com/gradle/gradle/issues/30385 Gradle should not 
validate isolated projects when isolated projects is disabled

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #120 from dongjoon-hyun/SPARK-49619.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 build-tools/docker/Dockerfile| 2 +-
 gradle/wrapper/gradle-wrapper.properties | 4 ++--
 gradlew  | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/build-tools/docker/Dockerfile b/build-tools/docker/Dockerfile
index c6fabde..66dc6a3 100644
--- a/build-tools/docker/Dockerfile
+++ b/build-tools/docker/Dockerfile
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-FROM gradle:8.10.0-jdk17-jammy AS builder
+FROM gradle:8.10.1-jdk17-jammy AS builder
 WORKDIR /app
 COPY . .
 RUN ./gradlew clean build -x check
diff --git a/gradle/wrapper/gradle-wrapper.properties 
b/gradle/wrapper/gradle-wrapper.properties
index 3d2e79e..b44d3e9 100644
--- a/gradle/wrapper/gradle-wrapper.properties
+++ b/gradle/wrapper/gradle-wrapper.properties
@@ -17,8 +17,8 @@
 
 distributionBase=GRADLE_USER_HOME
 distributionPath=wrapper/dists
-distributionSha256Sum=682b4df7fe5accdca84a4d1ef6a3a6ab096b3efd5edf7de2bd8c758d95a93703
-distributionUrl=https\://services.gradle.org/distributions/gradle-8.10-all.zip
+distributionSha256Sum=fdfca5dbc2834f0ece5020465737538e5ba679deeff5ab6c09621d67f8bb1a15
+distributionUrl=https\://services.gradle.org/distributions/gradle-8.10.1-all.zip
 networkTimeout=1
 zipStoreBase=GRADLE_USER_HOME
 zipStorePath=wrapper/dists
diff --git a/gradlew b/gradlew
index 482b49a..2ccd3c3 100755
--- a/gradlew
+++ b/gradlew
@@ -87,11 +87,11 @@ APP_BASE_NAME=${0##*/}
 APP_HOME=$( cd "${APP_HOME:-./}" > /dev/null && pwd -P ) || exit
 
 if [ ! -e $APP_HOME/gradle/wrapper/gradle-wrapper.jar -a "$(command -v curl)" 
]; then
-curl -o $APP_HOME/gradle/wrapper/gradle-wrapper.jar 
https://raw.githubusercontent.com/gradle/gradle/v8.10.0/gradle/wrapper/gradle-wrapper.jar
+curl -o $APP_HOME/gradle/wrapper/gradle-wrapper.jar 
https://raw.githubusercontent.com/gradle/gradle/v8.10.1/gradle/wrapper/gradle-wrapper.jar
 fi
 # If the file still doesn't exist, let's try `wget` and cross our fingers
 if [ ! -e $APP_HOME/gradle/wrapper/gradle-wrapper.jar -a "$(command -v wget)" 
]; then
-wget -O $APP_HOME/gradle/wrapper/gradle-wrapper.jar 
https://raw.githubusercontent.com/gradle/gradle/v8.10.0/gradle/wrapper/gradle-wrapper.jar
+wget -O $APP_HOME/gradle/wrapper/gradle-wrapper.jar 
https://raw.githubusercontent.com/gradle/gradle/v8.10.1/gradle/wrapper/gradle-wrapper.jar
 fi
 
 # Use the maximum available, or set MAX_FD != -1 to use that value.





(spark) branch master updated (317eddb7390c -> d2d293e3fb57)

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 317eddb7390c [SPARK-49606][PS][DOCS] Improve documentation of Pandas 
on Spark plotting API
 add d2d293e3fb57 [SPARK-49598][K8S] Support user-defined labels for 
OnDemand PVCs

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md  | 18 ++
 .../scala/org/apache/spark/deploy/k8s/Config.scala |  2 +-
 .../spark/deploy/k8s/KubernetesVolumeSpec.scala|  3 +-
 .../spark/deploy/k8s/KubernetesVolumeUtils.scala   | 16 -
 .../k8s/features/MountVolumesFeatureStep.scala |  9 ++-
 .../spark/deploy/k8s/KubernetesTestConf.scala  |  9 ++-
 .../deploy/k8s/KubernetesVolumeUtilsSuite.scala| 34 +-
 .../features/MountVolumesFeatureStepSuite.scala| 73 ++
 8 files changed, 154 insertions(+), 10 deletions(-)





(spark) branch master updated (98f0d9f32322 -> 317eddb7390c)

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 98f0d9f32322 [SPARK-49605][SQL] Fix the prompt when `ascendingOrder` 
is `DataTypeMismatch` in `SortArray`
 add 317eddb7390c [SPARK-49606][PS][DOCS] Improve documentation of Pandas 
on Spark plotting API

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/plot/core.py | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)





(spark) branch master updated: [SPARK-44811][BUILD] Upgrade Guava to 33.2.1-jre

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1f24b2d72ed6 [SPARK-44811][BUILD] Upgrade Guava to 33.2.1-jre
1f24b2d72ed6 is described below

commit 1f24b2d72ed6821a6cc6d1d22683d2f3ba2326a2
Author: Cheng Pan 
AuthorDate: Thu Sep 12 09:26:56 2024 -0700

[SPARK-44811][BUILD] Upgrade Guava to 33.2.1-jre

### What changes were proposed in this pull request?

This PR upgrades Spark's built-in Guava from 14 to 33.2.1-jre

Currently, Spark uses Guava 14 because the previous built-in Hive 2.3.9 is 
incompatible with new Guava versions. HIVE-27560 
(https://github.com/apache/hive/pull/4542) makes Hive 2.3.10 compatible with 
Guava 14+ (thanks to LuciferYang)

### Why are the changes needed?

It's a long-standing issue, see prior discussions at 
https://github.com/apache/spark/pull/35584, 
https://github.com/apache/spark/pull/36231, and 
https://github.com/apache/spark/pull/33989

### Does this PR introduce _any_ user-facing change?

Yes, some user-facing error messages changed.

### How was this patch tested?

GA passed.

Closes #42493 from pan3793/guava.

Authored-by: Cheng Pan 
Signed-off-by: Dongjoon Hyun 
---
 assembly/pom.xml   | 2 +-
 core/pom.xml   | 1 +
 dev/deps/spark-deps-hadoop-3-hive-2.3  | 7 ++-
 pom.xml| 3 ++-
 project/SparkBuild.scala   | 2 +-
 .../spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala  | 2 +-
 .../src/test/resources/sql-tests/results/ansi/interval.sql.out | 4 ++--
 sql/core/src/test/resources/sql-tests/results/interval.sql.out | 4 ++--
 8 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 4b074a88dab4..01bd324efc11 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -123,7 +123,7 @@
 
 
   com.google.guava
diff --git a/core/pom.xml b/core/pom.xml
index 53d5ad71cebf..19f58940ed94 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -558,6 +558,7 @@
   org.eclipse.jetty:jetty-util
   org.eclipse.jetty:jetty-server
   com.google.guava:guava
+  com.google.guava:failureaccess
   com.google.protobuf:*
   
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index c89c92815d45..2db86ed229a0 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -33,6 +33,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
 bundle/2.24.6//bundle-2.24.6.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
+checker-qual/3.42.0//checker-qual-3.42.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
 chill_2.13/0.10.0//chill_2.13-0.10.0.jar
 commons-cli/1.9.0//commons-cli-1.9.0.jar
@@ -62,12 +63,14 @@ derby/10.16.1.1//derby-10.16.1.1.jar
 derbyshared/10.16.1.1//derbyshared-10.16.1.1.jar
 derbytools/10.16.1.1//derbytools-10.16.1.1.jar
 
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
+error_prone_annotations/2.26.1//error_prone_annotations-2.26.1.jar
 esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar
+failureaccess/1.0.2//failureaccess-1.0.2.jar
 flatbuffers-java/24.3.25//flatbuffers-java-24.3.25.jar
 gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.11.0//gson-2.11.0.jar
-guava/14.0.1//guava-14.0.1.jar
+guava/33.2.1-jre//guava-33.2.1-jre.jar
 hadoop-aliyun/3.4.0//hadoop-aliyun-3.4.0.jar
 hadoop-annotations/3.4.0//hadoop-annotations-3.4.0.jar
 hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar
@@ -101,6 +104,7 @@ icu4j/75.1//icu4j-75.1.jar
 ini4j/0.5.4//ini4j-0.5.4.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.2//ivy-2.5.2.jar
+j2objc-annotations/3.0.0//j2objc-annotations-3.0.0.jar
 jackson-annotations/2.17.2//jackson-annotations-2.17.2.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
 jackson-core/2.17.2//jackson-core-2.17.2.jar
@@ -184,6 +188,7 @@ lapack/3.0.3//lapack-3.0.3.jar
 leveldbjni-all/1.8//leveldbjni-all-1.8.jar
 libfb303/0.9.3//libfb303-0.9.3.jar
 libthrift/0.16.0//libthrift-0.16.0.jar
+listenablefuture/.0-empty-to-avoid-conflict-with-guava//listenablefuture-.0-empty-to-avoid-conflict-with-guava.jar
 log4j-1.2-api/2.22.1//log4j-1.2-api-2.22.1.jar
 log4j-api/2.22.1//log4j-api-2.22.1.jar
 log4j-core/2.22.1//log4j-core-2.22.1.jar
diff --git a/pom.xml b/pom.xml
index 6f5c9b63f86d..b1497c782685 100644
--- a/pom.xml
+++ b/pom.xml

(spark) branch branch-3.4 updated: [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new ba05a6bcd972 [SPARK-49261][SQL] Don't replace literals in aggregate 
expressions with group-by expressions
ba05a6bcd972 is described below

commit ba05a6bcd972ed4d5d1ee7a31f1c770ed7bfaed7
Author: Bruce Robbins 
AuthorDate: Thu Sep 12 08:11:03 2024 -0700

[SPARK-49261][SQL] Don't replace literals in aggregate expressions with 
group-by expressions

### What changes were proposed in this pull request?

Before this PR, `RewriteDistinctAggregates` could potentially replace 
literals in the aggregate expressions with output attributes from the `Expand` 
operator. This can occur when a group-by expression is a literal that happens 
by chance to match a literal used in an aggregate expression. E.g.:

```
create or replace temp view v1(a, b, c) as values
(1, 1.001d, 2), (2, 3.001d, 4), (2, 3.001, 4);

cache table v1;

select
  round(sum(b), 6) as sum1,
  count(distinct a) as count1,
  count(distinct c) as count2
from (
  select
6 as gb,
*
  from v1
)
group by a, gb;
```
In the optimized plan, you can see that the literal 6 in the `round` 
function invocation has been patched with an output attribute (6#163) from the 
`Expand` operator:
```
== Optimized Logical Plan ==
'Aggregate [a#123, 6#163], 
[round(first(sum(__auto_generated_subquery_name.b)#167, true) FILTER (WHERE 
(gid#162 = 0)), 6#163) AS sum1#114, count(__auto_generated_subquery_name.a#164) 
FILTER (WHERE (gid#162 = 1)) AS count1#115L, 
count(__auto_generated_subquery_name.c#165) FILTER (WHERE (gid#162 = 2)) AS 
count2#116L]
+- Aggregate [a#123, 6#163, __auto_generated_subquery_name.a#164, 
__auto_generated_subquery_name.c#165, gid#162], [a#123, 6#163, 
__auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, 
gid#162, sum(__auto_generated_subquery_name.b#166) AS 
sum(__auto_generated_subquery_name.b)#167]
   +- Expand [[a#123, 6, null, null, 0, b#124], [a#123, 6, a#123, null, 1, 
null], [a#123, 6, null, c#125, 2, null]], [a#123, 6#163, 
__auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, 
gid#162, __auto_generated_subquery_name.b#166]
  +- InMemoryRelation [a#123, b#124, c#125], StorageLevel(disk, memory, 
deserialized, 1 replicas)
+- LocalTableScan [a#6, b#7, c#8]
```
This is because the literal 6 was used in the group-by expressions 
(referred to as gb in the query, and renamed 6#163 in the `Expand` operator's 
output attributes).

After this PR, foldable expressions in the aggregate expressions are kept 
as-is.

### Why are the changes needed?

Some expressions require a foldable argument. In the above example, the 
`round` function requires a foldable expression as the scale argument. Because 
the scale argument is patched with an attribute, 
`RoundBase#checkInputDataTypes` returns an error, which leaves the `Aggregate` 
operator unresolved:
```
[INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: 
XX000
org.apache.spark.sql.catalyst.analysis.UnresolvedException: 
[INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000
at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:255)
at 
org.apache.spark.sql.catalyst.types.DataTypeUtils$.$anonfun$fromAttributes$1(DataTypeUtils.scala:241)
at scala.collection.immutable.List.map(List.scala:247)
at scala.collection.immutable.List.map(List.scala:79)
at 
org.apache.spark.sql.catalyst.types.DataTypeUtils$.fromAttributes(DataTypeUtils.scala:241)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:428)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:428)
at 
org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:474)
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47876 from bersprockets/group_by_lit_issue.

Authored-by: Bruce Robbins 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 1a0791d006e25898b67cc17e1420f053a39091b9)
Signed-off-by: Dongjoon Hyun 
---
 .../optimizer/RewriteDistinctAggregates.scala   |  3 ++-
 .../optimizer/RewriteDistinctAggregatesSuite.scala  | 18 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala  | 21 +
 3 files changed, 40 insertions(+), 

(spark) branch branch-3.5 updated: [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 560efed3b00f [SPARK-49261][SQL] Don't replace literals in aggregate 
expressions with group-by expressions
560efed3b00f is described below

commit 560efed3b00f4ac9be4356714c664bf0e9341c0b
Author: Bruce Robbins 
AuthorDate: Thu Sep 12 08:11:03 2024 -0700

[SPARK-49261][SQL] Don't replace literals in aggregate expressions with 
group-by expressions

### What changes were proposed in this pull request?

Before this PR, `RewriteDistinctAggregates` could potentially replace 
literals in the aggregate expressions with output attributes from the `Expand` 
operator. This can occur when a group-by expression is a literal that happens 
by chance to match a literal used in an aggregate expression. E.g.:

```
create or replace temp view v1(a, b, c) as values
(1, 1.001d, 2), (2, 3.001d, 4), (2, 3.001, 4);

cache table v1;

select
  round(sum(b), 6) as sum1,
  count(distinct a) as count1,
  count(distinct c) as count2
from (
  select
6 as gb,
*
  from v1
)
group by a, gb;
```
In the optimized plan, you can see that the literal 6 in the `round` 
function invocation has been patched with an output attribute (6#163) from the 
`Expand` operator:
```
== Optimized Logical Plan ==
'Aggregate [a#123, 6#163], 
[round(first(sum(__auto_generated_subquery_name.b)#167, true) FILTER (WHERE 
(gid#162 = 0)), 6#163) AS sum1#114, count(__auto_generated_subquery_name.a#164) 
FILTER (WHERE (gid#162 = 1)) AS count1#115L, 
count(__auto_generated_subquery_name.c#165) FILTER (WHERE (gid#162 = 2)) AS 
count2#116L]
+- Aggregate [a#123, 6#163, __auto_generated_subquery_name.a#164, 
__auto_generated_subquery_name.c#165, gid#162], [a#123, 6#163, 
__auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, 
gid#162, sum(__auto_generated_subquery_name.b#166) AS 
sum(__auto_generated_subquery_name.b)#167]
   +- Expand [[a#123, 6, null, null, 0, b#124], [a#123, 6, a#123, null, 1, 
null], [a#123, 6, null, c#125, 2, null]], [a#123, 6#163, 
__auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, 
gid#162, __auto_generated_subquery_name.b#166]
  +- InMemoryRelation [a#123, b#124, c#125], StorageLevel(disk, memory, 
deserialized, 1 replicas)
+- LocalTableScan [a#6, b#7, c#8]
```
This is because the literal 6 was used in the group-by expressions 
(referred to as gb in the query, and renamed 6#163 in the `Expand` operator's 
output attributes).

After this PR, foldable expressions in the aggregate expressions are kept 
as-is.

### Why are the changes needed?

Some expressions require a foldable argument. In the above example, the 
`round` function requires a foldable expression as the scale argument. Because 
the scale argument is patched with an attribute, 
`RoundBase#checkInputDataTypes` returns an error, which leaves the `Aggregate` 
operator unresolved:
```
[INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: 
XX000
org.apache.spark.sql.catalyst.analysis.UnresolvedException: 
[INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000
at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:255)
at 
org.apache.spark.sql.catalyst.types.DataTypeUtils$.$anonfun$fromAttributes$1(DataTypeUtils.scala:241)
at scala.collection.immutable.List.map(List.scala:247)
at scala.collection.immutable.List.map(List.scala:79)
at 
org.apache.spark.sql.catalyst.types.DataTypeUtils$.fromAttributes(DataTypeUtils.scala:241)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:428)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:428)
at 
org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:474)
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47876 from bersprockets/group_by_lit_issue.

Authored-by: Bruce Robbins 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 1a0791d006e25898b67cc17e1420f053a39091b9)
Signed-off-by: Dongjoon Hyun 
---
 .../optimizer/RewriteDistinctAggregates.scala   |  3 ++-
 .../optimizer/RewriteDistinctAggregatesSuite.scala  | 18 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala  | 21 +
 3 files changed, 40 insertions(+), 

(spark) branch master updated: [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a0791d006e2 [SPARK-49261][SQL] Don't replace literals in aggregate 
expressions with group-by expressions
1a0791d006e2 is described below

commit 1a0791d006e25898b67cc17e1420f053a39091b9
Author: Bruce Robbins 
AuthorDate: Thu Sep 12 08:11:03 2024 -0700

[SPARK-49261][SQL] Don't replace literals in aggregate expressions with 
group-by expressions

### What changes were proposed in this pull request?

Before this PR, `RewriteDistinctAggregates` could potentially replace 
literals in the aggregate expressions with output attributes from the `Expand` 
operator. This can occur when a group-by expression is a literal that happens 
by chance to match a literal used in an aggregate expression. E.g.:

```
create or replace temp view v1(a, b, c) as values
(1, 1.001d, 2), (2, 3.001d, 4), (2, 3.001, 4);

cache table v1;

select
  round(sum(b), 6) as sum1,
  count(distinct a) as count1,
  count(distinct c) as count2
from (
  select
6 as gb,
*
  from v1
)
group by a, gb;
```
In the optimized plan, you can see that the literal 6 in the `round` 
function invocation has been patched with an output attribute (6#163) from the 
`Expand` operator:
```
== Optimized Logical Plan ==
'Aggregate [a#123, 6#163], 
[round(first(sum(__auto_generated_subquery_name.b)#167, true) FILTER (WHERE 
(gid#162 = 0)), 6#163) AS sum1#114, count(__auto_generated_subquery_name.a#164) 
FILTER (WHERE (gid#162 = 1)) AS count1#115L, 
count(__auto_generated_subquery_name.c#165) FILTER (WHERE (gid#162 = 2)) AS 
count2#116L]
+- Aggregate [a#123, 6#163, __auto_generated_subquery_name.a#164, 
__auto_generated_subquery_name.c#165, gid#162], [a#123, 6#163, 
__auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, 
gid#162, sum(__auto_generated_subquery_name.b#166) AS 
sum(__auto_generated_subquery_name.b)#167]
   +- Expand [[a#123, 6, null, null, 0, b#124], [a#123, 6, a#123, null, 1, 
null], [a#123, 6, null, c#125, 2, null]], [a#123, 6#163, 
__auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, 
gid#162, __auto_generated_subquery_name.b#166]
  +- InMemoryRelation [a#123, b#124, c#125], StorageLevel(disk, memory, 
deserialized, 1 replicas)
+- LocalTableScan [a#6, b#7, c#8]
```
This is because the literal 6 was used in the group-by expressions 
(referred to as gb in the query, and renamed 6#163 in the `Expand` operator's 
output attributes).

After this PR, foldable expressions in the aggregate expressions are kept 
as-is.

### Why are the changes needed?

Some expressions require a foldable argument. In the above example, the 
`round` function requires a foldable expression as the scale argument. Because 
the scale argument is patched with an attribute, 
`RoundBase#checkInputDataTypes` returns an error, which leaves the `Aggregate` 
operator unresolved:
```
[INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: 
XX000
org.apache.spark.sql.catalyst.analysis.UnresolvedException: 
[INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000
at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:255)
at 
org.apache.spark.sql.catalyst.types.DataTypeUtils$.$anonfun$fromAttributes$1(DataTypeUtils.scala:241)
at scala.collection.immutable.List.map(List.scala:247)
at scala.collection.immutable.List.map(List.scala:79)
at 
org.apache.spark.sql.catalyst.types.DataTypeUtils$.fromAttributes(DataTypeUtils.scala:241)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:428)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:428)
at 
org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:474)
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47876 from bersprockets/group_by_lit_issue.

Authored-by: Bruce Robbins 
Signed-off-by: Dongjoon Hyun 
---
 .../optimizer/RewriteDistinctAggregates.scala   |  3 ++-
 .../optimizer/RewriteDistinctAggregatesSuite.scala  | 18 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala  | 21 +
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggr

(spark) branch master updated: [SPARK-49578][SQL][TESTS][FOLLOWUP] Regenerate Java 21 golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql`

2024-09-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 19aad9ee36ed [SPARK-49578][SQL][TESTS][FOLLOWUP] Regenerate Java 21 
golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql`
19aad9ee36ed is described below

commit 19aad9ee36edad0906b8223074351bfb76237c0a
Author: yangjie01 
AuthorDate: Thu Sep 12 07:17:28 2024 -0700

[SPARK-49578][SQL][TESTS][FOLLOWUP] Regenerate Java 21 golden file for 
`postgreSQL/float4.sql` and `postgreSQL/int8.sql`

### What changes were proposed in this pull request?
This PR regenerates the Java 21 golden files for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` to fix the Java 21 daily test.

### Why are the changes needed?
Fix Java 21 daily test:
- https://github.com/apache/spark/actions/runs/10823897095/job/30030200710

```
[info] - postgreSQL/float4.sql *** FAILED *** (1 second, 100 milliseconds)
[info]   postgreSQL/float4.sql
[info]   Expected "...arameters" : {
[info]   "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]   "]expression" : "'N A ...", but got "...arameters" : {
[info]   "[]expression" : "'N A ..." Result did not match for query #11
[info]   SELECT float('N A N') (SQLQueryTestSuite.scala:663)
...
[info] - postgreSQL/int8.sql *** FAILED *** (2 seconds, 474 milliseconds)
[info]   postgreSQL/int8.sql
[info]   Expected "...arameters" : {
[info]   "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]   "]sourceType" : "\"BIG...", but got "...arameters" : {
[info]   "[]sourceType" : "\"BIG..." Result did not match for query #66
[info]   SELECT CAST(q1 AS int) FROM int8_tbl WHERE q2 <> 456 
(SQLQueryTestSuite.scala:663)
...
[info] *** 2 TESTS FAILED ***
[error] Failed: Total 3559, Failed 2, Errors 0, Passed 3557, Ignored 4
[error] Failed tests:
[error] org.apache.spark.sql.SQLQueryTestSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked: `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"` with Java 21; all tests passed

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48089 from LuciferYang/SPARK-49578-FOLLOWUP.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../resources/sql-tests/results/postgreSQL/float4.sql.out.java21   | 7 ---
 .../resources/sql-tests/results/postgreSQL/int8.sql.out.java21 | 4 
 2 files changed, 11 deletions(-)

diff --git 
a/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
 
b/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
index 6126411071bc..3c2189c39963 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
+++ 
b/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
@@ -97,7 +97,6 @@ org.apache.spark.SparkNumberFormatException
   "errorClass" : "CAST_INVALID_INPUT",
   "sqlState" : "22018",
   "messageParameters" : {
-"ansiConfig" : "\"spark.sql.ansi.enabled\"",
 "expression" : "'N A N'",
 "sourceType" : "\"STRING\"",
 "targetType" : "\"FLOAT\""
@@ -122,7 +121,6 @@ org.apache.spark.SparkNumberFormatException
   "errorClass" : "CAST_INVALID_INPUT",
   "sqlState" : "22018",
   "messageParameters" : {
-"ansiConfig" : "\"spark.sql.ansi.enabled\"",
 "expression" : "'NaN x'",
 "sourceType" : "\"STRING\"",
 "targetType" : "\"FLOAT\""
@@ -147,7 +145,6 @@ org.apache.spark.SparkNumberFormatException
   "errorClass" : "CAST_INVALID_INPUT",
   "sqlState" : "22018",
   "messageParameters" : {
-"ansiConfig" : "\"spark.sql.ansi.enabled\"",
 "expression" : "' INFINITYx'",
 "sourceType" : "\"STRING\"",
 "targetType" : "\"FLOAT\""
@@ -196,7 +193,6 @@ org.apache.spark.SparkNumberFormatException
   "errorCl

(spark) branch master updated (591a60df788a -> 07f5b2c1c5ff)

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 591a60df788a [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use 
`{project.version}` instead of `{version}`
 add 07f5b2c1c5ff [SPARK-49155][SQL][SS] Use more appropriate parameter 
type to construct `GenericArrayData`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/kafka010/KafkaRecordToRowConverter.scala| 2 +-
 .../sql/catalyst/expressions/aggregate/HistogramNumeric.scala| 9 +
 .../spark/sql/catalyst/expressions/aggregate/collect.scala   | 2 +-
 .../spark/sql/catalyst/expressions/collectionOperations.scala| 2 +-
 .../apache/spark/sql/catalyst/expressions/jsonExpressions.scala  | 2 +-
 .../org/apache/spark/sql/catalyst/expressions/xml/xpath.scala| 2 +-
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala | 2 +-
 .../org/apache/spark/sql/execution/command/CommandUtils.scala| 2 +-
 8 files changed, 12 insertions(+), 11 deletions(-)





(spark) branch branch-3.4 updated: [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 6ffa94d6d3be [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` 
in Spark Connect Scala Client
6ffa94d6d3be is described below

commit 6ffa94d6d3be278910f282b35cc9cb4cd1dd2887
Author: Xinrong Meng 
AuthorDate: Wed Sep 11 08:52:33 2024 -0700

[SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect 
Scala Client

Fix DataFrame.unpivot/melt in Spark Connect Scala Client by correctly 
assigning the name for the variable column.

The original code used `setValueColumnName` for both the variable and value 
columns.
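
A minimal, self-contained sketch of the failure mode, using a hypothetical 
`UnpivotBuilder` stand-in rather than the actual generated Connect proto builder: 
the second `setValueColumnName` call simply overwrites the first, so the variable 
column name is never populated and renders as an empty header.

```scala
// Hypothetical stand-in for the generated builder, for illustration only.
case class UnpivotBuilder(variableColumnName: String = "", valueColumnName: String = "") {
  def setVariableColumnName(n: String): UnpivotBuilder = copy(variableColumnName = n)
  def setValueColumnName(n: String): UnpivotBuilder = copy(valueColumnName = n)
}

// Before the fix: "variable" lands in the value slot and is then overwritten.
val buggy = UnpivotBuilder().setValueColumnName("variable").setValueColumnName("value")
assert(buggy.variableColumnName.isEmpty) // empty column header in df.show()

// After the fix: each name goes to its own slot.
val fixed = UnpivotBuilder().setVariableColumnName("variable").setValueColumnName("value")
assert(fixed.variableColumnName == "variable" && fixed.valueColumnName == "value")
```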

This fix is necessary to ensure the correct behavior of the unpivot/melt 
operation.

Yes. Variable and value columns can be set correctly as shown below.

```scala
scala> val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long")
df: org.apache.spark.sql.package.DataFrame = [id: int, int: int ... 1 more 
field]

scala> df.show()
+---+---+----+
| id|int|long|
+---+---+----+
|  1| 11|  12|
|  2| 21|  22|
+---+---+----+
```
FROM (current master)
```scala
scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", 
"value").show()
+---+----+-----+
| id|    |value|
+---+----+-----+
|  1| int|   11|
|  1|long|   12|
|  2| int|   21|
|  2|long|   22|
+---+----+-----+

```

TO
```scala
scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", 
"value").show()
+---+--------+-----+
| id|variable|value|
+---+--------+-----+
|  1|     int|   11|
|  1|    long|   12|
|  2|     int|   21|
|  2|    long|   22|
+---+--------+-----+
```

Existing tests.

No.

Closes #48069 from xinrong-meng/fix_unpivot.

Authored-by: Xinrong Meng 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit e63b5601c1bd74b2b0054d48f944424d12b79835)
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 96eebebaf5146144ff900c8081dfa5c5960b3bb2)
Signed-off-by: Dongjoon Hyun 
---
 .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala |   2 +-
 .../query-tests/explain-results/melt_no_values.explain|   2 +-
 .../query-tests/explain-results/melt_values.explain   |   2 +-
 .../query-tests/explain-results/unpivot_no_values.explain |   2 +-
 .../query-tests/explain-results/unpivot_values.explain|   2 +-
 .../resources/query-tests/queries/melt_no_values.json |   1 +
 .../query-tests/queries/melt_no_values.proto.bin  | Bin 71 -> 77 bytes
 .../test/resources/query-tests/queries/melt_values.json   |   1 +
 .../resources/query-tests/queries/melt_values.proto.bin   | Bin 73 -> 79 bytes
 .../resources/query-tests/queries/unpivot_no_values.json  |   1 +
 .../query-tests/queries/unpivot_no_values.proto.bin   | Bin 64 -> 70 bytes
 .../resources/query-tests/queries/unpivot_values.json |   1 +
 .../query-tests/queries/unpivot_values.proto.bin  | Bin 80 -> 86 bytes
 13 files changed, 9 insertions(+), 5 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index ca90afa14cf3..3fd93f09b9af 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1142,7 +1142,7 @@ class Dataset[T] private[sql] (
 val unpivot = builder.getUnpivotBuilder
   .setInput(plan.getRoot)
   .addAllIds(ids.toSeq.map(_.expr).asJava)
-  .setValueColumnName(variableColumnName)
+  .setVariableColumnName(variableColumnName)
   .setValueColumnName(valueColumnName)
 valuesOption.foreach { values =>
   unpivot.getValuesBuilder
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
index f61fc30a3a52..053937d84ec8 100644
--- 
a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
+++ 
b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
@@ -1,2 +1,2 @@
-Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, #0, value#0]
+Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, name#0, value#0]
 +- LocalRelation <empty>, [id#0L, a#0, b#0]
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_values.explain
 
b/connector/conne

(spark) branch master updated: [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}`

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 591a60df788a [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use 
`{project.version}` instead of `{version}`
591a60df788a is described below

commit 591a60df788ae72226375f2d3e85c203200b4b93
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 11 13:03:25 2024 -0700

[SPARK-49602][BUILD] Fix `assembly/pom.xml` to use `{project.version}` 
instead of `{version}`

### What changes were proposed in this pull request?

This PR aims to fix `assembly/pom.xml` to use `{project.version}` instead 
of `{version}`.

The original change was introduced recently by
- #47402

### Why are the changes needed?

**BEFORE**
```
$ mvn clean | head -n9
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model 
for org.apache.spark:spark-assembly_2.13:pom:4.0.0-SNAPSHOT
[WARNING] The expression ${version} is deprecated. Please use 
${project.version} instead.
[WARNING]
[WARNING] It is highly recommended to fix these problems because they 
threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support 
building such malformed projects.
[WARNING]
```

**AFTER**
```
$ mvn clean | head -n9
[INFO] Scanning for projects...
[INFO] 

[INFO] Detecting the operating system and CPU architecture
[INFO] 

[INFO] os.detected.name: osx
[INFO] os.detected.arch: aarch_64
[INFO] os.detected.version: 15.0
[INFO] os.detected.version.major: 15
[INFO] os.detected.version.minor: 0
```

### Does this PR introduce _any_ user-facing change?

No, this is a dev-only change for building distribution.

### How was this patch tested?

Manual test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48081 from dongjoon-hyun/SPARK-49602.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 assembly/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8b21f7e808ce..4b074a88dab4 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -200,7 +200,7 @@
 
   cp
   
-
${basedir}/../connector/connect/client/jvm/target/spark-connect-client-jvm_${scala.binary.version}-${version}.jar
+
${basedir}/../connector/connect/client/jvm/target/spark-connect-client-jvm_${scala.binary.version}-${project.version}.jar
 
${basedir}/target/scala-${scala.binary.version}/jars/connect-repl
   
 





(spark) branch master updated: [SPARK-49310][BUILD] Upgrade `Parquet` to 1.14.2

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a9502d42a500 [SPARK-49310][BUILD] Upgrade `Parquet` to 1.14.2
a9502d42a500 is described below

commit a9502d42a5002bac9bee74fef8427a28cf1ddf86
Author: Fokko 
AuthorDate: Wed Sep 11 11:53:11 2024 -0700

[SPARK-49310][BUILD] Upgrade `Parquet` to 1.14.2

### What changes were proposed in this pull request?

This PR aims to upgrade Parquet to 1.14.2.

### Why are the changes needed?

To bring the latest bug fixes.
- 
https://mvnrepository.com/artifact/org.apache.parquet/parquet-common/1.14.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47807 from Fokko/fd-parquet.

Lead-authored-by: Fokko 
Co-authored-by: Dongjoon Hyun 
Co-authored-by: Fokko Driesprong 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++--
 pom.xml   |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 69123f91fcaf..c89c92815d45 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -236,12 +236,12 @@ orc-shims/2.0.2//orc-shims-2.0.2.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.14.1//parquet-column-1.14.1.jar
-parquet-common/1.14.1//parquet-common-1.14.1.jar
-parquet-encoding/1.14.1//parquet-encoding-1.14.1.jar
-parquet-format-structures/1.14.1//parquet-format-structures-1.14.1.jar
-parquet-hadoop/1.14.1//parquet-hadoop-1.14.1.jar
-parquet-jackson/1.14.1//parquet-jackson-1.14.1.jar
+parquet-column/1.14.2//parquet-column-1.14.2.jar
+parquet-common/1.14.2//parquet-common-1.14.2.jar
+parquet-encoding/1.14.2//parquet-encoding-1.14.2.jar
+parquet-format-structures/1.14.2//parquet-format-structures-1.14.2.jar
+parquet-hadoop/1.14.2//parquet-hadoop-1.14.2.jar
+parquet-jackson/1.14.2//parquet-jackson-1.14.2.jar
 pickle/1.5//pickle-1.5.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
diff --git a/pom.xml b/pom.xml
index 4b769b1f7fee..6f5c9b63f86d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -137,7 +137,7 @@
 3.8.0
 
 10.16.1.1
-1.14.1
+1.14.2
 2.0.2
 shaded-protobuf
 11.0.23





(spark) branch master updated: [SPARK-49085][CONNECT][BUILD][FOLLOWUP] Remove the erroneous `type` definition for `spark-protobuf` from `sql/connect/server/pom.xml`

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c5fd509ad3c0 [SPARK-49085][CONNECT][BUILD][FOLLOWUP] Remove the 
erroneous `type` definition for `spark-protobuf` from 
`sql/connect/server/pom.xml`
c5fd509ad3c0 is described below

commit c5fd509ad3c0781f8511986f3c4e52692bc783c4
Author: yangjie01 
AuthorDate: Wed Sep 11 10:57:35 2024 -0700

[SPARK-49085][CONNECT][BUILD][FOLLOWUP] Remove the erroneous `type` 
definition for `spark-protobuf` from `sql/connect/server/pom.xml`

### What changes were proposed in this pull request?
This PR corrects the erroneous change made in 
https://github.com/apache/spark/pull/48051/files by removing the erroneous `type` 
definition for `spark-protobuf` from `sql/connect/server/pom.xml`.

### Why are the changes needed?
When testing the connect server module with Maven, what we need is 
`spark-protobuf_2.13-4.0.0-SNAPSHOT.jar` rather than 
`spark-protobuf_2.13-4.0.0-SNAPSHOT-tests.jar`.

The change in https://github.com/apache/spark/pull/48051/files caused the Maven 
daily test to fail:
- https://github.com/apache/spark/actions/runs/10812252676/job/30002163824

```
- from_protobuf_messageClassName_descFilePath *** FAILED ***
  org.apache.spark.sql.AnalysisException: 
[PROTOBUF_NOT_LOADED_SQL_FUNCTIONS_UNUSABLE] Cannot call the FROM_PROTOBUF SQL 
function because the Protobuf data source is not loaded.
Please restart your job or session with the 'spark-protobuf' package 
loaded, such as by using the --packages argument on the command line, and then 
retry your query or command again. SQLSTATE: 22KD3
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.protobufNotLoadedSqlFunctionsUnusable(QueryCompilationErrors.scala:4096)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.liftedTree1$1(toFromProtobufSqlFunctions.scala:184)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement$lzycompute(toFromProtobufSqlFunctions.scala:178)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement(toFromProtobufSqlFunctions.scala:157)
  at 
org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType(Expression.scala:417)
  at 
org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType$(Expression.scala:417)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.dataType(toFromProtobufSqlFunctions.scala:86)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:195)
  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:74)
  at scala.collection.immutable.List.map(List.scala:247)
  ...
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual check:

```
build/mvn clean install -DskipTests -Phive
build/mvn test -pl sql/connect/server
```

**Before**

```
...
- from_protobuf_messageClassName_descFilePath_options *** FAILED ***
  org.apache.spark.sql.AnalysisException: 
[PROTOBUF_NOT_LOADED_SQL_FUNCTIONS_UNUSABLE] Cannot call the FROM_PROTOBUF SQL 
function because the Protobuf data source is not loaded.
Please restart your job or session with the 'spark-protobuf' package 
loaded, such as by using the --packages argument on the command line, and then 
retry your query or command again. SQLSTATE: 22KD3
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.protobufNotLoadedSqlFunctionsUnusable(QueryCompilationErrors.scala:4096)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.liftedTree1$1(toFromProtobufSqlFunctions.scala:184)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement$lzycompute(toFromProtobufSqlFunctions.scala:178)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement(toFromProtobufSqlFunctions.scala:157)
  at 
org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType(Expression.scala:417)
  at 
org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType$(Expression.scala:417)
  at 
org.apache.spark.sql.catalyst.expressions.FromProtobuf.dataType(toFromProtobufSqlFunctions.scala:86)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:195)
  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:74)
  at scala.collection.immutable.List.map(List.scala:247)
...
Run completed in 2 minutes, 19 seconds.
Total number of tests run: 948
Suites: completed 27, aborted 0
Tests: succeeded 942, failed 6, canceled 0, ignored 0, pending 0
*** 6 TESTS FAILED ***

(spark) branch master updated: [SPARK-49600][PYTHON] Remove `Python 3.6 and older`-related logic from `try_simplify_traceback`

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aee9f60b3966 [SPARK-49600][PYTHON] Remove `Python 3.6 and 
older`-related logic from `try_simplify_traceback`
aee9f60b3966 is described below

commit aee9f60b39669c7f32152a7f754e611de8af2592
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 11 10:33:45 2024 -0700

[SPARK-49600][PYTHON] Remove `Python 3.6 and older`-related logic from 
`try_simplify_traceback`

### What changes were proposed in this pull request?

Apache Spark 4.0.0 supports only Python 3.9+.
- #46228

### Why are the changes needed?

To simplify and clarify the logic. I manually confirmed that this is the 
last remaining logic that references `sys.version_info` and `(3, 7)`.

```
$ git grep 'sys.version_info' | grep '(3, 7)'
python/pyspark/util.py:if sys.version_info[:2] < (3, 7):
python/pyspark/util.py:if "pypy" not in 
platform.python_implementation().lower() and sys.version_info[:2] >= (3, 7):
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48078 from dongjoon-hyun/SPARK-49600.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/util.py | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/python/pyspark/util.py b/python/pyspark/util.py
index 205e3d957a41..cca44435efe6 100644
--- a/python/pyspark/util.py
+++ b/python/pyspark/util.py
@@ -262,10 +262,6 @@ def try_simplify_traceback(tb: TracebackType) -> 
Optional[TracebackType]:
 if "pypy" in platform.python_implementation().lower():
 # Traceback modification is not supported with PyPy in PySpark.
 return None
-if sys.version_info[:2] < (3, 7):
-# Traceback creation is not supported Python < 3.7.
-# See https://bugs.python.org/issue30579.
-return None
 
 import pyspark
 
@@ -791,7 +787,7 @@ def is_remote_only() -> bool:
 
 
 if __name__ == "__main__":
-if "pypy" not in platform.python_implementation().lower() and 
sys.version_info[:2] >= (3, 7):
+if "pypy" not in platform.python_implementation().lower() and 
sys.version_info[:2] >= (3, 9):
 import doctest
 import pyspark.util
 from pyspark.core.context import SparkContext





(spark) branch branch-3.5 updated: [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 96eebebaf514 [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` 
in Spark Connect Scala Client
96eebebaf514 is described below

commit 96eebebaf5146144ff900c8081dfa5c5960b3bb2
Author: Xinrong Meng 
AuthorDate: Wed Sep 11 08:52:33 2024 -0700

[SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect 
Scala Client

Fix DataFrame.unpivot/melt in Spark Connect Scala Client by correctly 
assigning the name for the variable column.

The original code used `setValueColumnName` for both the variable and value 
columns.

This fix is necessary to ensure the correct behavior of the unpivot/melt 
operation.

Yes. Variable and value columns can be set correctly as shown below.

```scala
scala> val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long")
df: org.apache.spark.sql.package.DataFrame = [id: int, int: int ... 1 more 
field]

scala> df.show()
+---+---+----+
| id|int|long|
+---+---+----+
|  1| 11|  12|
|  2| 21|  22|
+---+---+----+
```
FROM (current master)
```scala
scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", 
"value").show()
+---+----+-----+
| id|    |value|
+---+----+-----+
|  1| int|   11|
|  1|long|   12|
|  2| int|   21|
|  2|long|   22|
+---+----+-----+

```

TO
```scala
scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", 
"value").show()
+---+--------+-----+
| id|variable|value|
+---+--------+-----+
|  1|     int|   11|
|  1|    long|   12|
|  2|     int|   21|
|  2|    long|   22|
+---+--------+-----+
```

Existing tests.

No.

Closes #48069 from xinrong-meng/fix_unpivot.

Authored-by: Xinrong Meng 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit e63b5601c1bd74b2b0054d48f944424d12b79835)
Signed-off-by: Dongjoon Hyun 
---
 .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala |   2 +-
 .../query-tests/explain-results/melt_no_values.explain|   2 +-
 .../query-tests/explain-results/melt_values.explain   |   2 +-
 .../query-tests/explain-results/unpivot_no_values.explain |   2 +-
 .../query-tests/explain-results/unpivot_values.explain|   2 +-
 .../resources/query-tests/queries/melt_no_values.json |   1 +
 .../query-tests/queries/melt_no_values.proto.bin  | Bin 71 -> 77 bytes
 .../test/resources/query-tests/queries/melt_values.json   |   1 +
 .../resources/query-tests/queries/melt_values.proto.bin   | Bin 73 -> 79 bytes
 .../resources/query-tests/queries/unpivot_no_values.json  |   1 +
 .../query-tests/queries/unpivot_no_values.proto.bin   | Bin 64 -> 70 bytes
 .../resources/query-tests/queries/unpivot_values.json |   1 +
 .../query-tests/queries/unpivot_values.proto.bin  | Bin 80 -> 86 bytes
 13 files changed, 9 insertions(+), 5 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index bdaa4e28ba89..865596a669a0 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1291,7 +1291,7 @@ class Dataset[T] private[sql] (
 val unpivot = builder.getUnpivotBuilder
   .setInput(plan.getRoot)
   .addAllIds(ids.toSeq.map(_.expr).asJava)
-  .setValueColumnName(variableColumnName)
+  .setVariableColumnName(variableColumnName)
   .setValueColumnName(valueColumnName)
 valuesOption.foreach { values =>
   unpivot.getValuesBuilder
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
index f61fc30a3a52..053937d84ec8 100644
--- 
a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
+++ 
b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
@@ -1,2 +1,2 @@
-Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, #0, value#0]
+Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, name#0, value#0]
 +- LocalRelation <empty>, [id#0L, a#0, b#0]
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_values.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_values.explain
index b5742d976dee..5a

(spark) branch master updated: [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e63b5601c1bd [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` 
in Spark Connect Scala Client
e63b5601c1bd is described below

commit e63b5601c1bd74b2b0054d48f944424d12b79835
Author: Xinrong Meng 
AuthorDate: Wed Sep 11 08:52:33 2024 -0700

[SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect 
Scala Client

### What changes were proposed in this pull request?
Fix DataFrame.unpivot/melt in Spark Connect Scala Client by correctly 
assigning the name for the variable column.

The original code used `setValueColumnName` for both the variable and value 
columns.

### Why are the changes needed?
This fix is necessary to ensure the correct behavior of the unpivot/melt 
operation.

### Does this PR introduce _any_ user-facing change?
Yes. Variable and value columns can be set correctly as shown below.

```scala
scala> val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long")
df: org.apache.spark.sql.package.DataFrame = [id: int, int: int ... 1 more 
field]

scala> df.show()
+---+---+----+
| id|int|long|
+---+---+----+
|  1| 11|  12|
|  2| 21|  22|
+---+---+----+
```
FROM (current master)
```scala
scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", 
"value").show()
+---+----+-----+
| id|    |value|
+---+----+-----+
|  1| int|   11|
|  1|long|   12|
|  2| int|   21|
|  2|long|   22|
+---+----+-----+

```

TO
```scala
scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", 
"value").show()
+---+--------+-----+
| id|variable|value|
+---+--------+-----+
|  1|     int|   11|
|  1|    long|   12|
|  2|     int|   21|
|  2|    long|   22|
+---+--------+-----+
```

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48069 from xinrong-meng/fix_unpivot.

Authored-by: Xinrong Meng 
Signed-off-by: Dongjoon Hyun 
---
 .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala |   2 +-
 .../query-tests/explain-results/melt_no_values.explain|   2 +-
 .../query-tests/explain-results/melt_values.explain   |   2 +-
 .../query-tests/explain-results/unpivot_no_values.explain |   2 +-
 .../query-tests/explain-results/unpivot_values.explain|   2 +-
 .../resources/query-tests/queries/melt_no_values.json |   1 +
 .../query-tests/queries/melt_no_values.proto.bin  | Bin 71 -> 77 bytes
 .../test/resources/query-tests/queries/melt_values.json   |   1 +
 .../resources/query-tests/queries/melt_values.proto.bin   | Bin 73 -> 79 bytes
 .../resources/query-tests/queries/unpivot_no_values.json  |   1 +
 .../query-tests/queries/unpivot_no_values.proto.bin   | Bin 64 -> 70 bytes
 .../resources/query-tests/queries/unpivot_values.json |   1 +
 .../query-tests/queries/unpivot_values.proto.bin  | Bin 80 -> 86 bytes
 13 files changed, 9 insertions(+), 5 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index f5606215be89..519193ebd9c7 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -481,7 +481,7 @@ class Dataset[T] private[sql] (
 val unpivot = builder.getUnpivotBuilder
   .setInput(plan.getRoot)
   .addAllIds(ids.toImmutableArraySeq.map(_.expr).asJava)
-  .setValueColumnName(variableColumnName)
+  .setVariableColumnName(variableColumnName)
   .setValueColumnName(valueColumnName)
 valuesOption.foreach { values =>
   unpivot.getValuesBuilder
diff --git 
a/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
 
b/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
index f61fc30a3a52..053937d84ec8 100644
--- 
a/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
+++ 
b/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain
@@ -1,2 +1,2 @@
-Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, #0, value#0]
+Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, name#0, value#0]
 +- LocalRelation <empty>, [id#0L, a#0, b#0]
diff --git 
a/sql/connect/common/src/test/resources/query-tests/explain-results/melt_values.explai

(spark-kubernetes-operator) branch main updated: [SPARK-49527] Add `ConfOptionDocGenerator` to generate Spark Operator Config Property Doc

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git


The following commit(s) were added to refs/heads/main by this push:
 new e842e3b  [SPARK-49527] Add `ConfOptionDocGenerator` to generate Spark 
Operator Config Property Doc
e842e3b is described below

commit e842e3bf8f08c5999d050aac059414424def5fa7
Author: zhou-jiang 
AuthorDate: Wed Sep 11 08:48:18 2024 -0700

[SPARK-49527] Add `ConfOptionDocGenerator` to generate Spark Operator 
Config Property Doc

### What changes were proposed in this pull request?

This PR adds a `docs-utils` module to automatically generate the config 
properties doc page from source.

### Why are the changes needed?

This helps keep the config property docs up-to-date by generating them 
through a Gradle task.

### Does this PR introduce _any_ user-facing change?

No (doc only, not released)

### How was this patch tested?

Pass the CIs

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #118 from jiangzho/doc_utils.

Lead-authored-by: zhou-jiang 
Co-authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../docs-utils/build.gradle| 26 ++-
 .../k8s/operator/utils/ConfOptionDocGenerator.java | 87 ++
 .../apache/spark/k8s/operator/utils/DocTable.java  | 69 +
 settings.gradle|  2 +
 .../spark/k8s/operator/config/ConfigOption.java|  4 +-
 5 files changed, 182 insertions(+), 6 deletions(-)

diff --git a/settings.gradle b/build-tools/docs-utils/build.gradle
similarity index 57%
copy from settings.gradle
copy to build-tools/docs-utils/build.gradle
index 8b2b816..2cdde29 100644
--- a/settings.gradle
+++ b/build-tools/docs-utils/build.gradle
@@ -16,7 +16,25 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-rootProject.name = 'apache-spark-kubernetes-operator'
-include 'spark-operator-api'
-include 'spark-submission-worker'
-include 'spark-operator'
+
+ext {
+javaMainClass = 
"org.apache.spark.k8s.operator.utils.ConfOptionDocGenerator"
+docsPath = System.getProperty("user.dir") + "/docs"
+}
+
+dependencies {
+implementation project(":spark-operator")
+implementation(libs.log4j.core)
+implementation(libs.log4j.slf4j.impl)
+compileOnly(libs.lombok)
+annotationProcessor(libs.lombok)
+}
+
+test {
+useJUnitPlatform()
+}
+
+tasks.register('generateConfPropsDoc', Exec) {
+description = "Generate config properties doc for operator"
+commandLine "java", "-classpath", 
sourceSets.main.runtimeClasspath.getAsPath(), javaMainClass, docsPath
+}
diff --git 
a/build-tools/docs-utils/src/main/java/org/apache/spark/k8s/operator/utils/ConfOptionDocGenerator.java
 
b/build-tools/docs-utils/src/main/java/org/apache/spark/k8s/operator/utils/ConfOptionDocGenerator.java
new file mode 100644
index 000..c0f2f49
--- /dev/null
+++ 
b/build-tools/docs-utils/src/main/java/org/apache/spark/k8s/operator/utils/ConfOptionDocGenerator.java
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.k8s.operator.utils;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.lang.reflect.Field;
+import java.util.List;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.spark.k8s.operator.config.ConfigOption;
+import org.apache.spark.k8s.operator.config.SparkOperatorConf;
+
+@Slf4j
+public class ConfOptionDocGenerator {
+  public static final String CONF_FILE_NAME = "config_properties.md";
+  public static final String DEFAULT_DOCS_PATH = "docs";
+  public static final String GENERATED_FILE_HEADER =
+  "This doc is automatically generated by gradle task, manual updates 
would be overridden.";
+
+  public void generate(String docsPath) throws 

(spark) branch master updated (cc6d6f17bdee -> 70482f6f82b1)

2024-09-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cc6d6f17bdee [SPARK-49519][SQL] Merge options of table and relation 
when constructing FileScanBuilder
 add 70482f6f82b1 [SPARK-49599][BUILD] Upgrade snappy-java to 1.1.10.7

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)




