(spark) branch branch-3.5 updated: [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 50c1783a1f97 [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min
50c1783a1f97 is described below

commit 50c1783a1f97e336c8560fc03ef85ec7319672ea
Author: Dongjoon Hyun
AuthorDate: Thu Sep 26 17:32:29 2024 -0700

    [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min

    ### What changes were proposed in this pull request?

    This PR aims to increase `spark.test.docker.connectionTimeout` to 10min.

    ### Why are the changes needed?

    Recently, various DB images fail at the `connection` stage on multiple branches.

    **MASTER** branch
    https://github.com/apache/spark/actions/runs/11045311764/job/30682732260
    ```
    [info] OracleIntegrationSuite:
    [info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 minutes, 17 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 298 times over 5.004500551155 minutes. Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.41 port 41079. (CONNECTION_ID=n9ZWIh+nQn+G9fkwKyoBQA==)
    ```

    **branch-3.5** branch
    https://github.com/apache/spark/actions/runs/10939696926/job/30370552237
    ```
    [info] MsSqlServerNamespaceSuite:
    [info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED *** (5 minutes, 42 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 11 times over 5.48763128241 minutes. Last failure message: The TCP/IP connection to the host 10.1.0.56, port 35345 has failed. Error: "Connection refused (Connection refused). Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. (DockerJDBCInt [...]
    ```

    **branch-3.4** branch
    https://github.com/apache/spark/actions/runs/10937842509/job/30364658576
    ```
    [info] MsSqlServerNamespaceSuite:
    [info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED *** (5 minutes, 42 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 11 times over 5.48755564563 minutes. Last failure message: The TCP/IP connection to the host 10.1.0.153, port 46153 has failed. Error: "Connection refused (Connection refused). Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. (DockerJDBCIn [...]
    ```

    ### Does this PR introduce _any_ user-facing change?

    No, this is a test-only change.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48272 from dongjoon-hyun/SPARK-49803.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 09b7aa67ce64d7d4ecc803215eaf85464df181c5)
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 40e8cbb6546b..55142e6d8de8 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -97,7 +97,7 @@ abstract class DockerJDBCIntegrationSuite
   protected val dockerIp = DockerUtils.getDockerIp()
   val db: DatabaseOnDocker
-  val connectionTimeout = timeout(5.minutes)
+  val connectionTimeout = timeout(10.minutes)
   val keepContainer =
     sys.props.getOrElse("spark.test.docker.keepContainer", "false").toBoolean
   val removePulledImage =

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new c787e9a89a86 [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min
c787e9a89a86 is described below

commit c787e9a89a867b540b32faf8bb26302af256cc33
Author: Dongjoon Hyun
AuthorDate: Thu Sep 26 17:32:29 2024 -0700

    [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min

    ### What changes were proposed in this pull request?

    This PR aims to increase `spark.test.docker.connectionTimeout` to 10min.

    ### Why are the changes needed?

    Recently, various DB images fail at the `connection` stage on multiple branches.

    **MASTER** branch
    https://github.com/apache/spark/actions/runs/11045311764/job/30682732260
    ```
    [info] OracleIntegrationSuite:
    [info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 minutes, 17 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 298 times over 5.004500551155 minutes. Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.41 port 41079. (CONNECTION_ID=n9ZWIh+nQn+G9fkwKyoBQA==)
    ```

    **branch-3.5** branch
    https://github.com/apache/spark/actions/runs/10939696926/job/30370552237
    ```
    [info] MsSqlServerNamespaceSuite:
    [info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED *** (5 minutes, 42 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 11 times over 5.48763128241 minutes. Last failure message: The TCP/IP connection to the host 10.1.0.56, port 35345 has failed. Error: "Connection refused (Connection refused). Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. (DockerJDBCInt [...]
    ```

    **branch-3.4** branch
    https://github.com/apache/spark/actions/runs/10937842509/job/30364658576
    ```
    [info] MsSqlServerNamespaceSuite:
    [info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED *** (5 minutes, 42 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 11 times over 5.48755564563 minutes. Last failure message: The TCP/IP connection to the host 10.1.0.153, port 46153 has failed. Error: "Connection refused (Connection refused). Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. (DockerJDBCIn [...]
    ```

    ### Does this PR introduce _any_ user-facing change?

    No, this is a test-only change.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48272 from dongjoon-hyun/SPARK-49803.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 09b7aa67ce64d7d4ecc803215eaf85464df181c5)
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 50c1783a1f97e336c8560fc03ef85ec7319672ea)
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 40e8cbb6546b..55142e6d8de8 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -97,7 +97,7 @@ abstract class DockerJDBCIntegrationSuite
   protected val dockerIp = DockerUtils.getDockerIp()
   val db: DatabaseOnDocker
-  val connectionTimeout = timeout(5.minutes)
+  val connectionTimeout = timeout(10.minutes)
   val keepContainer =
     sys.props.getOrElse("spark.test.docker.keepContainer", "false").toBoolean
   val removePulledImage =
(spark) branch master updated: [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 09b7aa67ce64 [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min
09b7aa67ce64 is described below

commit 09b7aa67ce64d7d4ecc803215eaf85464df181c5
Author: Dongjoon Hyun
AuthorDate: Thu Sep 26 17:32:29 2024 -0700

    [SPARK-49803][SQL][TESTS] Increase `spark.test.docker.connectionTimeout` to 10min

    ### What changes were proposed in this pull request?

    This PR aims to increase `spark.test.docker.connectionTimeout` to 10min.

    ### Why are the changes needed?

    Recently, various DB images fail at the `connection` stage on multiple branches.

    **MASTER** branch
    https://github.com/apache/spark/actions/runs/11045311764/job/30682732260
    ```
    [info] OracleIntegrationSuite:
    [info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 minutes, 17 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 298 times over 5.004500551155 minutes. Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.41 port 41079. (CONNECTION_ID=n9ZWIh+nQn+G9fkwKyoBQA==)
    ```

    **branch-3.5** branch
    https://github.com/apache/spark/actions/runs/10939696926/job/30370552237
    ```
    [info] MsSqlServerNamespaceSuite:
    [info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED *** (5 minutes, 42 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 11 times over 5.48763128241 minutes. Last failure message: The TCP/IP connection to the host 10.1.0.56, port 35345 has failed. Error: "Connection refused (Connection refused). Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. (DockerJDBCInt [...]
    ```

    **branch-3.4** branch
    https://github.com/apache/spark/actions/runs/10937842509/job/30364658576
    ```
    [info] MsSqlServerNamespaceSuite:
    [info] org.apache.spark.sql.jdbc.v2.MsSqlServerNamespaceSuite *** ABORTED *** (5 minutes, 42 seconds)
    [info]   The code passed to eventually never returned normally. Attempted 11 times over 5.48755564563 minutes. Last failure message: The TCP/IP connection to the host 10.1.0.153, port 46153 has failed. Error: "Connection refused (Connection refused). Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".. (DockerJDBCIn [...]
    ```

    ### Does this PR introduce _any_ user-facing change?

    No, this is a test-only change.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48272 from dongjoon-hyun/SPARK-49803.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
index 8d17e0b4e36e..1df01bd3bfb6 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
@@ -115,7 +115,7 @@ abstract class DockerJDBCIntegrationSuite
   protected val startContainerTimeout: Long =
     timeStringAsSeconds(sys.props.getOrElse("spark.test.docker.startContainerTimeout", "5min"))
   protected val connectionTimeout: PatienceConfiguration.Timeout = {
-    val timeoutStr = sys.props.getOrElse("spark.test.docker.connectionTimeout", "5min")
+    val timeoutStr = sys.props.getOrElse("spark.test.docker.connectionTimeout", "10min")
     timeout(timeStringAsSeconds(timeoutStr).seconds)
   }
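The master-branch version of this change reads the timeout from a system property and parses a duration string such as `5min` or `10min` into seconds. The sketch below illustrates that property-with-default pattern in Java; `timeStringAsSeconds` here is a simplified stand-in for Spark's duration-string parsing, not the actual implementation (real Spark accepts more suffixes, e.g. `ms` and `d`).

```java
import java.util.Map;

public class TimeoutConfig {
    // Parse a duration string like "10min", "600s", or "2h" into seconds.
    // Simplified stand-in; a bare number is treated as seconds.
    static long timeStringAsSeconds(String s) {
        s = s.trim().toLowerCase();
        if (s.endsWith("min")) {
            return Long.parseLong(s.substring(0, s.length() - 3)) * 60L;
        } else if (s.endsWith("h")) {
            return Long.parseLong(s.substring(0, s.length() - 1)) * 3600L;
        } else if (s.endsWith("s")) {
            return Long.parseLong(s.substring(0, s.length() - 1));
        }
        return Long.parseLong(s);
    }

    // Mirrors the pattern in the diff: read the property, fall back to "10min".
    static long connectionTimeoutSeconds(Map<String, String> props) {
        String timeoutStr =
            props.getOrDefault("spark.test.docker.connectionTimeout", "10min");
        return timeStringAsSeconds(timeoutStr);
    }

    public static void main(String[] args) {
        System.out.println(connectionTimeoutSeconds(Map.of()));             // 600
        System.out.println(connectionTimeoutSeconds(
            Map.of("spark.test.docker.connectionTimeout", "5min")));        // 300
    }
}
```

Because the default lives in one place, users can still shorten the timeout locally (e.g. `-Dspark.test.docker.connectionTimeout=2min`) while CI gets the longer 10-minute budget.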
(spark) branch branch-3.5 updated: [SPARK-49791][SQL][FOLLOWUP][3.5] Fix `import` statement
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new b51db8bcf80c [SPARK-49791][SQL][FOLLOWUP][3.5] Fix `import` statement
b51db8bcf80c is described below

commit b51db8bcf80cf070f93a05345640ca594301899d
Author: Dongjoon Hyun
AuthorDate: Thu Sep 26 14:51:57 2024 -0700

    [SPARK-49791][SQL][FOLLOWUP][3.5] Fix `import` statement

    ### What changes were proposed in this pull request?

    This PR is a follow-up for `branch-3.5` due to a difference in the `import` statements.
    - #48257

    ### Why are the changes needed?

    To fix the compilation.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48271 from dongjoon-hyun/SPARK-49791.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index f1664f66b7f8..4c0c750246f8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.catalyst.plans.logical.{AppendData, CreateTableAsSelect, InsertIntoStatement, LogicalPlan, OptionList, OverwriteByExpression, OverwritePartitionsDynamic, ReplaceTableAsSelect, UnresolvedTableSpec}
 import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
-import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, CatalogV2Implicits, CatalogV2Util, DelegatingCatalogExtension, Identifier, SupportsCatalogOptions, Table, TableCatalog, TableProvider, V1Table}
+import org.apache.spark.sql.connector.catalog.{CatalogExtension, CatalogManager, CatalogPlugin, CatalogV2Implicits, CatalogV2Util, Identifier, SupportsCatalogOptions, Table, TableCatalog, TableProvider, V1Table}
 import org.apache.spark.sql.connector.catalog.TableCapability._
 import org.apache.spark.sql.connector.catalog.TableWritePrivilege
 import org.apache.spark.sql.connector.catalog.TableWritePrivilege._
(spark) branch branch-3.5 updated: [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new f1c69a5a687f [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable
f1c69a5a687f is described below

commit f1c69a5a687fdb4e5a613fe43bbf6f6366f63fda
Author: Wenchen Fan
AuthorDate: Thu Sep 26 13:39:02 2024 -0700

    [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable

    ### What changes were proposed in this pull request?

    This PR updates `DelegatingCatalogExtension` so that it's more extendable:
    - `initialize` is no longer final, so that subclasses can override it
    - `delegate` becomes `protected`, so that subclasses can access it

    In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it's actually the `CatalogExtension` interface that indicates a catalog implementation will delegate requests to the Spark session catalog. https://github.com/apache/spark/pull/47724 should use `CatalogExtension` instead.

    ### Why are the changes needed?

    Unblock the Iceberg extension.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48257 from cloud-fan/catalog.

    Lead-authored-by: Wenchen Fan
    Co-authored-by: Wenchen Fan
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 339dd5b93316fecd0455b53b2cedee2b5333a184)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connector/catalog/DelegatingCatalogExtension.java    | 4 ++--
 sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
 .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
index f6686d2e4d3b..786821514822 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
@@ -38,7 +38,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap;
 @Evolving
 public abstract class DelegatingCatalogExtension implements CatalogExtension {

-  private CatalogPlugin delegate;
+  protected CatalogPlugin delegate;

   @Override
   public final void setDelegateCatalog(CatalogPlugin delegate) {
@@ -51,7 +51,7 @@ public abstract class DelegatingCatalogExtension implements CatalogExtension {
   }

   @Override
-  public final void initialize(String name, CaseInsensitiveStringMap options) {}
+  public void initialize(String name, CaseInsensitiveStringMap options) {}

   @Override
   public Set capabilities() {

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 2506cb736f18..f1664f66b7f8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -568,7 +568,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     val canUseV2 = lookupV2Provider().isDefined ||
       (df.sparkSession.sessionState.conf.getConf(
         SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION).isDefined &&
         !df.sparkSession.sessionState.catalogManager.catalog(CatalogManager.SESSION_CATALOG_NAME)
-          .isInstanceOf[DelegatingCatalogExtension])
+          .isInstanceOf[CatalogExtension])

     session.sessionState.sqlParser.parseMultipartIdentifier(tableName) match {
       case nameParts @ NonSessionCatalogAndIdentifier(catalog, ident) =>

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index 7500f32ac2b9..0a86a043985e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.util.{quoteIfNeeded, toPrettySQL, ResolveDefaultColumns => DefaultCols}
 import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns._
-import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, CatalogV2Util, DelegatingCatalog
(spark) branch master updated: [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 339dd5b93316 [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable
339dd5b93316 is described below

commit 339dd5b93316fecd0455b53b2cedee2b5333a184
Author: Wenchen Fan
AuthorDate: Thu Sep 26 13:39:02 2024 -0700

    [SPARK-49791][SQL] Make DelegatingCatalogExtension more extendable

    ### What changes were proposed in this pull request?

    This PR updates `DelegatingCatalogExtension` so that it's more extendable:
    - `initialize` is no longer final, so that subclasses can override it
    - `delegate` becomes `protected`, so that subclasses can access it

    In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it's actually the `CatalogExtension` interface that indicates a catalog implementation will delegate requests to the Spark session catalog. https://github.com/apache/spark/pull/47724 should use `CatalogExtension` instead.

    ### Why are the changes needed?

    Unblock the Iceberg extension.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48257 from cloud-fan/catalog.

    Lead-authored-by: Wenchen Fan
    Co-authored-by: Wenchen Fan
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connector/catalog/DelegatingCatalogExtension.java    | 4 ++--
 .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala | 4 ++--
 .../scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
index f6686d2e4d3b..786821514822 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java
@@ -38,7 +38,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap;
 @Evolving
 public abstract class DelegatingCatalogExtension implements CatalogExtension {

-  private CatalogPlugin delegate;
+  protected CatalogPlugin delegate;

   @Override
   public final void setDelegateCatalog(CatalogPlugin delegate) {
@@ -51,7 +51,7 @@ public abstract class DelegatingCatalogExtension implements CatalogExtension {
   }

   @Override
-  public final void initialize(String name, CaseInsensitiveStringMap options) {}
+  public void initialize(String name, CaseInsensitiveStringMap options) {}

   @Override
   public Set capabilities() {

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index 02ad2e79a564..a9ad7523c8fb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.util.{quoteIfNeeded, toPrettySQL, ResolveDefaultColumns => DefaultCols}
 import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns._
-import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, CatalogV2Util, DelegatingCatalogExtension, LookupCatalog, SupportsNamespaces, V1Table}
+import org.apache.spark.sql.connector.catalog.{CatalogExtension, CatalogManager, CatalogPlugin, CatalogV2Util, LookupCatalog, SupportsNamespaces, V1Table}
 import org.apache.spark.sql.connector.expressions.Transform
 import org.apache.spark.sql.errors.QueryCompilationErrors
 import org.apache.spark.sql.execution.command._
@@ -706,6 +706,6 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
   private def supportsV1Command(catalog: CatalogPlugin): Boolean = {
     isSessionCatalog(catalog) && (
       SQLConf.get.getConf(SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION).isEmpty ||
-        catalog.isInstanceOf[DelegatingCatalogExtension])
+        catalog.isInstanceOf[CatalogExtension])
   }
 }

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriterImpl.scala
index f0eef9ae1cbb..8164d33f46fe 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/DataFrameWriter
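The change above rests on a small API-design distinction: `CatalogExtension` is the *interface* that marks a catalog as delegating to the Spark session catalog, while `DelegatingCatalogExtension` is only a convenient base class; after this commit its `delegate` field is `protected` and its `initialize` is no longer `final`, so subclasses can hook into both. The stripped-down sketch below shows that shape with mock types — these are not the real Spark classes, just an illustration of the pattern.

```java
import java.util.Map;

// Mock stand-in for Spark's CatalogPlugin: a single-method (functional) interface.
interface CatalogPlugin {
    String name();
}

// Marker interface: "this catalog delegates to the session catalog".
// Type checks like the ones in this diff use this interface, not the base class.
interface CatalogExtension extends CatalogPlugin {
    void setDelegateCatalog(CatalogPlugin delegate);
}

// Convenience base class. After SPARK-49791, `delegate` is protected and
// `initialize` is non-final, so subclasses can override and use both.
abstract class DelegatingCatalogExtension implements CatalogExtension {
    protected CatalogPlugin delegate;  // was: private

    @Override
    public final void setDelegateCatalog(CatalogPlugin delegate) {
        this.delegate = delegate;
    }

    public void initialize(String name, Map<String, String> options) {}  // was: final
}

// A subclass (e.g. an Iceberg-style session catalog) can now override both.
class MyCatalog extends DelegatingCatalogExtension {
    private String catalogName = "unset";

    @Override
    public void initialize(String name, Map<String, String> options) {
        this.catalogName = name;  // custom setup, now possible
    }

    @Override
    public String name() {
        // ...and it can read the delegate directly.
        return catalogName + " -> " + (delegate == null ? "none" : delegate.name());
    }
}

public class CatalogSketch {
    public static void main(String[] args) {
        MyCatalog c = new MyCatalog();
        c.initialize("spark_catalog", Map.of());
        c.setDelegateCatalog(() -> "builtin");  // lambda: CatalogPlugin is functional
        System.out.println(c.name());                        // spark_catalog -> builtin
        System.out.println(c instanceof CatalogExtension);   // true
    }
}
```

This also shows why testing `isInstanceOf[CatalogExtension]` (as the diff does) is more correct than testing the base class: a plugin could implement the interface without extending the convenience class, yet still be a delegating session catalog.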
(spark) branch master updated: [SPARK-49800][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.4
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 54e62a158ead [SPARK-49800][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.4
54e62a158ead is described below

commit 54e62a158ead91d832d477a76aace40ef5b54121
Author: Bjørn Jørgensen
AuthorDate: Thu Sep 26 13:37:39 2024 -0700

    [SPARK-49800][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.4

    ### What changes were proposed in this pull request?

    Upgrade `kubernetes-client` from 6.13.3 to 6.13.4.

    ### Why are the changes needed?

    New version that has 5 fixes: [Release log 6.13.4](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.13.4)

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GA.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48268 from bjornjorgensen/k8sclient6.13.4.

    Authored-by: Bjørn Jørgensen
    Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +--
 pom.xml                               |  2 +-
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 19b8a237d30a..c9a32757554b 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -159,31 +159,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/2.0.16//jul-to-slf4j-2.0.16.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
-kubernetes-client-api/6.13.3//kubernetes-client-api-6.13.3.jar
-kubernetes-client/6.13.3//kubernetes-client-6.13.3.jar
-kubernetes-httpclient-okhttp/6.13.3//kubernetes-httpclient-okhttp-6.13.3.jar
-kubernetes-model-admissionregistration/6.13.3//kubernetes-model-admissionregistration-6.13.3.jar
-kubernetes-model-apiextensions/6.13.3//kubernetes-model-apiextensions-6.13.3.jar
-kubernetes-model-apps/6.13.3//kubernetes-model-apps-6.13.3.jar
-kubernetes-model-autoscaling/6.13.3//kubernetes-model-autoscaling-6.13.3.jar
-kubernetes-model-batch/6.13.3//kubernetes-model-batch-6.13.3.jar
-kubernetes-model-certificates/6.13.3//kubernetes-model-certificates-6.13.3.jar
-kubernetes-model-common/6.13.3//kubernetes-model-common-6.13.3.jar
-kubernetes-model-coordination/6.13.3//kubernetes-model-coordination-6.13.3.jar
-kubernetes-model-core/6.13.3//kubernetes-model-core-6.13.3.jar
-kubernetes-model-discovery/6.13.3//kubernetes-model-discovery-6.13.3.jar
-kubernetes-model-events/6.13.3//kubernetes-model-events-6.13.3.jar
-kubernetes-model-extensions/6.13.3//kubernetes-model-extensions-6.13.3.jar
-kubernetes-model-flowcontrol/6.13.3//kubernetes-model-flowcontrol-6.13.3.jar
-kubernetes-model-gatewayapi/6.13.3//kubernetes-model-gatewayapi-6.13.3.jar
-kubernetes-model-metrics/6.13.3//kubernetes-model-metrics-6.13.3.jar
-kubernetes-model-networking/6.13.3//kubernetes-model-networking-6.13.3.jar
-kubernetes-model-node/6.13.3//kubernetes-model-node-6.13.3.jar
-kubernetes-model-policy/6.13.3//kubernetes-model-policy-6.13.3.jar
-kubernetes-model-rbac/6.13.3//kubernetes-model-rbac-6.13.3.jar
-kubernetes-model-resource/6.13.3//kubernetes-model-resource-6.13.3.jar
-kubernetes-model-scheduling/6.13.3//kubernetes-model-scheduling-6.13.3.jar
-kubernetes-model-storageclass/6.13.3//kubernetes-model-storageclass-6.13.3.jar
+kubernetes-client-api/6.13.4//kubernetes-client-api-6.13.4.jar
+kubernetes-client/6.13.4//kubernetes-client-6.13.4.jar
+kubernetes-httpclient-okhttp/6.13.4//kubernetes-httpclient-okhttp-6.13.4.jar
+kubernetes-model-admissionregistration/6.13.4//kubernetes-model-admissionregistration-6.13.4.jar
+kubernetes-model-apiextensions/6.13.4//kubernetes-model-apiextensions-6.13.4.jar
+kubernetes-model-apps/6.13.4//kubernetes-model-apps-6.13.4.jar
+kubernetes-model-autoscaling/6.13.4//kubernetes-model-autoscaling-6.13.4.jar
+kubernetes-model-batch/6.13.4//kubernetes-model-batch-6.13.4.jar
+kubernetes-model-certificates/6.13.4//kubernetes-model-certificates-6.13.4.jar
+kubernetes-model-common/6.13.4//kubernetes-model-common-6.13.4.jar
+kubernetes-model-coordination/6.13.4//kubernetes-model-coordination-6.13.4.jar
+kubernetes-model-core/6.13.4//kubernetes-model-core-6.13.4.jar
+kubernetes-model-discovery/6.13.4//kubernetes-model-discovery-6.13.4.jar
+kubernetes-model-events/6.13.4//kubernetes-model-events-6.13.4.jar
+kubernetes-model-extensions/6.13.4//kubernetes-model-extensions-6.13.4.jar
+kubernetes-model-flowcontrol/6.13.4//kubernetes-model-flowcontrol-6.13.4.jar
+kubernetes-model-gatewayapi/6.13.4//kubernetes-model-gatewayapi-6.13.4.jar
+kubernetes-model-metrics/6.13.4//kubernetes-model-metrics-6.13.4.jar
+kubernetes-model-networking/6.13.4//kubernetes-model-networking-6.13.4.jar
+kubernetes-model-node/6.13.4
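Each entry in `dev/deps/spark-deps-hadoop-3-hive-2.3` follows a `name/version/classifier/jarfile` layout, where the empty field between the double slashes is an (unused) classifier. The small hypothetical parser below — not part of Spark, just an illustration — shows how such a line decomposes and how the version field can be sanity-checked against the jar file name during an upgrade like this one.

```java
public class DepLine {
    final String name, version, classifier, jar;

    DepLine(String line) {
        // Format: name/version/classifier/jarfile, e.g.
        //   kubernetes-client/6.13.4//kubernetes-client-6.13.4.jar
        String[] parts = line.split("/", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("bad dep line: " + line);
        }
        name = parts[0];
        version = parts[1];
        classifier = parts[2]; // usually empty
        jar = parts[3];
    }

    // Sanity check: the version embedded in the jar file name should match
    // the declared version field after a bump.
    boolean consistent() {
        return jar.contains(version);
    }

    public static void main(String[] args) {
        DepLine d = new DepLine("kubernetes-client/6.13.4//kubernetes-client-6.13.4.jar");
        System.out.println(d.name + " " + d.version + " " + d.consistent());
    }
}
```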
(spark) branch master updated (218051a566c7 -> 87b5ffb22082)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 218051a566c7 [MINOR][SQL][TESTS] Use `formatString.format(value)` instead of `value.formatted(formatString)`
     add 87b5ffb22082 [SPARK-49797][INFRA] Align the running OS image of `maven_test.yml` to `ubuntu-latest`

No new revisions were added by this update.

Summary of changes:
 .github/workflows/maven_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-49786][K8S] Lower `KubernetesClusterSchedulerBackend.onDisconnected` log level to debug
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 562977928772 [SPARK-49786][K8S] Lower `KubernetesClusterSchedulerBackend.onDisconnected` log level to debug
562977928772 is described below

commit 5629779287724a891c81b16f982f9529bd379c39
Author: Dongjoon Hyun
AuthorDate: Wed Sep 25 22:34:35 2024 -0700

    [SPARK-49786][K8S] Lower `KubernetesClusterSchedulerBackend.onDisconnected` log level to debug

    ### What changes were proposed in this pull request?

    This PR aims to lower the `KubernetesClusterSchedulerBackend.onDisconnected` log level to debug.

    ### Why are the changes needed?

    This INFO-level message was added here:
    - https://github.com/apache/spark/pull/37821

    We already propagate the disconnection reason to the UI, and `No executor found` has been used when an unknown peer connects or disconnects. The driver can be accessed by non-executors by design, and all other resource managers do not complain at INFO level.
    ```
    INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: No executor found for x.x.x.0:x
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual review because this is a log level change.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48249 from dongjoon-hyun/SPARK-49786.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
index 4e4634504a0f..09faa2a7fb1b 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
@@ -32,7 +32,7 @@ import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.deploy.k8s.Constants._
 import org.apache.spark.deploy.k8s.submit.KubernetesClientUtils
 import org.apache.spark.deploy.security.HadoopDelegationTokenManager
-import org.apache.spark.internal.LogKeys.{COUNT, HOST_PORT, TOTAL}
+import org.apache.spark.internal.LogKeys.{COUNT, TOTAL}
 import org.apache.spark.internal.MDC
 import org.apache.spark.internal.config.SCHEDULER_MIN_REGISTERED_RESOURCES_RATIO
 import org.apache.spark.resource.ResourceProfile
@@ -356,7 +356,7 @@ private[spark] class KubernetesClusterSchedulerBackend(
           execIDRequester -= rpcAddress
           // Expected, executors re-establish a connection with an ID
         case _ =>
-          logInfo(log"No executor found for ${MDC(HOST_PORT, rpcAddress)}")
+          logDebug(s"No executor found for ${rpcAddress}")
(spark-kubernetes-operator) branch main updated: [SPARK-49790] Support `HPA` template for `SparkCluster`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new b2cee84 [SPARK-49790] Support `HPA` template for `SparkCluster` b2cee84 is described below commit b2cee8443e7760b82e63bc9b343a5b9279c0ae6a Author: Dongjoon Hyun AuthorDate: Wed Sep 25 14:58:30 2024 -0700 [SPARK-49790] Support `HPA` template for `SparkCluster` ### What changes were proposed in this pull request? This PR aims to support an `HPA` template for `SparkCluster`. ### Why are the changes needed? Although `SparkCluster` needs generated values for the following `HPA` fields: ``` maxReplicas: minReplicas: scaleTargetRef: apiVersion: apps/v1 kind: StatefulSet name: ``` we can still allow users to tune the HPA for their cluster usage pattern, like the following. ```yaml horizontalPodAutoscalerSpec: metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 10 behavior: scaleUp: policies: - type: Pods value: 1 periodSeconds: 10 scaleDown: policies: - type: Pods value: 1 periodSeconds: 1200 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and do a manual review. - Delete the existing CRD because it's changed. ``` $ kubectl delete crd sparkclusters.spark.apache.org ``` - Build and Install ``` $ gradle build buildDockerImage spark-operator-api:relocateGeneratedCRD $ helm install spark-kubernetes-operator -f build-tools/helm/spark-kubernetes-operator/values.yaml build-tools/helm/spark-kubernetes-operator/ ``` - Create a `SparkCluster` with an HPA template via the given example. 
``` $ kubectl apply -f examples/cluster-with-hpa-template.yaml $ kubectl get hpa cluster-with-hpa-template-worker-hpa -oyaml apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: creationTimestamp: "2024-09-25T21:11:40Z" labels: spark.operator/name: spark-kubernetes-operator spark.operator/spark-cluster-name: cluster-with-hpa-template name: cluster-with-hpa-template-worker-hpa namespace: default ... spec: behavior: scaleDown: policies: - periodSeconds: 1200 type: Pods value: 1 selectPolicy: Max scaleUp: policies: - periodSeconds: 10 type: Pods value: 1 selectPolicy: Max stabilizationWindowSeconds: 0 maxReplicas: 2 metrics: - resource: name: cpu target: averageUtilization: 10 type: Utilization type: Resource minReplicas: 1 ... ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #137 from dongjoon-hyun/SPARK-49790. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- examples/cluster-with-hpa-template.yaml| 64 + .../apache/spark/k8s/operator/spec/WorkerSpec.java | 2 + .../k8s/operator/SparkClusterResourceSpec.java | 66 +- 3 files changed, 104 insertions(+), 28 deletions(-) diff --git a/examples/cluster-with-hpa-template.yaml b/examples/cluster-with-hpa-template.yaml new file mode 100644 index 000..cee5b18 --- /dev/null +++ b/examples/cluster-with-hpa-template.yaml @@ -0,0 +1,64 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +apiVersion: spark.apache.org/v1alpha1 +kind: SparkCluster +metadata: + name: cluster-with-hpa-template +spec: + runtimeVersions: +sparkVersion: "4.0.0-preview2"
(spark) branch master updated: [SPARK-49775][SQL][FOLLOW-UP] Use SortedSet instead of Array with sorting
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 09209f0ff503 [SPARK-49775][SQL][FOLLOW-UP] Use SortedSet instead of Array with sorting 09209f0ff503 is described below commit 09209f0ff503b29f9da92ba7db8aa820c03b3c0f Author: Hyukjin Kwon AuthorDate: Wed Sep 25 07:57:08 2024 -0700 [SPARK-49775][SQL][FOLLOW-UP] Use SortedSet instead of Array with sorting ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/48235 that addresses https://github.com/apache/spark/pull/48235#discussion_r1775020195 comment. ### Why are the changes needed? For better performance (in theory) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests should verify them ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48245 from HyukjinKwon/SPARK-49775-followup. 
Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala index d85673f2ce81..f805d2ed87b5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala @@ -18,13 +18,15 @@ import java.nio.charset.{Charset, CharsetDecoder, CharsetEncoder, CodingErrorAction, IllegalCharsetNameException, UnsupportedCharsetException} import java.util.Locale + import scala.collection.SortedSet + import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf private[sql] object CharsetProvider { final lazy val VALID_CHARSETS = -Array("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32").sorted +SortedSet("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32") def forName( charset: String, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
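The change above replaces a sorted `Array` with a `SortedSet`, which keeps lookups set-like while preserving a deterministic iteration order. The same idea can be illustrated in plain Python; this is only a loose analogue of the Scala `CharsetProvider` helper, not Spark's code, and the names below are illustrative:

```python
# Keep valid charset names in a set for O(1) membership tests; produce a
# deterministic (sorted) ordering only when rendering error messages.
VALID_CHARSETS = frozenset(
    {"us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32"}
)

def for_name(charset: str) -> str:
    """Validate a charset name case-insensitively and return its canonical form."""
    normalized = charset.lower()
    if normalized not in VALID_CHARSETS:
        # Sorted output keeps the error message stable across runs,
        # which is the property a SortedSet gives the Scala code.
        raise ValueError(
            f"Invalid charset {charset!r}; valid charsets: {sorted(VALID_CHARSETS)}"
        )
    return normalized

print(for_name("UTF-8"))  # → utf-8
```

The design point is that validation is a membership test, so a hash-based or tree-based set beats re-sorting an array and scanning it.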
(spark) branch master updated: [SPARK-49731][K8S] Support K8s volume `mount.subPathExpr` and `hostPath` volume `type`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1f2e7b87db76 [SPARK-49731][K8S] Support K8s volume `mount.subPathExpr` and `hostPath` volume `type` 1f2e7b87db76 is described below commit 1f2e7b87db76ef60eded8a6db09f6690238471ce Author: Enrico Minack AuthorDate: Wed Sep 25 07:53:12 2024 -0700 [SPARK-49731][K8S] Support K8s volume `mount.subPathExpr` and `hostPath` volume `type` ### What changes were proposed in this pull request? Add the following config options: - `spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPathExpr` - `spark.kubernetes.executor.volumes.hostPath.[VolumeName].options.type` ### Why are the changes needed? K8s Spec - https://kubernetes.io/docs/concepts/storage/volumes/#hostpath-volume-types - https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath-expanded-environment These are natural extensions of the existing options - `spark.kubernetes.executor.volumes.[VolumeType].[VolumeName].mount.subPath` - `spark.kubernetes.executor.volumes.hostPath.[VolumeName].options.path` ### Does this PR introduce _any_ user-facing change? Above config options. ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #48181 from EnricoMi/k8s-volume-options. 
Authored-by: Enrico Minack Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/k8s/Config.scala | 2 + .../spark/deploy/k8s/KubernetesVolumeSpec.scala| 3 +- .../spark/deploy/k8s/KubernetesVolumeUtils.scala | 18 +- .../k8s/features/MountVolumesFeatureStep.scala | 6 +- .../spark/deploy/k8s/KubernetesTestConf.scala | 11 +++- .../deploy/k8s/KubernetesVolumeUtilsSuite.scala| 42 - .../k8s/features/LocalDirsFeatureStepSuite.scala | 3 +- .../features/MountVolumesFeatureStepSuite.scala| 72 +- 8 files changed, 144 insertions(+), 13 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala index 3a4d68c19014..9c50f8ddb00c 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala @@ -769,8 +769,10 @@ private[spark] object Config extends Logging { val KUBERNETES_VOLUMES_NFS_TYPE = "nfs" val KUBERNETES_VOLUMES_MOUNT_PATH_KEY = "mount.path" val KUBERNETES_VOLUMES_MOUNT_SUBPATH_KEY = "mount.subPath" + val KUBERNETES_VOLUMES_MOUNT_SUBPATHEXPR_KEY = "mount.subPathExpr" val KUBERNETES_VOLUMES_MOUNT_READONLY_KEY = "mount.readOnly" val KUBERNETES_VOLUMES_OPTIONS_PATH_KEY = "options.path" + val KUBERNETES_VOLUMES_OPTIONS_TYPE_KEY = "options.type" val KUBERNETES_VOLUMES_OPTIONS_CLAIM_NAME_KEY = "options.claimName" val KUBERNETES_VOLUMES_OPTIONS_CLAIM_STORAGE_CLASS_KEY = "options.storageClass" val KUBERNETES_VOLUMES_OPTIONS_MEDIUM_KEY = "options.medium" diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala index 9dfd40a773eb..b4fe414e3cde 100644 --- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala @@ -18,7 +18,7 @@ package org.apache.spark.deploy.k8s private[spark] sealed trait KubernetesVolumeSpecificConf -private[spark] case class KubernetesHostPathVolumeConf(hostPath: String) +private[spark] case class KubernetesHostPathVolumeConf(hostPath: String, volumeType: String) extends KubernetesVolumeSpecificConf private[spark] case class KubernetesPVCVolumeConf( @@ -42,5 +42,6 @@ private[spark] case class KubernetesVolumeSpec( volumeName: String, mountPath: String, mountSubPath: String, +mountSubPathExpr: String, mountReadOnly: Boolean, volumeConf: KubernetesVolumeSpecificConf) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala index 6463512c0114..88bb998d88b7 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeUtils.scala +++ b/resource-mana
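Putting the new options together, an executor `hostPath` volume using both additions might be configured as sketched below. This is a hedged example: the volume name `appLogs`, the paths, and the `POD_NAME` environment variable are hypothetical (Kubernetes expands `subPathExpr` from container environment variables), while `DirectoryOrCreate` is one of the standard Kubernetes `hostPath` types.

```
spark.kubernetes.executor.volumes.hostPath.appLogs.mount.path=/var/log/app
spark.kubernetes.executor.volumes.hostPath.appLogs.mount.subPathExpr=$(POD_NAME)
spark.kubernetes.executor.volumes.hostPath.appLogs.mount.readOnly=false
spark.kubernetes.executor.volumes.hostPath.appLogs.options.path=/var/log/spark
spark.kubernetes.executor.volumes.hostPath.appLogs.options.type=DirectoryOrCreate
```

The two new keys, `mount.subPathExpr` and `options.type`, mirror the existing `mount.subPath` and `options.path` naming pattern.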
(spark) branch master updated: [SPARK-49746][BUILD] Upgrade Scala to 2.13.15
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 983f6f434af3 [SPARK-49746][BUILD] Upgrade Scala to 2.13.15 983f6f434af3 is described below commit 983f6f434af335b9270a0748dc5b4b18c7dc4846 Author: panbingkun AuthorDate: Wed Sep 25 07:50:20 2024 -0700 [SPARK-49746][BUILD] Upgrade Scala to 2.13.15 ### What changes were proposed in this pull request? This PR aims to upgrade `scala` from `2.13.14` to `2.13.15`. ### Why are the changes needed? https://contributors.scala-lang.org/t/scala-2-13-15-release-planning/6649 https://github.com/user-attachments/assets/277cfdb4-8542-42fe-86e5-ad72ca2bba4c **Note: since 2.13.15, "-Wconf:cat=deprecation:wv,any:e" no longer takes effect and needs to be changed to "-Wconf:any:e", "-Wconf:cat=deprecation:wv"; please refer to https://github.com/scala/scala/pull/10708 for details.** ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48192 from panbingkun/SPARK-49746. 
Lead-authored-by: panbingkun Co-authored-by: YangJie Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 docs/_config.yml | 2 +- pom.xml | 7 --- project/SparkBuild.scala | 6 +- 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 88526995293f..19b8a237d30a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -144,7 +144,7 @@ jetty-util-ajax/11.0.23//jetty-util-ajax-11.0.23.jar jetty-util/11.0.23//jetty-util-11.0.23.jar jjwt-api/0.12.6//jjwt-api-0.12.6.jar jline/2.14.6//jline-2.14.6.jar -jline/3.25.1//jline-3.25.1.jar +jline/3.26.3//jline-3.26.3.jar jna/5.14.0//jna-5.14.0.jar joda-time/2.13.0//joda-time-2.13.0.jar jodd-core/3.5.2//jodd-core-3.5.2.jar @@ -252,11 +252,11 @@ py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar rocksdbjni/9.5.2//rocksdbjni-9.5.2.jar scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar -scala-compiler/2.13.14//scala-compiler-2.13.14.jar -scala-library/2.13.14//scala-library-2.13.14.jar +scala-compiler/2.13.15//scala-compiler-2.13.15.jar +scala-library/2.13.15//scala-library-2.13.15.jar scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar scala-parser-combinators_2.13/2.4.0//scala-parser-combinators_2.13-2.4.0.jar -scala-reflect/2.13.14//scala-reflect-2.13.14.jar +scala-reflect/2.13.15//scala-reflect-2.13.15.jar scala-xml_2.13/2.3.0//scala-xml_2.13-2.3.0.jar slf4j-api/2.0.16//slf4j-api-2.0.16.jar snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar diff --git a/docs/_config.yml b/docs/_config.yml index e74eda047041..089d6bf2097b 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -22,7 +22,7 @@ include: SPARK_VERSION: 4.0.0-SNAPSHOT SPARK_VERSION_SHORT: 4.0.0 SCALA_BINARY_VERSION: "2.13" -SCALA_VERSION: "2.13.14" +SCALA_VERSION: "2.13.15" SPARK_ISSUE_TRACKER_URL: 
https://issues.apache.org/jira/browse/SPARK SPARK_GITHUB_URL: https://github.com/apache/spark # Before a new release, we should: diff --git a/pom.xml b/pom.xml index 131e754da815..f3dc92426ac4 100644 --- a/pom.xml +++ b/pom.xml @@ -169,7 +169,7 @@ 3.2.2 4.4 -2.13.14 +2.13.15 2.13 2.2.0 4.9.1 @@ -226,7 +226,7 @@ and ./python/packaging/connect/setup.py too. --> 17.0.0 -3.0.0-M2 +3.0.0 0.12.6 @@ -3051,7 +3051,8 @@ -explaintypes -release 17 - -Wconf:cat=deprecation:wv,any:e + -Wconf:any:e + -Wconf:cat=deprecation:wv -Wunused:imports -Wconf:cat=scaladoc:wv -Wconf:msg=^(?=.*?method|value|type|object|trait|inheritance)(?=.*?deprecated)(?=.*?since 2.13).+$:e diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 2f390cb70baa..82950fb30287 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -234,7 +234,11 @@ object SparkBuild extends PomBuild { // replace -Xfatal-warnings with fine-grained configuration, since 2.13.2 // verbose warning on deprecation, error on all others // see `scalac -Wconf:help` for details -"-Wconf:cat=deprecation:wv,any:e", +// since 2.13.15, "-Wconf:cat=deprecation:wv,any:e" no longer takes effect and needs to +// be changed to "-Wconf:any:e&q
(spark-kubernetes-operator) branch main updated: [SPARK-49778] Remove (master|worker) prefix from field names of `(Master|Worker)Spec`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 52b9aef [SPARK-49778] Remove (master|worker) prefix from field names of `(Master|Worker)Spec` 52b9aef is described below commit 52b9aef3706581c4a2b0def74e4f16b133a5463b Author: Dongjoon Hyun AuthorDate: Wed Sep 25 00:04:35 2024 -0700 [SPARK-49778] Remove (master|worker) prefix from field names of `(Master|Worker)Spec` ### What changes were proposed in this pull request? This PR aims to remove the redundant `master` or `worker` prefixes from field names of `MasterSpec` and `WorkerSpec`. For example, ``` - workerSpec.workerStatefulSetSpec + workerSpec.statefulSetSpec ``` ### Why are the changes needed? To simplify `MasterSpec` and `WorkerSpec` by removing repetitions. ### Does this PR introduce _any_ user-facing change? No, this is not released yet. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #136 from dongjoon-hyun/SPARK-49778. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- examples/cluster-with-hpa.yaml | 2 +- examples/cluster-with-template.yaml| 6 ++-- .../apache/spark/k8s/operator/spec/MasterSpec.java | 8 +++--- .../apache/spark/k8s/operator/spec/WorkerSpec.java | 8 +++--- .../k8s/operator/SparkClusterResourceSpec.java | 18 .../k8s/operator/SparkClusterResourceSpecTest.java | 32 +++--- 6 files changed, 34 insertions(+), 40 deletions(-) diff --git a/examples/cluster-with-hpa.yaml b/examples/cluster-with-hpa.yaml index a91384a..ca82088 100644 --- a/examples/cluster-with-hpa.yaml +++ b/examples/cluster-with-hpa.yaml @@ -25,7 +25,7 @@ spec: minWorkers: 1 maxWorkers: 3 workerSpec: -workerStatefulSetSpec: +statefulSetSpec: template: spec: containers: diff --git a/examples/cluster-with-template.yaml b/examples/cluster-with-template.yaml index d9e12a2..66c6516 100644 --- a/examples/cluster-with-template.yaml +++ b/examples/cluster-with-template.yaml @@ -56,10 +56,10 @@ spec: annotations: customAnnotation: "svc1" workerSpec: -workerStatefulSetMetadata: +statefulSetMetadata: annotations: customAnnotation: "annotation" -workerStatefulSetSpec: +statefulSetSpec: template: spec: priorityClassName: system-cluster-critical @@ -83,7 +83,7 @@ spec: limits: cpu: "0.1" memory: "10Mi" -workerServiceMetadata: +serviceMetadata: annotations: customAnnotation: "annotation" sparkConf: diff --git a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java index 7becfc0..c04a2be 100644 --- a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java +++ b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/MasterSpec.java @@ -34,8 +34,8 @@ import lombok.NoArgsConstructor; @Builder @JsonInclude(JsonInclude.Include.NON_NULL) public class MasterSpec { - protected StatefulSetSpec masterStatefulSetSpec; - protected ObjectMeta masterStatefulSetMetadata; - 
protected ServiceSpec masterServiceSpec; - protected ObjectMeta masterServiceMetadata; + protected StatefulSetSpec statefulSetSpec; + protected ObjectMeta statefulSetMetadata; + protected ServiceSpec serviceSpec; + protected ObjectMeta serviceMetadata; } diff --git a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java index 2c5beb1..04f5abe 100644 --- a/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java +++ b/spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/WorkerSpec.java @@ -34,8 +34,8 @@ import lombok.NoArgsConstructor; @Builder @JsonInclude(JsonInclude.Include.NON_NULL) public class WorkerSpec { - protected StatefulSetSpec workerStatefulSetSpec; - protected ObjectMeta workerStatefulSetMetadata; - protected ServiceSpec workerServiceSpec; - protected ObjectMeta workerServiceMetadata; + protected StatefulSetSpec statefulSetSpec; + protected ObjectMeta statefulSetMetadata; + protected ServiceSpec serviceSpec; + protected ObjectMeta serviceMetadata; } diff --git a/spark-submission-worker/src/main/java/org/apache/spark/k8s/operator/SparkClusterResourceSpec.java b/spark-submission-
(spark-website) branch asf-site updated: Update `latest` to 3.5.3 (#559)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 84265de833 Update `latest` to 3.5.3 (#559) 84265de833 is described below commit 84265de83327c94b4aef65b735cb023ea1542bab Author: Haejoon Lee AuthorDate: Wed Sep 25 05:14:57 2024 +0900 Update `latest` to 3.5.3 (#559) --- site/docs/latest | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/docs/latest b/site/docs/latest index 80d13b7d9b..678fd88a33 12 --- a/site/docs/latest +++ b/site/docs/latest @@ -1 +1 @@ -3.5.2 \ No newline at end of file +3.5.3 \ No newline at end of file - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49713][PYTHON][FOLLOWUP] Make function `count_min_sketch` accept long seed
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 55d0233d19cc [SPARK-49713][PYTHON][FOLLOWUP] Make function `count_min_sketch` accept long seed 55d0233d19cc is described below commit 55d0233d19cc52bee91a9619057d9b6f33165a0a Author: Ruifeng Zheng AuthorDate: Tue Sep 24 07:48:23 2024 -0700 [SPARK-49713][PYTHON][FOLLOWUP] Make function `count_min_sketch` accept long seed ### What changes were proposed in this pull request? Make function `count_min_sketch` accept long seed ### Why are the changes needed? existing implementation only accepts int seed, which is inconsistent with other `ExpressionWithRandomSeed`: ```py In [3]: >>> from pyspark.sql import functions as sf ...: >>> spark.range(100).select( ...: ... sf.hex(sf.count_min_sketch("id", sf.lit(1.5), 0.6, 111)) ...: ... ).show(truncate=False) ... AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "count_min_sketch(id, 1.5, 0.6, 111)" due to data type mismatch: The 4th parameter requires the "INT" type, however "111" has the type "BIGINT". SQLSTATE: 42K09; 'Aggregate [unresolvedalias('hex(count_min_sketch(id#64L, 1.5, 0.6, 111, 0, 0)))] +- Range (0, 100, step=1, splits=Some(12)) ... ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? added doctest ### Was this patch authored or co-authored using generative AI tooling? no Closes #48223 from zhengruifeng/count_min_sk_long_seed. 
Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/connect/functions/builtin.py| 3 +-- python/pyspark/sql/functions/builtin.py| 14 +- .../src/main/scala/org/apache/spark/sql/functions.scala| 2 +- .../catalyst/expressions/aggregate/CountMinSketchAgg.scala | 8 ++-- 4 files changed, 21 insertions(+), 6 deletions(-) diff --git a/python/pyspark/sql/connect/functions/builtin.py b/python/pyspark/sql/connect/functions/builtin.py index 2a39bc6bfddd..6953230f5b42 100644 --- a/python/pyspark/sql/connect/functions/builtin.py +++ b/python/pyspark/sql/connect/functions/builtin.py @@ -70,7 +70,6 @@ from pyspark.sql.types import ( StringType, ) from pyspark.sql.utils import enum_to_value as _enum_to_value -from pyspark.util import JVM_INT_MAX # The implementation of pandas_udf is embedded in pyspark.sql.function.pandas_udf # for code reuse. @@ -1130,7 +1129,7 @@ def count_min_sketch( confidence: Union[Column, float], seed: Optional[Union[Column, int]] = None, ) -> Column: -_seed = lit(random.randint(0, JVM_INT_MAX)) if seed is None else lit(seed) +_seed = lit(random.randint(0, sys.maxsize)) if seed is None else lit(seed) return _invoke_function_over_columns("count_min_sketch", col, lit(eps), lit(confidence), _seed) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index 2688f9daa23a..09a286fe7c94 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -6080,7 +6080,19 @@ def count_min_sketch( |00010064000100025D96391C00320032| ++ -Example 3: Using a random seed +Example 3: Using a long seed + +>>> from pyspark.sql import functions as sf +>>> spark.range(100).select( +... sf.hex(sf.count_min_sketch("id", sf.lit(1.5), 0.2, 111)) +... 
).show(truncate=False) + ++ +|hex(count_min_sketch(id, 1.5, 0.2, 111)) | + ++ + |000100640001000244078BA100320032| + ++ + +Example 4: Using a random seed >>> from pyspark.sql import functions as sf >>> spark.range(100).select( diff --git a/sql/api/src/main/scala/org/apache/spark/sql/functions.scala b/sql/api/src/main/scala/org/apache/spark/sql/functions.scala index d9bceabe88f8..ab69789c75f5 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/api/src/main/scala/org/apache/
(spark) branch branch-3.4 updated: [SPARK-49750][DOC] Mention delegation token support in K8s mode
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 2256141cbb45 [SPARK-49750][DOC] Mention delegation token support in K8s mode 2256141cbb45 is described below commit 2256141cbb455bd38b8676104f51207cc30a Author: Cheng Pan AuthorDate: Tue Sep 24 07:40:58 2024 -0700 [SPARK-49750][DOC] Mention delegation token support in K8s mode Update docs to mention delegation token support in K8s mode. The delegation token support in K8s mode has been implemented since 3.0.0 via SPARK-23257. Yes, docs are updated. Review. No. Closes #48199 from pan3793/SPARK-49750. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun (cherry picked from commit dedf5aa91827f32736ce5dae2eb123ba4e244c3b) Signed-off-by: Dongjoon Hyun (cherry picked from commit b513297f661bf314bcb47033f408810b14ea39b8) Signed-off-by: Dongjoon Hyun --- docs/security.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/security.md b/docs/security.md index 2c61a64c36a6..1e694d53ff5a 100644 --- a/docs/security.md +++ b/docs/security.md @@ -814,7 +814,7 @@ mechanism (see `java.util.ServiceLoader`). Implementations of `org.apache.spark.security.HadoopDelegationTokenProvider` can be made available to Spark by listing their names in the corresponding file in the jar's `META-INF/services` directory. -Delegation token support is currently only supported in YARN and Mesos modes. Consult the +Delegation token support is currently only supported in YARN, Kubernetes and Mesos modes. Consult the deployment-specific page for more information. The following options provides finer-grained control for this feature: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-49750][DOC] Mention delegation token support in K8s mode
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new b513297f661b [SPARK-49750][DOC] Mention delegation token support in K8s mode b513297f661b is described below commit b513297f661bf314bcb47033f408810b14ea39b8 Author: Cheng Pan AuthorDate: Tue Sep 24 07:40:58 2024 -0700 [SPARK-49750][DOC] Mention delegation token support in K8s mode Update docs to mention delegation token support in K8s mode. The delegation token support in K8s mode has been implemented since 3.0.0 via SPARK-23257. Yes, docs are updated. Review. No. Closes #48199 from pan3793/SPARK-49750. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun (cherry picked from commit dedf5aa91827f32736ce5dae2eb123ba4e244c3b) Signed-off-by: Dongjoon Hyun --- docs/security.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/security.md b/docs/security.md index 10201e6ed540..e6ef9ea584a1 100644 --- a/docs/security.md +++ b/docs/security.md @@ -840,7 +840,7 @@ mechanism (see `java.util.ServiceLoader`). Implementations of `org.apache.spark.security.HadoopDelegationTokenProvider` can be made available to Spark by listing their names in the corresponding file in the jar's `META-INF/services` directory. -Delegation token support is currently only supported in YARN and Mesos modes. Consult the +Delegation token support is currently only supported in YARN, Kubernetes and Mesos modes. Consult the deployment-specific page for more information. The following options provides finer-grained control for this feature: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49750][DOC] Mention delegation token support in K8s mode
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dedf5aa91827 [SPARK-49750][DOC] Mention delegation token support in K8s mode dedf5aa91827 is described below commit dedf5aa91827f32736ce5dae2eb123ba4e244c3b Author: Cheng Pan AuthorDate: Tue Sep 24 07:40:58 2024 -0700 [SPARK-49750][DOC] Mention delegation token support in K8s mode ### What changes were proposed in this pull request? Update docs to mention delegation token support in K8s mode. ### Why are the changes needed? The delegation token support in K8s mode has been implemented since 3.0.0 via SPARK-23257. ### Does this PR introduce _any_ user-facing change? Yes, docs are updated. ### How was this patch tested? Review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48199 from pan3793/SPARK-49750. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- docs/security.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/security.md b/docs/security.md index b97abfeacf24..c7d3fd5f8c36 100644 --- a/docs/security.md +++ b/docs/security.md @@ -947,7 +947,7 @@ mechanism (see `java.util.ServiceLoader`). Implementations of `org.apache.spark.security.HadoopDelegationTokenProvider` can be made available to Spark by listing their names in the corresponding file in the jar's `META-INF/services` directory. -Delegation token support is currently only supported in YARN mode. Consult the +Delegation token support is currently only supported in YARN and Kubernetes mode. Consult the deployment-specific page for more information. The following options provides finer-grained control for this feature: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49753][BUILD] Upgrade ZSTD-JNI to 1.5.6-6
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 438a6e7782ec [SPARK-49753][BUILD] Upgrade ZSTD-JNI to 1.5.6-6

438a6e7782ec is described below

commit 438a6e7782ece23492928cfbb2d01e14104dfd9a
Author: yangjie01
AuthorDate: Mon Sep 23 21:39:27 2024 -0700

    [SPARK-49753][BUILD] Upgrade ZSTD-JNI to 1.5.6-6

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `zstd-jni` from `1.5.6-5` to `1.5.6-6`.

    ### Why are the changes needed?

    The new version allows including a compression level when training a dictionary:
    https://github.com/luben/zstd-jni/commit/3ca26eed6c84fb09c382854ead527188e643e206#diff-bd5c0f62db7cb85cac88c7b6cfad1c0e5e2f433ba45097761654829627b7a31c

    All changes in the new version are as follows:
    - https://github.com/luben/zstd-jni/compare/v1.5.6-5...v1.5.6-6

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GitHub Actions.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48204 from LuciferYang/zstd-jni-1.5.6-6.

    Lead-authored-by: yangjie01
    Co-authored-by: YangJie
    Signed-off-by: Dongjoon Hyun
---
 .../ZStandardBenchmark-jdk21-results.txt       | 56 +++---
 core/benchmarks/ZStandardBenchmark-results.txt | 56 +++---
 dev/deps/spark-deps-hadoop-3-hive-2.3          |  2 +-
 pom.xml                                        |  2 +-
 4 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/core/benchmarks/ZStandardBenchmark-jdk21-results.txt b/core/benchmarks/ZStandardBenchmark-jdk21-results.txt
index b3bffea826e5..f6bd681451d5 100644
--- a/core/benchmarks/ZStandardBenchmark-jdk21-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk21-results.txt
@@ -2,48 +2,48 @@ Benchmark ZStandardCompressionCodec
-OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.8.0-1014-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------------
-Compression 1 times at level 1 without buffer pool             657             670          14         0.0       65699.2       1.0X
-Compression 1 times at level 2 without buffer pool             697             697           1         0.0       69673.4       0.9X
-Compression 1 times at level 3 without buffer pool             799             802           3         0.0       79855.2       0.8X
-Compression 1 times at level 1 with buffer pool                593             595           1         0.0       59326.9       1.1X
-Compression 1 times at level 2 with buffer pool                622             624           3         0.0       62194.1       1.1X
-Compression 1 times at level 3 with buffer pool                732             733           1         0.0       73178.6       0.9X
+Compression 1 times at level 1 without buffer pool             659             676          16         0.0       65860.7       1.0X
+Compression 1 times at level 2 without buffer pool             721             723           2         0.0       72135.5       0.9X
+Compression 1 times at level 3 without buffer pool             815             816           1         0.0       81500.6       0.8X
+Compression 1 times at level 1 with buffer pool                608             609           0         0.0       60846.6       1.1X
+Compression 1 times at level 2 with buffer pool                645             647           3         0.0       64476.3       1.0X
+Compression 1 times at level 3 with buffer pool                746             746           1         0.0       74584.0       0.9X

-OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure
+OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.8.0-1014-azure
 AMD EPYC 7763 64-Core Processor
 Benchmark ZStandardCompressionCodec:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------------------------------------------------
-Decompression 1 times from level 1 without buffer pool             813             820          11         0.0       81273.2       1.0X
-Decompression 1 times from level 2 without buffer pool             810             813           3         0.0       80986.2
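The dictionary-training change called out above matters when compressing many small, similar payloads: a dictionary trained on sample data gives the compressor a head start that a short input alone cannot provide. zstd-jni's own trainer API is not reproduced here; as a rough, stdlib-only illustration of the same idea, the sketch below uses Python's `zlib`, whose `zdict` parameter accepts a preset dictionary (names and sample data are illustrative):

```python
import zlib

# A short record with little internal redundancy -- the case where a
# pre-trained dictionary helps most.
record = b'{"user": "alice", "action": "login", "status": "ok"}'

# A "dictionary" seeded from representative sample data. zstd-jni trains
# one from many samples; here we simply reuse a structurally similar record.
zdict = b'{"user": "bob", "action": "logout", "status": "ok"}'

def compress(data, zdict=None, level=6):
    kwargs = {"zdict": zdict} if zdict is not None else {}
    c = zlib.compressobj(level, **kwargs)
    return c.compress(data) + c.flush()

def decompress(blob, zdict=None):
    kwargs = {"zdict": zdict} if zdict is not None else {}
    d = zlib.decompressobj(**kwargs)
    return d.decompress(blob) + d.flush()

plain = compress(record)
with_dict = compress(record, zdict=zdict)
assert decompress(with_dict, zdict=zdict) == record  # round-trips correctly
print(len(plain), len(with_dict))  # the dictionary-backed stream is typically smaller
```

The 1.5.6-6 change lets the trained dictionary itself be tuned for the compression level it will be used with, which this stdlib analogy cannot express.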
(spark) branch branch-3.4 updated: [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new e825ac3c272e [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

e825ac3c272e is described below

commit e825ac3c272e65083aa4d06648e7cda16c04aa5e
Author: Chris Nauroth
AuthorDate: Mon Sep 23 21:36:48 2024 -0700

    [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

    ### What changes were proposed in this pull request?

    This patch corrects handling of a user-supplied `SPARK_USER` environment variable in the YARN app master. Currently, the user-supplied value gets appended to the default, like a classpath entry. The patch fixes it by using only the user-supplied value.

    ### Why are the changes needed?

    Overriding the `SPARK_USER` environment variable in the YARN app master with configuration property `spark.yarn.appMasterEnv.SPARK_USER` currently results in an incorrect value. `Client#setupLaunchEnv` first sets a default in the environment map using the Hadoop user. After that, `YarnSparkHadoopUtil.addPathToEnvironment` sees the existing value in the map and interprets the user-supplied value as needing to be appended like a classpath entry. The end result is the Hadoop user appende [...]

    ### Does this PR introduce _any_ user-facing change?

    Yes, the app master now uses the user-supplied `SPARK_USER` if specified. (The default is still the Hadoop user.)

    ### How was this patch tested?

    * Existing unit tests pass.
    * Added new unit tests covering default and overridden `SPARK_USER` for the app master. The override test fails without this patch, and then passes after the patch is applied.
    * Manually tested in a live YARN cluster as shown below.

    Manual testing used the `DFSReadWriteTest` job with overrides of `SPARK_USER`:

    ```
    spark-submit \
      --deploy-mode cluster \
      --files all-lines.txt \
      --class org.apache.spark.examples.DFSReadWriteTest \
      --conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
      --conf spark.driverEnv.SPARK_USER=sparkuser_driver \
      --conf spark.executorEnv.SPARK_USER=sparkuser_executor \
      /usr/lib/spark/examples/jars/spark-examples.jar \
      all-lines.txt /tmp/DFSReadWriteTest
    ```

    Before the patch, we can see the app master's `SPARK_USER` mishandled by looking at the `_SUCCESS` file in HDFS:

    ```
    hdfs dfs -ls -R /tmp/DFSReadWriteTest
    drwxr-xr-x   - cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
    -rw-r--r--   1 cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
    -rw-r--r--   1 sparkuser_executor           hadoop    2295080 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-0
    -rw-r--r--   1 sparkuser_executor           hadoop    2288718 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-1
    ```

    After the patch, we can see it working correctly:

    ```
    hdfs dfs -ls -R /tmp/DFSReadWriteTest
    drwxr-xr-x   - sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test
    -rw-r--r--   1 sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
    -rw-r--r--   1 sparkuser_executor  hadoop    2295080 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-0
    -rw-r--r--   1 sparkuser_executor  hadoop    2288718 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-1
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48216 from cnauroth/SPARK-49760-branch-3.5.

    Authored-by: Chris Nauroth
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit dd76a82734564afeed4225d19331243f7b926ae8)
    Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala      |  7 +--
 .../scala/org/apache/spark/deploy/yarn/ClientSuite.scala      | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index d8e9cd8b47d8..09fc5b7a0caa 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -904,14 +904,13 @@ private[spark] class Client(
   /**
    * Set up the environment for launching our ApplicationMaster container.
    */
-  pr
(spark) branch branch-3.5 updated (e7ca790ed4f0 -> dd76a8273456)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

    from e7ca790ed4f0 [SPARK-49699][SS] Disable PruneFilters for streaming workloads
     add dd76a8273456 [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala      |  7 +--
 .../scala/org/apache/spark/deploy/yarn/ClientSuite.scala      | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 35e5d290deee [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

35e5d290deee is described below

commit 35e5d290deee9cf2a913571407e2257217e0e9e2
Author: Chris Nauroth
AuthorDate: Mon Sep 23 21:35:32 2024 -0700

    [SPARK-49760][YARN] Correct handling of `SPARK_USER` env variable override in app master

    ### What changes were proposed in this pull request?

    This patch corrects handling of a user-supplied `SPARK_USER` environment variable in the YARN app master. Currently, the user-supplied value gets appended to the default, like a classpath entry. The patch fixes it by using only the user-supplied value.

    ### Why are the changes needed?

    Overriding the `SPARK_USER` environment variable in the YARN app master with configuration property `spark.yarn.appMasterEnv.SPARK_USER` currently results in an incorrect value. `Client#setupLaunchEnv` first sets a default in the environment map using the Hadoop user. After that, `YarnSparkHadoopUtil.addPathToEnvironment` sees the existing value in the map and interprets the user-supplied value as needing to be appended like a classpath entry. The end result is the Hadoop user appende [...]

    ### Does this PR introduce _any_ user-facing change?

    Yes, the app master now uses the user-supplied `SPARK_USER` if specified. (The default is still the Hadoop user.)

    ### How was this patch tested?

    * Existing unit tests pass.
    * Added new unit tests covering default and overridden `SPARK_USER` for the app master. The override test fails without this patch, and then passes after the patch is applied.
    * Manually tested in a live YARN cluster as shown below.

    Manual testing used the `DFSReadWriteTest` job with overrides of `SPARK_USER`:

    ```
    spark-submit \
      --deploy-mode cluster \
      --files all-lines.txt \
      --class org.apache.spark.examples.DFSReadWriteTest \
      --conf spark.yarn.appMasterEnv.SPARK_USER=sparkuser_appMaster \
      --conf spark.driverEnv.SPARK_USER=sparkuser_driver \
      --conf spark.executorEnv.SPARK_USER=sparkuser_executor \
      /usr/lib/spark/examples/jars/spark-examples.jar \
      all-lines.txt /tmp/DFSReadWriteTest
    ```

    Before the patch, we can see the app master's `SPARK_USER` mishandled by looking at the `_SUCCESS` file in HDFS:

    ```
    hdfs dfs -ls -R /tmp/DFSReadWriteTest
    drwxr-xr-x   - cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test
    -rw-r--r--   1 cnauroth:sparkuser_appMaster hadoop          0 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
    -rw-r--r--   1 sparkuser_executor           hadoop    2295080 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-0
    -rw-r--r--   1 sparkuser_executor           hadoop    2288718 2024-09-20 23:35 /tmp/DFSReadWriteTest/dfs_read_write_test/part-1
    ```

    After the patch, we can see it working correctly:

    ```
    hdfs dfs -ls -R /tmp/DFSReadWriteTest
    drwxr-xr-x   - sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test
    -rw-r--r--   1 sparkuser_appMaster hadoop          0 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/_SUCCESS
    -rw-r--r--   1 sparkuser_executor  hadoop    2295080 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-0
    -rw-r--r--   1 sparkuser_executor  hadoop    2288718 2024-09-23 17:13 /tmp/DFSReadWriteTest/dfs_read_write_test/part-1
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48214 from cnauroth/SPARK-49760.

    Authored-by: Chris Nauroth
    Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala      |  7 +--
 .../scala/org/apache/spark/deploy/yarn/ClientSuite.scala      | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index b2c4d97bc7b0..8b621e82afe2 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -960,14 +960,13 @@ private[spark] class Client(
   /**
    * Set up the environment for launching our ApplicationMaster container.
    */
-  private def setupLaunchEnv(
+  private[yarn] def setupLaunchEnv(
       stagingDirPath: Path,
       pySparkArchives: Seq[S
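The root cause described in this commit message is a general one: a path-style "append" helper applied to a scalar environment variable. Below is a simplified Python sketch of the failure mode and the fix — Spark's actual code is the Scala `YarnSparkHadoopUtil.addPathToEnvironment`; the helper here only mimics its append behavior:

```python
import os

PATH_SEP = os.pathsep  # ':' on Unix, like a classpath separator

def add_path_to_environment(env, key, value):
    # Classpath-style helper: appends when the key already exists.
    env[key] = env[key] + PATH_SEP + value if key in env else value

# Before the fix: a default is set first, then the user override is
# *appended*, producing a corrupted value like "hadoopuser:sparkuser_appMaster".
env = {"SPARK_USER": "hadoopuser"}  # default derived from the Hadoop user
add_path_to_environment(env, "SPARK_USER", "sparkuser_appMaster")
print(env["SPARK_USER"])  # corrupted: both values joined by the separator

# After the fix: a user-supplied value for a scalar variable simply
# replaces the default instead of being appended to it.
env = {"SPARK_USER": "hadoopuser"}
env["SPARK_USER"] = "sparkuser_appMaster"  # override wins outright
print(env["SPARK_USER"])
```

This matches the HDFS listings in the commit message: before the fix the file owner shows both names joined together, and after the fix only the override appears.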
(spark-kubernetes-operator) branch main updated: [SPARK-49754] Support HPA for `SparkCluster`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git

The following commit(s) were added to refs/heads/main by this push:
     new a48c0a3 [SPARK-49754] Support HPA for `SparkCluster`

a48c0a3 is described below

commit a48c0a311c29b7d2a76d11134b89c72f39fbe38d
Author: Dongjoon Hyun
AuthorDate: Mon Sep 23 09:58:07 2024 -0700

    [SPARK-49754] Support HPA for `SparkCluster`

    ### What changes were proposed in this pull request?

    This PR aims to support K8s [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) for `SparkCluster`.

    ### Why are the changes needed?

    To allow users more flexible installation on top of static SparkClusters.

    ### Does this PR introduce _any_ user-facing change?

    No. This is a new feature and HPA is not created if the minimum number of workers is equal to the maximum number of workers.

    ### How was this patch tested?

    Pass the CIs. And, manually create an example cluster and wait until it scales down to the minimum number of workers.

    ```
    $ gradle build buildDockerImage spark-operator-api:relocateGeneratedCRD
    $ kubectl apply -f examples/cluster-with-hpa.yaml
    ```

    ```
    Conditions:
      Type            Status  Reason            Message
      ----            ------  ------            -------
      AbleToScale     True    ReadyForNewScale  recommended size matches current size
      ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
      ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
    Events:
      Type    Reason             Age    From                       Message
      ----    ------             ----   ----                       -------
      Normal  SuccessfulRescale  2m31s  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
      Normal  SuccessfulRescale  91s    horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #135 from dongjoon-hyun/hpa.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../templates/operator-rbac.yaml                   |  6 +++
 examples/cluster-with-hpa.yaml                     | 45 +
 .../k8s/operator/context/SparkClusterContext.java  |  7 +++
 .../SparkClusterResourceSpecFactory.java           |  3 ++
 .../reconciler/reconcilesteps/ClusterInitStep.java | 10 
 .../k8s/operator/SparkClusterResourceSpec.java     | 57 ++
 .../k8s/operator/SparkClusterResourceSpecTest.java | 31 
 .../operator/SparkClusterSubmissionWorkerTest.java |  1 +
 8 files changed, 160 insertions(+)

diff --git a/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml b/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml
index e9fc7f0..eebbf55 100644
--- a/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml
+++ b/build-tools/helm/spark-kubernetes-operator/templates/operator-rbac.yaml
@@ -35,6 +35,12 @@ rules:
       - statefulsets
     verbs:
       - '*'
+  - apiGroups:
+      - "autoscaling"
+    resources:
+      - horizontalpodautoscalers
+    verbs:
+      - '*'
   - apiGroups:
       - "spark.apache.org"
     resources:
diff --git a/examples/cluster-with-hpa.yaml b/examples/cluster-with-hpa.yaml
new file mode 100644
index 000..a91384a
--- /dev/null
+++ b/examples/cluster-with-hpa.yaml
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file dist
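For context on what the operator now needs `horizontalpodautoscalers` permissions for: an HPA of this kind targets the cluster's worker workload. The manifest below is an illustrative sketch using the standard `autoscaling/v2` API, with hypothetical names and thresholds — it is not the operator's actual generated output:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-cluster-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet            # hypothetical: workers managed as a StatefulSet
    name: example-cluster-worker
  minReplicas: 1                 # per the commit, no HPA is created when min == max
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With a target like this, the controller produces exactly the `SuccessfulRescale` events shown in the manual test above as CPU utilization falls below the target.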
(spark-kubernetes-operator) branch main updated: [SPARK-49742] Upgrade `README`, examples, tests to use `preview2`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git

The following commit(s) were added to refs/heads/main by this push:
     new 4aaae71 [SPARK-49742] Upgrade `README`, examples, tests to use `preview2`

4aaae71 is described below

commit 4aaae715ff784f64c2f35379254255ffbb8dc384
Author: Dongjoon Hyun
AuthorDate: Fri Sep 20 16:54:21 2024 -0700

    [SPARK-49742] Upgrade `README`, examples, tests to use `preview2`

    ### What changes were proposed in this pull request?

    This PR aims to update README, examples, tests to use `Apache Spark 4.0.0-preview2`.

    ### Why are the changes needed?

    We can use the launched `SparkApp`s and `SparkCluster`s.
    - Spark K8s Operator is built with `4.0.0-preview2` already via #133
    - Apache Spark 4.0.0-preview2 images are ready via
      - https://github.com/apache/spark-docker/pull/70
      - https://github.com/apache/spark-docker/pull/71

    ### Does this PR introduce _any_ user-facing change?

    No behavior change.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #134 from dongjoon-hyun/SPARK-49742.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 README.md                                                        | 6 +++---
 examples/cluster-java21.yaml                                     | 4 ++--
 examples/cluster-on-yunikorn.yaml                                | 4 ++--
 examples/cluster-with-template.yaml                              | 4 ++--
 examples/pi-java21.yaml                                          | 4 ++--
 examples/pi-on-yunikorn.yaml                                     | 4 ++--
 examples/pi-scala.yaml                                           | 4 ++--
 examples/pi-with-one-pod.yaml                                    | 4 ++--
 examples/pi.yaml                                                 | 4 ++--
 examples/prod-cluster-with-three-workers.yaml                    | 4 ++--
 examples/pyspark-pi.yaml                                         | 4 ++--
 examples/qa-cluster-with-one-worker.yaml                         | 4 ++--
 examples/sql.yaml                                                | 4 ++--
 .../org/apache/spark/k8s/operator/SparkClusterResourceSpec.java  | 2 +-
 tests/e2e/spark-versions/chainsaw-test.yaml                      | 8 
 tests/e2e/state-transition/spark-cluster-example-succeeded.yaml  | 6 +++---
 tests/e2e/state-transition/spark-example-succeeded.yaml          | 6 +++---
 tests/e2e/watched-namespaces/spark-example.yaml                  | 6 +++---
 18 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index e306889..8db0ab1 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ $ ./examples/submit-pi-to-prod.sh
 {
   "action" : "CreateSubmissionResponse",
   "message" : "Driver successfully submitted as driver-20240821181327-",
-  "serverSparkVersion" : "4.0.0-preview1",
+  "serverSparkVersion" : "4.0.0-preview2",
   "submissionId" : "driver-20240821181327-",
   "success" : true
 }
@@ -73,7 +73,7 @@ $ curl http://localhost:6066/v1/submissions/status/driver-20240821181327-/
 {
   "action" : "SubmissionStatusResponse",
   "driverState" : "FINISHED",
-  "serverSparkVersion" : "4.0.0-preview1",
+  "serverSparkVersion" : "4.0.0-preview2",
   "submissionId" : "driver-20240821181327-",
   "success" : true,
   "workerHostPort" : "10.1.5.188:42099",
@@ -100,7 +100,7 @@ Events:
   Normal  Scheduled          14s  yunikorn  Successfully assigned default/pi-on-yunikorn-0-driver to node docker-desktop
   Normal  PodBindSuccessful  14s  yunikorn  Pod default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
   Normal  TaskCompleted      6s   yunikorn  Task default/pi-on-yunikorn-0-driver is completed
-  Normal  Pulled             13s  kubelet   Container image "apache/spark:4.0.0-preview1" already present on machine
+  Normal  Pulled             13s  kubelet   Container image "apache/spark:4.0.0-preview2" already present on machine
   Normal  Created            13s  kubelet   Created container spark-kubernetes-driver
   Normal  Started            13s  kubelet   Started container spark-kubernetes-driver
diff --git a/examples/cluster-java21.yaml b/examples/cluster-java21.yaml
index fefe90c..abc4826 100644
--- a/examples/cluster-java21.yaml
+++ b/examples/cluster-java21.yaml
@@ -18,14 +18,14 @@ metadata:
   name: cluster-java21
 spec:
   runtimeVersions:
-    spa
(spark-docker) branch master updated: [SPARK-49740] Update `publish-java17.yaml` and `publish-java21.yaml` to use `preview2` by default
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 63c0ce3 [SPARK-49740] Update `publish-java17.yaml` and `publish-java21.yaml` to use `preview2` by default

63c0ce3 is described below

commit 63c0ce3c0abfc94b591da7bc8cd5f90e02f16464
Author: Dongjoon Hyun
AuthorDate: Fri Sep 20 11:15:32 2024 -0700

    [SPARK-49740] Update `publish-java17.yaml` and `publish-java21.yaml` to use `preview2` by default

    ### What changes were proposed in this pull request?

    This PR aims to update `publish-java17.yaml` and `publish-java21.yaml` to use `preview2` by default.

    ### Why are the changes needed?

    To publish the latest images.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual review.

    Closes #71 from dongjoon-hyun/SPARK-49740.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/publish-java17.yaml | 3 ++-
 .github/workflows/publish-java21.yaml | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/publish-java17.yaml b/.github/workflows/publish-java17.yaml
index 610f839..2ca08e6 100644
--- a/.github/workflows/publish-java17.yaml
+++ b/.github/workflows/publish-java17.yaml
@@ -25,10 +25,11 @@ on:
     spark:
       description: 'The Spark version of Spark image.'
       required: true
-      default: '4.0.0-preview1'
+      default: '4.0.0-preview2'
       type: choice
       options:
         - 4.0.0-preview1
+        - 4.0.0-preview2
     publish:
       description: 'Publish the image or not.'
       default: false
diff --git a/.github/workflows/publish-java21.yaml b/.github/workflows/publish-java21.yaml
index 1a2078a..b98718f 100644
--- a/.github/workflows/publish-java21.yaml
+++ b/.github/workflows/publish-java21.yaml
@@ -25,10 +25,11 @@ on:
     spark:
       description: 'The Spark version of Spark image.'
       required: true
-      default: '4.0.0-preview1'
+      default: '4.0.0-preview2'
       type: choice
       options:
         - 4.0.0-preview1
+        - 4.0.0-preview2
     publish:
       description: 'Publish the image or not.'
       default: false
(spark-docker) branch master updated: [SPARK-49736] Add Apache Spark `4.0.0-preview2` Dockerfiles
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 217a942 [SPARK-49736] Add Apache Spark `4.0.0-preview2` Dockerfiles

217a942 is described below

commit 217a9422bedb7fc3aab47c0bebed32acf0e1a737
Author: Dongjoon Hyun
AuthorDate: Fri Sep 20 10:54:45 2024 -0700

    [SPARK-49736] Add Apache Spark `4.0.0-preview2` Dockerfiles

    ### What changes were proposed in this pull request?

    This PR aims to add `4.0.0-preview2` Dockerfiles.

    ### Why are the changes needed?

    New release.

    ### Does this PR introduce _any_ user-facing change?

    New release.

    ### How was this patch tested?

    New release.

    Closes #70 from dongjoon-hyun/SPARK-49736.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_4.0.0-preview2.yaml        |  43 +++
 .github/workflows/test.yml                         |   1 +
 .../scala2.13-java17-python3-r-ubuntu/Dockerfile   |  29 +
 .../scala2.13-java17-python3-ubuntu/Dockerfile     |  26 +
 .../scala2.13-java17-r-ubuntu/Dockerfile           |  28 +
 4.0.0-preview2/scala2.13-java17-ubuntu/Dockerfile  |  81 +
 .../scala2.13-java17-ubuntu/entrypoint.sh          | 130 +
 .../scala2.13-java21-python3-r-ubuntu/Dockerfile   |  29 +
 .../scala2.13-java21-python3-ubuntu/Dockerfile     |  26 +
 .../scala2.13-java21-r-ubuntu/Dockerfile           |  28 +
 4.0.0-preview2/scala2.13-java21-ubuntu/Dockerfile  |  81 +
 .../scala2.13-java21-ubuntu/entrypoint.sh          | 130 +
 tools/template.py                                  |   4 +-
 versions.json                                      |  56 +
 14 files changed, 691 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_4.0.0-preview2.yaml b/.github/workflows/build_4.0.0-preview2.yaml
new file mode 100644
index 000..7e7dbea
--- /dev/null
+++ b/.github/workflows/build_4.0.0-preview2.yaml
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (4.0.0-preview2)"
+
+on:
+  pull_request:
+    branches:
+      - 'master'
+    paths:
+      - '4.0.0-preview2/**'
+
+jobs:
+  run-build:
+    strategy:
+      matrix:
+        image-type: ["all", "python", "scala", "r"]
+        java: [17, 21]
+    name: Run
+    secrets: inherit
+    uses: ./.github/workflows/main.yml
+    with:
+      spark: 4.0.0-preview2
+      scala: 2.13
+      java: ${{ matrix.java }}
+      image-type: ${{ matrix.image-type }}
+
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index f627405..e8f941c 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -28,6 +28,7 @@ on:
         default: '3.5.2'
         type: choice
         options:
+          - 4.0.0-preview2
           - 4.0.0-preview1
           - 3.5.2
           - 3.5.1
diff --git a/4.0.0-preview2/scala2.13-java17-python3-r-ubuntu/Dockerfile b/4.0.0-preview2/scala2.13-java17-python3-r-ubuntu/Dockerfile
new file mode 100644
index 000..7c575a8
--- /dev/null
+++ b/4.0.0-preview2/scala2.13-java17-python3-r-ubuntu/Dockerfile
@@ -0,0 +1,29 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing per
(spark-kubernetes-operator) branch main updated: [SPARK-49735] Upgrade Spark to `4.0.0-preview2`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git

The following commit(s) were added to refs/heads/main by this push:
     new 1c1dbb6 [SPARK-49735] Upgrade Spark to `4.0.0-preview2`

1c1dbb6 is described below

commit 1c1dbb61a65d65e97226f67ccb04be03f076f105
Author: Dongjoon Hyun
AuthorDate: Fri Sep 20 10:53:43 2024 -0700

    [SPARK-49735] Upgrade Spark to `4.0.0-preview2`

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `Spark` dependency to `4.0.0-preview2`.

    ### Why are the changes needed?

    To use the latest updates.
    - https://github.com/apache/spark/releases/tag/v4.0.0-preview2

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #133 from dongjoon-hyun/SPARK-49735.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 gradle/libs.versions.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index 2ebed13..339b34f 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -20,7 +20,7 @@ lombok = "1.18.32"
 operator-sdk = "4.9.0"
 okhttp = "4.12.0"
 dropwizard-metrics = "4.2.25"
-spark = "4.0.0-preview1"
+spark = "4.0.0-preview2"
 log4j = "2.22.1"

 # Test
(spark) branch master updated: [SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 22a7edce0a7c [SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend 22a7edce0a7c is described below commit 22a7edce0a7c70d6c1a5dcf995c6c723f0c3352b Author: Xinrong Meng AuthorDate: Fri Sep 20 08:53:52 2024 -0700 [SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend ### What changes were proposed in this pull request? Support line plot with plotly backend on both Spark Connect and Spark classic. ### Why are the changes needed? While Pandas on Spark supports plotting, PySpark currently lacks this feature. The proposed API will enable users to generate visualizations, such as line plots, by leveraging libraries like Plotly. This will provide users with an intuitive, interactive way to explore and understand large datasets directly from PySpark DataFrames, streamlining the data analysis workflow in distributed environments. See more at [PySpark Plotting API Specification](https://docs.google.com/document/d/1IjOEzC8zcetG86WDvqkereQPj_NGLNW7Bdu910g30Dg/edit?usp=sharing) in progress. Part of https://issues.apache.org/jira/browse/SPARK-49530. ### Does this PR introduce _any_ user-facing change? Yes. 
```python >>> data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)] >>> columns = ["category", "int_val", "float_val"] >>> sdf = spark.createDataFrame(data, columns) >>> sdf.show() ++---+-+ |category|int_val|float_val| ++---+-+ | A| 10| 1.5| | B| 30| 2.5| | C| 20| 3.5| ++---+-+ >>> f = sdf.plot(kind="line", x="category", y="int_val") >>> f.show() # see below >>> g = sdf.plot.line(x="category", y=["int_val", "float_val"]) >>> g.show() # see below ``` `f.show()`: ![newplot](https://github.com/user-attachments/assets/ebd50bbc-0dd1-437f-ae0c-0b4de8f3c722) `g.show()`: ![newplot (1)](https://github.com/user-attachments/assets/46d28840-a147-428f-8d88-d424aa76ad06) ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48139 from xinrong-meng/plot_line_w_dep. Authored-by: Xinrong Meng Signed-off-by: Dongjoon Hyun --- .github/workflows/build_python_connect.yml | 2 +- dev/requirements.txt | 2 +- dev/sparktestsupport/modules.py| 4 + python/docs/source/getting_started/install.rst | 1 + python/packaging/classic/setup.py | 1 + python/packaging/connect/setup.py | 2 + python/pyspark/errors/error-conditions.json| 5 + python/pyspark/sql/classic/dataframe.py| 9 ++ python/pyspark/sql/connect/dataframe.py| 8 ++ python/pyspark/sql/dataframe.py| 28 + python/pyspark/sql/plot/__init__.py| 21 python/pyspark/sql/plot/core.py| 135 + python/pyspark/sql/plot/plotly.py | 30 + .../sql/tests/connect/test_parity_frame_plot.py| 36 ++ .../tests/connect/test_parity_frame_plot_plotly.py | 36 ++ python/pyspark/sql/tests/plot/__init__.py | 16 +++ python/pyspark/sql/tests/plot/test_frame_plot.py | 80 .../sql/tests/plot/test_frame_plot_plotly.py | 64 ++ python/pyspark/sql/utils.py| 17 +++ python/pyspark/testing/sqlutils.py | 7 ++ .../org/apache/spark/sql/internal/SQLConf.scala| 27 + 21 files changed, 529 insertions(+), 2 deletions(-) diff --git a/.github/workflows/build_python_connect.yml b/.github/workflows/build_python_connect.yml 
index 3ac1a0117e41..f668d813ef26 100644
--- a/.github/workflows/build_python_connect.yml
+++ b/.github/workflows/build_python_connect.yml
@@ -71,7 +71,7 @@ jobs:
           python packaging/connect/setup.py sdist
           cd dist
           pip install pyspark*connect-*.tar.gz
-          pip install 'six==1.16.0' 'pandas<=2.2.2' scipy 'plotly>=4.8' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' 'graphviz==0.20.3' torch torchvision torcheval deepspeed unittest-xml-reporting
+          pip install 'six==1.16.0' 'pandas<=2.2.2' scipy
(spark) branch master updated: [SPARK-49704][BUILD] Upgrade `commons-io` to 2.17.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3d8c078ddefe [SPARK-49704][BUILD] Upgrade `commons-io` to 2.17.0 3d8c078ddefe is described below commit 3d8c078ddefe3bb74fc78ffc9391a067156c8499 Author: panbingkun AuthorDate: Fri Sep 20 08:44:14 2024 -0700 [SPARK-49704][BUILD] Upgrade `commons-io` to 2.17.0 ### What changes were proposed in this pull request? This PR aims to upgrade `commons-io` from `2.16.1` to `2.17.0`. ### Why are the changes needed? The full release notes: https://commons.apache.org/proper/commons-io/changes-report.html#a2.17.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48154 from panbingkun/SPARK-49704. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 9871cc0bca04..419625f48fa1 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -44,7 +44,7 @@ commons-compiler/3.1.9//commons-compiler-3.1.9.jar commons-compress/1.27.1//commons-compress-1.27.1.jar commons-crypto/1.1.0//commons-crypto-1.1.0.jar commons-dbcp/1.4//commons-dbcp-1.4.jar -commons-io/2.16.1//commons-io-2.16.1.jar +commons-io/2.17.0//commons-io-2.17.0.jar commons-lang/2.6//commons-lang-2.6.jar commons-lang3/3.17.0//commons-lang3-3.17.0.jar commons-math3/3.6.1//commons-math3-3.6.1.jar diff --git a/pom.xml b/pom.xml index ddabc82d2ad1..b7c87beec0f9 100644 --- a/pom.xml +++ b/pom.xml @@ -187,7 +187,7 @@ 3.0.3 1.17.1 1.27.1 -2.16.1 +2.17.0 2.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For 
additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated (20bdb70 -> fcb8a8f)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git

discard 20bdb70 [SPARK-45923] Add java21 to the e2e tests
new fcb8a8f [SPARK-49724] Add java21 to the e2e tests

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (20bdb70)
            \
             N -- N -- N   refs/heads/main (fcb8a8f)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
(spark-kubernetes-operator) 01/01: [SPARK-49724] Add java21 to the e2e tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git commit fcb8a8f3bf3afa75a0fb55a2bfc986af556541bc Author: Qi Tan AuthorDate: Fri Sep 20 04:51:50 2024 -0700 [SPARK-49724] Add java21 to the e2e tests Add java 21 e2e teste E2E coverage on java version no e2e workflow should cover this no Closes #132 from TQJADE/java-21. Authored-by: Qi Tan Signed-off-by: Dongjoon Hyun --- tests/e2e/spark-versions/chainsaw-test.yaml | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/tests/e2e/spark-versions/chainsaw-test.yaml b/tests/e2e/spark-versions/chainsaw-test.yaml index f4a07f1..8f6c263 100644 --- a/tests/e2e/spark-versions/chainsaw-test.yaml +++ b/tests/e2e/spark-versions/chainsaw-test.yaml @@ -38,7 +38,7 @@ spec: - name: "JAVA_VERSION" value: "17" - name: "IMAGE" -value: 'spark:3.5.2-scala2.12-java17-ubuntu' +value: 'apache/spark:3.5.2-scala2.12-java17-ubuntu' - bindings: - name: "SPARK_VERSION" value: "3.4.3" @@ -47,7 +47,16 @@ spec: - name: "JAVA_VERSION" value: "11" - name: "IMAGE" -value: 'spark:3.4.3-scala2.12-java11-ubuntu' +value: 'apache/spark:3.4.3-scala2.12-java11-ubuntu' + - bindings: + - name: "SPARK_VERSION" +value: "4.0.0-preview1" + - name: "SCALA_VERSION" +value: "2.13" + - name: "JAVA_VERSION" +value: "21" + - name: "IMAGE" +value: 'apache/spark:4.0.0-preview1-java21-scala' steps: - name: install-spark-application try: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-45923] Add java21 to the e2e tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 20bdb70 [SPARK-45923] Add java21 to the e2e tests 20bdb70 is described below commit 20bdb7095a45d57c0e2caca692cc72658072d258 Author: Qi Tan AuthorDate: Fri Sep 20 04:50:15 2024 -0700 [SPARK-45923] Add java21 to the e2e tests ### What changes were proposed in this pull request? Add java 21 e2e teste ### Why are the changes needed? E2E coverage on java version ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? e2e workflow should cover this ### Was this patch authored or co-authored using generative AI tooling? no Closes #132 from TQJADE/java-21. Authored-by: Qi Tan Signed-off-by: Dongjoon Hyun --- tests/e2e/spark-versions/chainsaw-test.yaml | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/tests/e2e/spark-versions/chainsaw-test.yaml b/tests/e2e/spark-versions/chainsaw-test.yaml index f4a07f1..8f6c263 100644 --- a/tests/e2e/spark-versions/chainsaw-test.yaml +++ b/tests/e2e/spark-versions/chainsaw-test.yaml @@ -38,7 +38,7 @@ spec: - name: "JAVA_VERSION" value: "17" - name: "IMAGE" -value: 'spark:3.5.2-scala2.12-java17-ubuntu' +value: 'apache/spark:3.5.2-scala2.12-java17-ubuntu' - bindings: - name: "SPARK_VERSION" value: "3.4.3" @@ -47,7 +47,16 @@ spec: - name: "JAVA_VERSION" value: "11" - name: "IMAGE" -value: 'spark:3.4.3-scala2.12-java11-ubuntu' +value: 'apache/spark:3.4.3-scala2.12-java11-ubuntu' + - bindings: + - name: "SPARK_VERSION" +value: "4.0.0-preview1" + - name: "SCALA_VERSION" +value: "2.13" + - name: "JAVA_VERSION" +value: "21" + - name: "IMAGE" +value: 'apache/spark:4.0.0-preview1-java21-scala' steps: - name: install-spark-application try: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: 
commits-h...@spark.apache.org
(spark) tag v4.0.0-preview2 created (now f0d465e09b8d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to tag v4.0.0-preview2 in repository https://gitbox.apache.org/repos/asf/spark.git at f0d465e09b8d (commit) No new revisions were added by this update.
svn commit: r71774 - /dev/spark/v4.0.0-preview2-rc1-bin/ /release/spark/spark-4.0.0-preview2/
Author: dongjoon Date: Fri Sep 20 10:00:47 2024 New Revision: 71774 Log: Release Apache Spark 4.0.0-preview2 Added: release/spark/spark-4.0.0-preview2/ - copied from r71773, dev/spark/v4.0.0-preview2-rc1-bin/ Removed: dev/spark/v4.0.0-preview2-rc1-bin/
(spark) branch master updated: [SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ca726c10925a [SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5 ca726c10925a is described below commit ca726c10925a3677bf057f65ecf415e608c63cd5 Author: Dongjoon Hyun AuthorDate: Thu Sep 19 17:16:25 2024 -0700 [SPARK-49721][BUILD] Upgrade `protobuf-java` to 3.25.5 ### What changes were proposed in this pull request? This PR aims to upgrade `protobuf-java` to 3.25.5. ### Why are the changes needed? To bring the latest bug fixes. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48170 Closes #48171 from dongjoon-hyun/SPARK-49721. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- pom.xml | 2 +- project/SparkBuild.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pom.xml b/pom.xml index 694ea31e6f37..ddabc82d2ad1 100644 --- a/pom.xml +++ b/pom.xml @@ -124,7 +124,7 @@ 3.4.0 -3.25.4 +3.25.5 3.11.4 ${hadoop.version} 3.9.2 diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index d93a52985b77..2f390cb70baa 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -89,7 +89,7 @@ object BuildCommons { // Google Protobuf version used for generating the protobuf. // SPARK-41247: needs to be consistent with `protobuf.version` in `pom.xml`. - val protoVersion = "3.25.4" + val protoVersion = "3.25.5" // GRPC version used for Spark Connect. val grpcVersion = "1.62.2" } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 04455797bfb3 [SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files 04455797bfb3 is described below commit 04455797bfb3631b13b41cfa5d2604db3bf8acc2 Author: Ruifeng Zheng AuthorDate: Thu Sep 19 12:32:30 2024 -0700 [SPARK-49720][PYTHON][INFRA] Add a script to clean up PySpark temp files ### What changes were proposed in this pull request? Add a script to clean up PySpark temp files ### Why are the changes needed? Sometimes I encounter weird issues due to the out-dated `pyspark.zip` file, and removing it can result in expected behavior. So I think we can add such a script. ### Does this PR introduce _any_ user-facing change? no, dev-only ### How was this patch tested? manually test ### Was this patch authored or co-authored using generative AI tooling? no Closes #48167 from zhengruifeng/py_infra_cleanup. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- dev/py-cleanup | 31 +++ 1 file changed, 31 insertions(+) diff --git a/dev/py-cleanup b/dev/py-cleanup new file mode 100755 index ..6a2edd104017 --- /dev/null +++ b/dev/py-cleanup @@ -0,0 +1,31 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Utility for temporary files cleanup in 'python'. +# usage: ./dev/py-cleanup + +set -ex + +SPARK_HOME="$(cd "`dirname $0`"/..; pwd)" +cd "$SPARK_HOME" + +rm -rf python/target +rm -rf python/lib/pyspark.zip +rm -rf python/docs/build +rm -rf python/docs/source/reference/*/api
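The `SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"` line in the script above resolves the repository root from the script's own location, so the `rm -rf` targets are always relative to the checkout rather than to wherever the user happens to run it. A Python stand-in for the same idiom (the `/opt/repo` path is made up purely for illustration):

```python
import os.path

# Hypothetical location of the script inside a repo checkout.
script_path = "/opt/repo/dev/py-cleanup"

# Same idea as `cd "$(dirname $0)"/..; pwd`: the parent of the script's directory.
spark_home = os.path.normpath(os.path.join(os.path.dirname(script_path), ".."))
assert spark_home == "/opt/repo"

# The cleanup targets are then anchored at that root.
targets = [os.path.join(spark_home, p)
           for p in ("python/target", "python/lib/pyspark.zip", "python/docs/build")]
assert targets[0] == "/opt/repo/python/target"
```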
(spark) branch master updated: [SPARK-49718][PS] Switch `Scatter` plot to sampled data
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6d1815eceea2 [SPARK-49718][PS] Switch `Scatter` plot to sampled data 6d1815eceea2 is described below

commit 6d1815eceea2003de2e3602f0f64e8188e8288d8
Author: Ruifeng Zheng
AuthorDate: Thu Sep 19 12:31:48 2024 -0700

[SPARK-49718][PS] Switch `Scatter` plot to sampled data

### What changes were proposed in this pull request?
Switch `Scatter` plot to sampled data.

### Why are the changes needed?
When the data distribution is correlated with row order, the first n rows are not representative of the whole dataset. For example:

```
import pandas as pd
import numpy as np
import pyspark.pandas as ps

# ps.set_option("plotting.max_rows", 10000)
np.random.seed(123)

pdf = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD')).sort_values("A")
psdf = ps.DataFrame(pdf)
psdf.plot.scatter(x='B', y='A')
```

all 10k datapoints: ![image](https://github.com/user-attachments/assets/72cf7e97-ad10-41e0-a8a6-351747d5285f)
before (first 1k datapoints): ![image](https://github.com/user-attachments/assets/1ed50d2c-7772-4579-a84c-6062542d9367)
after (sampled 1k datapoints): ![image](https://github.com/user-attachments/assets/6c684cba-4119-4c38-8228-2bedcdeb9e59)

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
CI and manual test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48164 from zhengruifeng/ps_scatter_sampling.
Authored-by: Ruifeng Zheng
Signed-off-by: Dongjoon Hyun
---
 python/pyspark/pandas/plot/core.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/plot/core.py b/python/pyspark/pandas/plot/core.py
index 429e97ecf07b..6f036b766924 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -479,7 +479,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
         "pie": TopNPlotBase().get_top_n,
         "bar": TopNPlotBase().get_top_n,
         "barh": TopNPlotBase().get_top_n,
-        "scatter": TopNPlotBase().get_top_n,
+        "scatter": SampledPlotBase().get_sampled,
         "area": SampledPlotBase().get_sampled,
         "line": SampledPlotBase().get_sampled,
     }
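The behavioral difference behind this one-line change can be seen without Spark at all: on data sorted by the plotted value, the first n rows cover only the low end of the range, while a random sample spans all of it. A toy sketch in plain Python, standing in for `TopNPlotBase.get_top_n` versus `SampledPlotBase.get_sampled`:

```python
import random

# Values correlated with row order, like the sorted DataFrame in the example above.
data = sorted(range(10000))
n = 1000

top_n = data[:n]                   # old behavior: keep only the first n rows
random.seed(123)                   # seeded so the sketch is deterministic
sampled = random.sample(data, n)   # new behavior: draw n rows from the whole dataset

assert max(top_n) == n - 1   # top-n never sees the upper 90% of the range
assert max(sampled) > n - 1  # the random sample does
```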
(spark) branch branch-3.4 updated: [SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new cb89d18a4d75 [SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats cb89d18a4d75 is described below commit cb89d18a4d750fc88e5d747601352488223e97b5 Author: saitharun15 AuthorDate: Thu Sep 19 12:19:10 2024 -0700 [SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats ### What changes were proposed in this pull request? Backport [#44524 ] to 3.4 for [[SPARK-46535]](https://issues.apache.org/jira/browse/SPARK-46535)[SQL] Fix NPE when describe extended a column without col stats ### Why are the changes needed? Currently executing DESCRIBE TABLE EXTENDED a column without col stats with v2 table will throw a null pointer exception. ``` Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195) at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103) ``` ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Add a new test describe extended (formatted) a column without col stats ### Was this patch authored or co-authored using generative AI tooling? No Closes #48160 from saitharun15/SPARK-46535-branch-3.4. Lead-authored-by: saitharun15 Co-authored-by: Sai Tharun Signed-off-by: Dongjoon Hyun --- .../datasources/v2/DescribeColumnExec.scala | 2 +- .../execution/command/v2/DescribeTableSuite.scala | 21 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala index 61ccda3fc954..2683d8d547f0 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala @@ -53,7 +53,7 @@ case class DescribeColumnExec( read.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match { case s: SupportsReportStatistics => val stats = s.estimateStatistics() - Some(stats.columnStats().get(FieldReference.column(column.name))) + Option(stats.columnStats().get(FieldReference.column(column.name))) case _ => None } case _ => None diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala index 25363dcea699..a12bb92072bc 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala @@ -175,4 +175,25 @@ class DescribeTableSuite extends command.DescribeTableSuiteBase Row("max_col_len", "NULL"))) } } + + 
test("SPARK-46535: describe extended (formatted) a column without col stats") { +withNamespaceAndTable("ns", "tbl") { tbl => + sql( +s""" + |CREATE TABLE $tbl + |(key INT COMMENT 'column_comment', col STRING) + |$defaultUsing""".stripMargin) + + val descriptionDf = sql(s"DESCRIBE TABLE EXTENDED $tbl key") + assert(descriptionDf.schema.map(field => (field.name, field.dataTy
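The fix above swaps `Some(...)` for `Option(...)` around the result of a Java `Map.get`. In Scala, `Some(null)` is a *non-empty* container holding null — so the later `.get.min()` call throws an NPE for a column with no stats — whereas `Option(null)` collapses to `None` and the column is safely treated as having no statistics. A Python analogue of the two constructors (illustrative only, not the Scala API itself):

```python
def scala_some(value):
    # Like Scala's Some(x): always a non-empty wrapper, even around null/None.
    return ("Some", value)

def scala_option(value):
    # Like Scala's Option(x): maps null/None to the empty case.
    return ("Some", value) if value is not None else ("None",)

column_stats = {}                  # no stats recorded for this column
raw = column_stats.get("key")      # Java Map.get on a missing key -> null (None here)

assert scala_some(raw) == ("Some", None)  # looks non-empty; dereferencing later blows up
assert scala_option(raw) == ("None",)     # empty; the "no stats" branch is taken instead
```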
(spark) branch master updated (f0fb0c89ec29 -> 92cad2abd54e)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f0fb0c89ec29 [SPARK-49719][SQL] Make `UUID` and `SHUFFLE` accept integer `seed` add 92cad2abd54e [SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh plot No new revisions were added by this update. Summary of changes: python/pyspark/pandas/plot/core.py | 13 ++--- python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py | 5 +++-- 2 files changed, 13 insertions(+), 5 deletions(-)
(spark) branch master updated (94dca78c128f -> f0fb0c89ec29)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 94dca78c128f [SPARK-49693][PYTHON][CONNECT] Refine the string representation of `timedelta` add f0fb0c89ec29 [SPARK-49719][SQL] Make `UUID` and `SHUFFLE` accept integer `seed` No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/expressions/randomExpressions.scala | 1 + .../sql/catalyst/expressions/CollectionExpressionsSuite.scala | 8 .../spark/sql/catalyst/expressions/MiscExpressionsSuite.scala | 7 +++ 3 files changed, 16 insertions(+)
(spark-kubernetes-operator) branch main updated: [SPARK-49715] Add `Java 21`-based `SparkCluster` example
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new b187764 [SPARK-49715] Add `Java 21`-based `SparkCluster` example b187764 is described below commit b18776458b8f26db8e43f6ddb9275f3cbdbafffe Author: Dongjoon Hyun AuthorDate: Wed Sep 18 22:30:11 2024 -0700 [SPARK-49715] Add `Java 21`-based `SparkCluster` example ### What changes were proposed in this pull request? This PR aims to add `Java 21`-based `SparkCluster` example. ### Why are the changes needed? Apache Spark starts to publish Java 21-based image from Today (2024-09-19). This new example will illustrate how to use it. - https://github.com/apache/spark-docker/pull/69 ### Does this PR introduce _any_ user-facing change? No, this is a new example. ### How was this patch tested? Manual review. ``` $ kubectl apply -f cluster-java21.yaml ``` ``` $ kubectl get sparkcluster NAME CURRENT STATEAGE cluster-java21 RunningHealthy 9s ``` ``` $ kubectl describe sparkcluster cluster-java21 Name: cluster-java21 Namespace:default Labels: Annotations: API Version: spark.apache.org/v1alpha1 Kind: SparkCluster Metadata: Creation Timestamp: 2024-09-19T04:25:30Z Finalizers: sparkclusters.spark.apache.org/finalizer Generation:2 Resource Version: 96663 UID: 9421c957-380d-4a26-a266-379304bf83ee Spec: Cluster Tolerations: Instance Config: Init Workers: 3 Max Workers: 3 Min Workers: 3 Master Spec: Runtime Versions: Spark Version: 4.0.0-preview1 Spark Conf: spark.kubernetes.container.image: apache/spark:4.0.0-preview1-java21 spark.master.rest.enabled: true spark.master.rest.host:0.0.0.0 spark.master.ui.title: Prod Spark Cluster (Java 21) spark.ui.reverseProxy: true Worker Spec: Status: Current Attempt Summary: Attempt Info: Id: 0 Current State: Current State Summary: RunningHealthy Last Transition Time: 
2024-09-19T04:25:30.665095088Z Message:Cluster has reached ready state. State Transition History: 0: Current State Summary: Submitted Last Transition Time: 2024-09-19T04:25:30.640072963Z Message:Spark cluster has been submitted to Kubernetes Cluster. 1: Current State Summary: RunningHealthy Last Transition Time: 2024-09-19T04:25:30.665095088Z Message:Cluster has reached ready state. Events: ``` ``` $ k get pod NAME READY STATUSRESTARTS AGE cluster-java21-master-0 1/1 Running 0 3m20s cluster-java21-worker-0 1/1 Running 0 3m20s cluster-java21-worker-1 1/1 Running 0 3m20s cluster-java21-worker-2 1/1 Running 0 3m20s spark-kubernetes-operator-778b9bbdc6-fqks9 1/1 Running 0 20m ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #130 from dongjoon-hyun/SPARK-49715. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- examples/cluster-java21.yaml | 32 1 file changed, 32 insertions(+) diff --git a/examples/cluster-java21.yaml b/examples/cluster-java21.yaml new file mode 100644 index 000..fefe90c --- /dev/null +++ b/examples/cluster-java21.yaml @@ -0,0 +1,32 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +apiVersion: spark.apache.org/v1alpha1 +kind: SparkCluster +metadata: +
(spark-kubernetes-operator) branch main updated: [SPARK-49714] Add `Java 21`-based `SparkPi` example
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 815cefe [SPARK-49714] Add `Java 21`-based `SparkPi` example 815cefe is described below commit 815cefe136c196cf987ac73300e55c84d3216816 Author: Dongjoon Hyun AuthorDate: Wed Sep 18 22:29:13 2024 -0700 [SPARK-49714] Add `Java 21`-based `SparkPi` example ### What changes were proposed in this pull request? This PR aims to add `Java 21`-based SparkPi example. ### Why are the changes needed? Apache Spark starts to publish Java 21-based image from Today (2024-09-19). This new example will illustrate how to use it. - https://github.com/apache/spark-docker/pull/69 ### Does this PR introduce _any_ user-facing change? No, this is a new example. ### How was this patch tested? Manual review. ``` $ kubectl apply -f examples/pi-java21.yaml ``` ``` $ kubectl get sparkapp NAMECURRENT STATE AGE pi-java21 ResourceReleased 28s ``` ``` $ kubectl describe sparkapp pi-java21 Name: pi-java21 Namespace:default Labels: Annotations: API Version: spark.apache.org/v1alpha1 Kind: SparkApplication Metadata: Creation Timestamp: 2024-09-19T04:08:16Z Finalizers: sparkapplications.spark.apache.org/finalizer Generation:2 Resource Version: 95294 UID: 2bc46e8d-6339-4867-9a28-6552c6c471e5 Spec: Application Tolerations: Application Timeout Config: Driver Ready Timeout Millis:30 Driver Start Timeout Millis:30 Executor Start Timeout Millis: 30 Force Termination Grace Period Millis: 30 Termination Requeue Period Millis: 2000 Instance Config: Init Executors:0 Max Executors: 0 Min Executors: 0 Resource Retain Policy: OnFailure Restart Config: Max Restart Attempts:3 Restart Backoff Millis: 3 Restart Policy: Never Deployment Mode: ClusterMode Driver Args: Jars:local:///opt/spark/examples/jars/spark-examples.jar Main Class: 
org.apache.spark.examples.SparkPi Runtime Versions: Scala Version: 2.13 Spark Version: 4.0.0-preview1 Spark Conf: spark.dynamicAllocation.enabled: true spark.dynamicAllocation.maxExecutors: 3 spark.dynamicAllocation.shuffleTracking.enabled: true spark.kubernetes.authenticate.driver.serviceAccountName: spark spark.kubernetes.container.image: apache/spark:4.0.0-preview1-java21-scala spark.log.structuredLogging.enabled: false Status: Current Attempt Summary: Attempt Info: Id: 0 Current State: Current State Summary: ResourceReleased Last Transition Time: 2024-09-19T04:08:33.316041381Z State Transition History: 0: Current State Summary: Submitted Last Transition Time: 2024-09-19T04:08:16.584629470Z Message:Spark application has been created on Kubernetes Cluster. 1: Current State Summary: DriverRequested Last Transition Time: 2024-09-19T04:08:17.269457304Z Message:Requested driver from resource scheduler. 2: Current State Summary: DriverStarted Last Transition Time: 2024-09-19T04:08:17.809898304Z Message:Driver has started running. 3: Current State Summary: DriverReady Last Transition Time: 2024-09-19T04:08:17.810393971Z Message:Driver has reached ready state. 4: Current State Summary: RunningHealthy Last Transition Time: 2024-09-19T04:08:17.828526471Z Message:Application is running healthy. 5: Current State Summary: Succeeded Last Transition Time: 2024-09-19T04:08:33.241514089Z Message:Spark application completed successfully. 6: Current State Summary: ResourceReleased Last Transition Time: 2024-09-19T04:08:33.316041381Z Events: ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #129 from dongjoon-hyun/SPARK-49714. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- examples/pi-java21.yaml | 33
(spark-kubernetes-operator) branch main updated: [SPARK-49705] Use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 491e7db [SPARK-49705] Use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar` 491e7db is described below commit 491e7db1d62b9f0e4f641f2a59eee083140dc819 Author: Dongjoon Hyun AuthorDate: Wed Sep 18 22:28:36 2024 -0700 [SPARK-49705] Use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar` ### What changes were proposed in this pull request? This PR aims to use `spark-examples.jar` instead of `spark-examples_2.13-4.0.0-preview1.jar`. ### Why are the changes needed? To simplify the examples for Apache Spark 4+ via SPARK-45497. - https://github.com/apache/spark/pull/43324 ### Does this PR introduce _any_ user-facing change? Yes, but only example images. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #127 from dongjoon-hyun/SPARK-49705. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- examples/pi-on-yunikorn.yaml| 2 +- examples/pi-scala.yaml | 2 +- examples/pi-with-one-pod.yaml | 2 +- examples/pi.yaml| 2 +- examples/sql.yaml | 2 +- examples/submit-pi-to-prod.sh | 2 +- tests/e2e/state-transition/spark-example-succeeded.yaml | 2 +- tests/e2e/watched-namespaces/spark-example.yaml | 2 +- 8 files changed, 8 insertions(+), 8 deletions(-) diff --git a/examples/pi-on-yunikorn.yaml b/examples/pi-on-yunikorn.yaml index 029c9f3..d8f6ccf 100644 --- a/examples/pi-on-yunikorn.yaml +++ b/examples/pi-on-yunikorn.yaml @@ -18,7 +18,7 @@ metadata: name: pi-on-yunikorn spec: mainClass: "org.apache.spark.examples.SparkPi" - jars: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar" + jars: "local:///opt/spark/examples/jars/spark-examples.jar" driverArgs: [ "2" ] sparkConf: spark.dynamicAllocation.enabled: "true" diff --git a/examples/pi-scala.yaml b/examples/pi-scala.yaml index 3744ae1..29b5018 100644 --- a/examples/pi-scala.yaml +++ b/examples/pi-scala.yaml @@ -18,7 +18,7 @@ metadata: name: pi-scala spec: mainClass: "org.apache.spark.examples.SparkPi" - jars: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar" + jars: "local:///opt/spark/examples/jars/spark-examples.jar" sparkConf: spark.dynamicAllocation.enabled: "true" spark.dynamicAllocation.shuffleTracking.enabled: "true" diff --git a/examples/pi-with-one-pod.yaml b/examples/pi-with-one-pod.yaml index f46d977..739058d 100644 --- a/examples/pi-with-one-pod.yaml +++ b/examples/pi-with-one-pod.yaml @@ -18,7 +18,7 @@ metadata: name: pi-with-one-pod spec: mainClass: "org.apache.spark.examples.SparkPi" - jars: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar" + jars: "local:///opt/spark/examples/jars/spark-examples.jar" sparkConf: spark.kubernetes.driver.master: "local[10]" spark.kubernetes.driver.request.cores: "5" diff --git a/examples/pi.yaml b/examples/pi.yaml index f99499d..8b20dcd 
100644 --- a/examples/pi.yaml +++ b/examples/pi.yaml @@ -18,7 +18,7 @@ metadata: name: pi spec: mainClass: "org.apache.spark.examples.SparkPi" - jars: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar" + jars: "local:///opt/spark/examples/jars/spark-examples.jar" sparkConf: spark.dynamicAllocation.enabled: "true" spark.dynamicAllocation.shuffleTracking.enabled: "true" diff --git a/examples/sql.yaml b/examples/sql.yaml index 9639723..1be9779 100644 --- a/examples/sql.yaml +++ b/examples/sql.yaml @@ -18,7 +18,7 @@ metadata: name: sql spec: mainClass: "org.apache.spark.examples.sql.JavaSparkSQLCli" - jars: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar" + jars: "local:///opt/spark/examples/jars/spark-examples.jar" driverArgs: [ "SHOW DATABASES", "SHOW TABLES", "SELECT VERSION()" ] sparkConf: spark.dynamicAllocation.enabled: "true" diff --git a/examples/submit-pi-to-prod.sh b/examples/submit-pi-to-prod.sh index 915706a..b3b4707 100755 --- a/examples/submit-pi-to-prod.sh +++ b/examples/submit-pi-to-prod.sh @@ -30,7 +30,7 @@ curl -XPOST http://localhost:6066/v1/submissions/create
(spark-docker) branch master updated: [SPARK-49703] Publish Java 21 Docker image for preview1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 0402e13 [SPARK-49703] Publish Java 21 Docker image for preview1 0402e13 is described below commit 0402e13bb797363f6b99d6aa56c4185317deeaf4 Author: Dongjoon Hyun AuthorDate: Wed Sep 18 20:37:56 2024 -0700 [SPARK-49703] Publish Java 21 Docker image for preview1 ### What changes were proposed in this pull request? This PR aims to publish Java 21 Docker image for `preview1` and will be extended for `preview2`. ### Why are the changes needed? Apache Spark supports Java 21 via SPARK-43831. ### Does this PR introduce _any_ user-facing change? No, this is a new image. ### How was this patch tested? Pass the CIs. Closes #69 from dongjoon-hyun/SPARK-49703. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .github/workflows/build_4.0.0-preview1.yaml| 2 +- .github/workflows/publish-java21.yaml | 88 ++ .../scala2.13-java21-python3-r-ubuntu/Dockerfile | 29 + .../scala2.13-java21-python3-ubuntu/Dockerfile | 26 + .../scala2.13-java21-r-ubuntu/Dockerfile | 28 + 4.0.0-preview1/scala2.13-java21-ubuntu/Dockerfile | 81 + .../scala2.13-java21-ubuntu/entrypoint.sh | 130 + README.md | 2 +- add-dockerfiles.sh | 10 +- versions.json | 28 + 10 files changed, 420 insertions(+), 4 deletions(-) diff --git a/.github/workflows/build_4.0.0-preview1.yaml b/.github/workflows/build_4.0.0-preview1.yaml index aa683f7..31df15a 100644 --- a/.github/workflows/build_4.0.0-preview1.yaml +++ b/.github/workflows/build_4.0.0-preview1.yaml @@ -31,7 +31,7 @@ jobs: strategy: matrix: image-type: ["all", "python", "scala", "r"] -java: [17] +java: [17, 21] name: Run secrets: inherit uses: ./.github/workflows/main.yml diff --git a/.github/workflows/publish-java21.yaml b/.github/workflows/publish-java21.yaml new file mode 100644 index 000..1a2078a --- /dev/null 
+++ b/.github/workflows/publish-java21.yaml @@ -0,0 +1,88 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Publish (Java 21 only)" + +on: + workflow_dispatch: +inputs: + spark: +description: 'The Spark version of Spark image.' +required: true +default: '4.0.0-preview1' +type: choice +options: +- 4.0.0-preview1 + publish: +description: 'Publish the image or not.' +default: false +type: boolean +required: true + repository: +description: The registry to be published (Available only when publish is true). 
+required: false +default: ghcr.io/apache/spark-docker +type: choice +options: +# GHCR: This required the write permission of apache/spark-docker (Spark Committer) +- ghcr.io/apache/spark-docker +# Dockerhub: This required the DOCKERHUB_TOKEN and DOCKERHUB_USER (Spark Committer) +- apache + +jobs: + # We first build and publish the base image + run-base-build: +strategy: + matrix: +scala: [2.13] +java: [21] +image-type: ["scala"] +permissions: + packages: write +name: Run Base +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: ${{ inputs.spark }} + scala: ${{ matrix.scala }} + java: ${{ matrix.java }} + publish: ${{ inputs.publish }} + repository: ${{ inputs.repository }} + image-type: ${{ matrix.image-type }} + + # Then publish the all / python / r images + run-build: +needs: run-base-build +strategy: + matrix: +scala: [2.13] +java: [21] +image-type: ["all&
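The two-job layout above — publish the base `scala` image first, then the dependent image types that build on it via `needs: run-base-build` — is fundamentally an ordering constraint. A minimal Python sketch of that dependency (job and image-type names mirror the workflow; everything else is illustrative):

```python
# "needs: run-base-build" in the workflow means the base-image job must
# finish before the dependent image-type jobs are allowed to start.
order = []

def run_base_build():
    order.append("scala")  # base image is built and published first

def run_build(image_type):
    # A dependent job only runs once its needed job has completed.
    assert "scala" in order, "base image must exist before dependent images"
    order.append(image_type)

run_base_build()
for image_type in ["all", "python", "r"]:
    run_build(image_type)
print(order)  # ['scala', 'all', 'python', 'r']
```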
(spark-kubernetes-operator) branch main updated: [SPARK-49706] Use `apache/spark` images instead of `spark`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 41e7c3b [SPARK-49706] Use `apache/spark` images instead of `spark` 41e7c3b is described below commit 41e7c3bba5857f6a678570c64e1c360375395611 Author: Dongjoon Hyun AuthorDate: Wed Sep 18 18:16:35 2024 -0700 [SPARK-49706] Use `apache/spark` images instead of `spark` ### What changes were proposed in this pull request? This PR proposes using `apache/spark` images instead of `spark`, because the `apache/spark` images are published first. For example, the following are only available in `apache/spark` as of now. - https://github.com/apache/spark-docker/pull/66 - https://github.com/apache/spark-docker/pull/67 - https://github.com/apache/spark-docker/pull/68 ### Why are the changes needed? To apply the latest bits earlier. ### Does this PR introduce _any_ user-facing change? There is no change from `Apache Spark K8s Operator`. Only the underlying images are changed. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #128 from dongjoon-hyun/SPARK-49706. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- README.md | 2 +- examples/cluster-on-yunikorn.yaml | 2 +- examples/cluster-with-template.yaml | 2 +- examples/pi-on-yunikorn.yaml| 2 +- examples/pi-scala.yaml | 2 +- examples/pi-with-one-pod.yaml | 2 +- examples/pi.yaml| 2 +- examples/prod-cluster-with-three-workers.yaml | 2 +- examples/pyspark-pi.yaml| 2 +- examples/qa-cluster-with-one-worker.yaml| 2 +- examples/sql.yaml | 2 +- tests/e2e/python/chainsaw-test.yaml | 4 ++-- tests/e2e/spark-versions/chainsaw-test.yaml | 2 +- tests/e2e/state-transition/spark-cluster-example-succeeded.yaml | 2 +- tests/e2e/state-transition/spark-example-succeeded.yaml | 2 +- tests/e2e/watched-namespaces/spark-example.yaml | 2 +- 16 files changed, 17 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index e9cdee7..e306889 100644 --- a/README.md +++ b/README.md @@ -100,7 +100,7 @@ Events: Normal Scheduled 14s yunikorn Successfully assigned default/pi-on-yunikorn-0-driver to node docker-desktop Normal PodBindSuccessful 14s yunikorn Pod default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop Normal TaskCompleted 6syunikorn Task default/pi-on-yunikorn-0-driver is completed - Normal Pulled 13s kubelet Container image "spark:4.0.0-preview1" already present on machine + Normal Pulled 13s kubelet Container image "apache/spark:4.0.0-preview1" already present on machine Normal Created13s kubelet Created container spark-kubernetes-driver Normal Started13s kubelet Started container spark-kubernetes-driver diff --git a/examples/cluster-on-yunikorn.yaml b/examples/cluster-on-yunikorn.yaml index 0032c84..4c1d142 100644 --- a/examples/cluster-on-yunikorn.yaml +++ b/examples/cluster-on-yunikorn.yaml @@ -25,7 +25,7 @@ spec: minWorkers: 1 maxWorkers: 1 sparkConf: -spark.kubernetes.container.image: "spark:4.0.0-preview1" +spark.kubernetes.container.image: "apache/spark:4.0.0-preview1" spark.kubernetes.scheduler.name: "yunikorn" spark.master.ui.title: "Spark 
Cluster on YuniKorn Scheduler" spark.master.rest.enabled: "true" diff --git a/examples/cluster-with-template.yaml b/examples/cluster-with-template.yaml index c0d17b8..69add4d 100644 --- a/examples/cluster-with-template.yaml +++ b/examples/cluster-with-template.yaml @@ -87,7 +87,7 @@ spec: annotations: customAnnotation: "annotation" sparkConf: -spark.kubernetes.container.image: "spark:4.0.0-preview1" +spark.kubernetes.container.image: "apache/spark:4.0.0-preview1" spark.master.ui.title: "Spark Cluster with Template" spark.master.rest.enabled: "true" spark.master.rest.host: "0.0.0.0" diff --git a/examples/pi-on-yunikorn.yaml b/examples/pi-on-yunikorn.yaml index 9e115b4..029c9f3 100644 --- a/exa
(spark-docker) branch master updated: [SPARK-44935] Fix `RELEASE` file to have the correct information in Docker images if exists
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new a5f0168 [SPARK-44935] Fix `RELEASE` file to have the correct information in Docker images if exists a5f0168 is described below commit a5f016858ca0ce6809a06d3c159bfdb221df68e0 Author: Dongjoon Hyun AuthorDate: Wed Sep 18 16:12:41 2024 -0700 [SPARK-44935] Fix `RELEASE` file to have the correct information in Docker images if exists ### What changes were proposed in this pull request? This PR aims to fix the `RELEASE` file to have the correct information in Docker images, if it exists. The Apache Spark repository already fixed this. - https://github.com/apache/spark/pull/42636 ### Why are the changes needed? To provide correct information for Spark 3.4+. ### Does this PR introduce _any_ user-facing change? No behavior change. Only the `RELEASE` file. ### How was this patch tested? Pass the CIs. Closes #68 from dongjoon-hyun/SPARK-44935. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.4.2/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.4.3/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.5.0/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.5.0/scala2.12-java17-ubuntu/Dockerfile | 1 + 3.5.1/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.5.1/scala2.12-java17-ubuntu/Dockerfile | 1 + 3.5.2/scala2.12-java11-ubuntu/Dockerfile | 1 + 3.5.2/scala2.12-java17-ubuntu/Dockerfile | 1 + 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 1 + Dockerfile.template | 1 + 12 files changed, 12 insertions(+) diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index a4b081e..c756d8a 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -55,6 +55,7 @@ RUN set -ex; \ tar -xf spark.tgz --strip-components=1; \ chown -R spark:spark .; \ mv jars /opt/spark/; \ +mv RELEASE /opt/spark/; \ mv bin /opt/spark/; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile b/3.4.1/scala2.12-java11-ubuntu/Dockerfile index d8bba7e..c18afb0 100644 --- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile @@ -55,6 +55,7 @@ RUN set -ex; \ tar -xf spark.tgz --strip-components=1; \ chown -R spark:spark .; \ mv jars /opt/spark/; \ +mv RELEASE /opt/spark/; \ mv bin /opt/spark/; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ diff --git a/3.4.2/scala2.12-java11-ubuntu/Dockerfile b/3.4.2/scala2.12-java11-ubuntu/Dockerfile index 2a472f9..4c28cc0 100644 --- a/3.4.2/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.2/scala2.12-java11-ubuntu/Dockerfile @@ -55,6 +55,7 @@ RUN set -ex; \ tar -xf spark.tgz --strip-components=1; \ chown -R spark:spark .; \ mv jars /opt/spark/; \ +mv RELEASE /opt/spark/; \ mv bin /opt/spark/; \ mv sbin /opt/spark/; \ 
mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ diff --git a/3.4.3/scala2.12-java11-ubuntu/Dockerfile b/3.4.3/scala2.12-java11-ubuntu/Dockerfile index b749f07..4432dea 100644 --- a/3.4.3/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.3/scala2.12-java11-ubuntu/Dockerfile @@ -55,6 +55,7 @@ RUN set -ex; \ tar -xf spark.tgz --strip-components=1; \ chown -R spark:spark .; \ mv jars /opt/spark/; \ +mv RELEASE /opt/spark/; \ mv bin /opt/spark/; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ diff --git a/3.5.0/scala2.12-java11-ubuntu/Dockerfile b/3.5.0/scala2.12-java11-ubuntu/Dockerfile index 15f4b31..afc7e9a 100644 --- a/3.5.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.5.0/scala2.12-java11-ubuntu/Dockerfile @@ -55,6 +55,7 @@ RUN set -ex; \ tar -xf spark.tgz --strip-components=1; \ chown -R spark:spark .; \ mv jars /opt/spark/; \ +mv RELEASE /opt/spark/; \ mv bin /opt/spark/; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ diff --git a/3.5.0/scala2.12-java17-ubuntu/Dockerfile b/3.5.0/scala2.12-java17-ubuntu/Dockerfile index a2749bb..61c6687 100644 --- a/3.5.0/scala2.12-java17-ubuntu/Dockerfile +++ b/3.5.0/scala2.12-java17-ubuntu/Dockerfile @@ -55,6 +55,7 @@ RUN set -ex; \ tar -xf spark.tgz --strip-components=1; \ chown -R spark:spark .; \ mv jars /opt/spark/; \ +mv RELEASE /opt/spark/; \ mv bin /opt/spark/; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt
(spark-docker) branch master updated: [SPARK-45497] Add a symbolic link file `spark-examples.jar` in K8s Docker images
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new b69d21d [SPARK-45497] Add a symbolic link file `spark-examples.jar` in K8s Docker images b69d21d is described below commit b69d21da5da1f5d35ea0125188a700c4ca9897bb Author: Dongjoon Hyun AuthorDate: Wed Sep 18 16:09:30 2024 -0700 [SPARK-45497] Add a symbolic link file `spark-examples.jar` in K8s Docker images ### What changes were proposed in this pull request? This PR aims to add a symbolic link file, `spark-examples.jar`, in the example jar directory. The Apache Spark repository is already updated via - https://github.com/apache/spark/pull/43324 ``` $ docker run -it --rm spark:latest ls -al /opt/spark/examples/jars | tail -n6 total 1620 drwxr-xr-x 1 root root 4096 Oct 11 04:37 . drwxr-xr-x 1 root root 4096 Sep 9 02:08 .. -rw-r--r-- 1 root root 78803 Sep 9 02:08 scopt_2.12-3.7.1.jar -rw-r--r-- 1 root root 1564255 Sep 9 02:08 spark-examples_2.12-3.5.0.jar lrwxrwxrwx 1 root root 29 Oct 11 04:37 spark-examples.jar -> spark-examples_2.12-3.5.0.jar ``` ### Why are the changes needed? Like the PySpark example (`pi.py`), we can submit the examples without considering version numbers, which was painful before. ``` bin/spark-submit \ --master k8s://$K8S_MASTER \ --deploy-mode cluster \ ... --class org.apache.spark.examples.SparkPi \ local:///opt/spark/examples/jars/spark-examples.jar 1 ``` The following is the driver pod log. ``` + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit ... 
--deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi local:///opt/spark/examples/jars/spark-examples.jar 1 Files local:///opt/spark/examples/jars/spark-examples.jar from /opt/spark/examples/jars/spark-examples.jar to /opt/spark/work-dir/./spark-examples.jar ``` ### Does this PR introduce _any_ user-facing change? No, this is an additional file. ### How was this patch tested? Manually build the docker image and do `ls`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #67 from dongjoon-hyun/SPARK-45497. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 1 + Dockerfile.template | 1 + 2 files changed, 2 insertions(+) diff --git a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile index 0a487dd..2685a87 100644 --- a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile +++ b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile @@ -59,6 +59,7 @@ RUN set -ex; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ mv examples /opt/spark/; \ +ln -s "$(basename $(ls /opt/spark/examples/jars/spark-examples_*.jar))" /opt/spark/examples/jars/spark-examples.jar; \ mv kubernetes/tests /opt/spark/; \ mv data /opt/spark/; \ mv python/pyspark /opt/spark/python/pyspark/; \ diff --git a/Dockerfile.template b/Dockerfile.template index 3d0aacf..c19e961 100644 --- a/Dockerfile.template +++ b/Dockerfile.template @@ -59,6 +59,7 @@ RUN set -ex; \ mv sbin /opt/spark/; \ mv kubernetes/dockerfiles/spark/decom.sh /opt/; \ mv examples /opt/spark/; \ +ln -s "$(basename $(ls /opt/spark/examples/jars/spark-examples_*.jar))" /opt/spark/examples/jars/spark-examples.jar; \ mv kubernetes/tests /opt/spark/; \ mv data /opt/spark/; \ mv python/pyspark /opt/spark/python/pyspark/; \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: 
commits-h...@spark.apache.org
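The symlink step above — `ln -s "$(basename $(ls /opt/spark/examples/jars/spark-examples_*.jar))" …` — can be sketched outside of Docker. The paths below are a scratch directory standing in for the image layout, not the real `/opt/spark` tree:

```python
import glob
import os
import tempfile

# Scratch directory standing in for /opt/spark/examples/jars (illustrative path).
jars = tempfile.mkdtemp()
open(os.path.join(jars, "spark-examples_2.13-4.0.0-preview1.jar"), "w").close()

# Mirror the Dockerfile step: resolve the versioned jar via a glob, take its
# basename, and create a version-agnostic relative symlink beside it.
versioned = os.path.basename(glob.glob(os.path.join(jars, "spark-examples_*.jar"))[0])
os.symlink(versioned, os.path.join(jars, "spark-examples.jar"))

print(os.readlink(os.path.join(jars, "spark-examples.jar")))
# -> spark-examples_2.13-4.0.0-preview1.jar
```

Because the link target is a basename rather than an absolute path, the symlink keeps working wherever the `jars` directory ends up inside the image.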
(spark-docker) branch master updated: [SPARK-49701] Use JDK for Spark 3.5+ Docker image
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new daa6f94 [SPARK-49701] Use JDK for Spark 3.5+ Docker image daa6f94 is described below commit daa6f940a133d63a4e813bff811b98b1f05f1c4a Author: Dongjoon Hyun AuthorDate: Wed Sep 18 16:08:32 2024 -0700 [SPARK-49701] Use JDK for Spark 3.5+ Docker image ### What changes were proposed in this pull request? This PR aims to use JDK for Spark 3.5+ Docker image. Apache Spark Dockerfile are updated already. - https://github.com/apache/spark/pull/45762 - https://github.com/apache/spark/pull/45761 ### Why are the changes needed? Since Apache Spark 3.5.0, SPARK-44153 starts to use `jmap` like the following. - https://github.com/apache/spark/pull/41709 https://github.com/apache/spark/blob/c832e2ac1d04668c77493577662c639785808657/core/src/main/scala/org/apache/spark/util/Utils.scala#L2030 ### Does this PR introduce _any_ user-facing change? Yes, the user can use `Heap Histogram` feature. ### How was this patch tested? Pass the CIs. Closes #66 from dongjoon-hyun/SPARK-49701. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- 3.5.0/scala2.12-java17-ubuntu/Dockerfile | 2 +- 3.5.1/scala2.12-java17-ubuntu/Dockerfile | 2 +- 3.5.2/scala2.12-java17-ubuntu/Dockerfile | 2 +- 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 2 +- add-dockerfiles.sh| 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/3.5.0/scala2.12-java17-ubuntu/Dockerfile b/3.5.0/scala2.12-java17-ubuntu/Dockerfile index ed29cba..a2749bb 100644 --- a/3.5.0/scala2.12-java17-ubuntu/Dockerfile +++ b/3.5.0/scala2.12-java17-ubuntu/Dockerfile @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# -FROM eclipse-temurin:17-jre-jammy +FROM eclipse-temurin:17-jammy ARG spark_uid=185 diff --git a/3.5.1/scala2.12-java17-ubuntu/Dockerfile b/3.5.1/scala2.12-java17-ubuntu/Dockerfile index 562d938..1682e72 100644 --- a/3.5.1/scala2.12-java17-ubuntu/Dockerfile +++ b/3.5.1/scala2.12-java17-ubuntu/Dockerfile @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # -FROM eclipse-temurin:17-jre-jammy +FROM eclipse-temurin:17-jammy ARG spark_uid=185 diff --git a/3.5.2/scala2.12-java17-ubuntu/Dockerfile b/3.5.2/scala2.12-java17-ubuntu/Dockerfile index 280bd0c..34b1214 100644 --- a/3.5.2/scala2.12-java17-ubuntu/Dockerfile +++ b/3.5.2/scala2.12-java17-ubuntu/Dockerfile @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # -FROM eclipse-temurin:17-jre-jammy +FROM eclipse-temurin:17-jammy ARG spark_uid=185 diff --git a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile index 1102caf..0a487dd 100644 --- a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile +++ b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # -FROM eclipse-temurin:17-jre-jammy +FROM eclipse-temurin:17-jammy ARG spark_uid=185 diff --git a/add-dockerfiles.sh b/add-dockerfiles.sh index 63f610c..ccc4ac1 100755 --- a/add-dockerfiles.sh +++ b/add-dockerfiles.sh @@ -72,7 +72,7 @@ for TAG in $TAGS; do fi if echo $TAG | grep -q "java17"; then -OPTS+=" --java-version 17 --image eclipse-temurin:17-jre-jammy" +OPTS+=" --java-version 17 --image eclipse-temurin:17-jammy" elif echo $TAG | grep -q "java11"; then OPTS+=" --java-version 11 --image eclipse-temurin:11-jre-focal" fi
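The `add-dockerfiles.sh` hunk above boils down to a tag-substring-to-base-image mapping: after this commit, `java17` tags select the full JDK image while `java11` tags keep a JRE image. A hypothetical Python rendering of that selection logic (the error branch for unmatched tags is an assumption, not in the script):

```python
def base_image(tag: str) -> str:
    # Mirrors the grep checks in add-dockerfiles.sh after this commit:
    # java17 tags now get the full JDK image (so tools like jmap are present),
    # while java11 tags keep the smaller JRE image.
    if "java17" in tag:
        return "eclipse-temurin:17-jammy"
    if "java11" in tag:
        return "eclipse-temurin:11-jre-focal"
    raise ValueError(f"no base-image rule for tag: {tag}")

print(base_image("3.5.2-scala2.12-java17-ubuntu"))  # eclipse-temurin:17-jammy
```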
(spark) branch master updated: [SPARK-49691][PYTHON][CONNECT] Function `substring` should accept column names
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ed3a9b1aa929 [SPARK-49691][PYTHON][CONNECT] Function `substring` should accept column names ed3a9b1aa929 is described below commit ed3a9b1aa92957015592b399167a960b68b73beb Author: Ruifeng Zheng AuthorDate: Wed Sep 18 09:28:09 2024 -0700 [SPARK-49691][PYTHON][CONNECT] Function `substring` should accept column names ### What changes were proposed in this pull request? Function `substring` should accept column names ### Why are the changes needed? Bug fix: ``` In [1]: >>> import pyspark.sql.functions as sf ...: >>> df = spark.createDataFrame([('Spark', 2, 3)], ['s', 'p', 'l']) ...: >>> df.select('*', sf.substring('s', 'p', 'l')).show() ``` works in PySpark Classic, but fail in Connect with: ``` NumberFormatException Traceback (most recent call last) Cell In[2], line 1 > 1 df.select('*', sf.substring('s', 'p', 'l')).show() File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1170, in DataFrame.show(self, n, truncate, vertical) 1169 def show(self, n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) -> None: -> 1170 print(self._show_string(n, truncate, vertical)) File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:927, in DataFrame._show_string(self, n, truncate, vertical) 910 except ValueError: 911 raise PySparkTypeError( 912 errorClass="NOT_BOOL", 913 messageParameters={ (...) 
916 }, 917 ) 919 table, _ = DataFrame( 920 plan.ShowString( 921 child=self._plan, 922 num_rows=n, 923 truncate=_truncate, 924 vertical=vertical, 925 ), 926 session=self._session, --> 927 )._to_table() 928 return table[0][0].as_py() File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1844, in DataFrame._to_table(self) 1842 def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: 1843 query = self._plan.to_proto(self._session.client) -> 1844 table, schema, self._execution_info = self._session.client.to_table( 1845 query, self._plan.observations 1846 ) 1847 assert table is not None 1848 return (table, schema) File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:892, in SparkConnectClient.to_table(self, plan, observations) 890 req = self._execute_plan_request_with_metadata() 891 req.plan.CopyFrom(plan) --> 892 table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(req, observations) 894 # Create a query execution object. 895 ei = ExecutionInfo(metrics, observed_metrics) File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1517, in SparkConnectClient._execute_and_fetch(self, req, observations, self_destruct) 1514 properties: Dict[str, Any] = {} 1516 with Progress(handlers=self._progress_handlers, operation_id=req.operation_id) as progress: -> 1517 for response in self._execute_and_fetch_as_iterator( 1518 req, observations, progress=progress 1519 ): 1520 if isinstance(response, StructType): 1521 schema = response File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1494, in SparkConnectClient._execute_and_fetch_as_iterator(self, req, observations, progress) 1492 raise kb 1493 except Exception as error: -> 1494 self._handle_error(error) File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1764, in SparkConnectClient._handle_error(self, error) 1762 self.thread_local.inside_error_handling = True 1763 if isinstance(error, grpc.RpcError): -> 1764 self._handle_rpc_error(error) 1765 elif isinstance(error, ValueError): 1766 if 
"Cannot invoke RPC" in str(error) and "closed" in str(error): File ~/Dev/spark/python/pyspark/sql/connect/client/core.py:1840, in SparkConnectClient._handle_rpc_error(self, rpc_error) 1837 if info.metadata["errorClass"] == "INVALID_HANDLE.SESSION_CHANGED": 1838 self._closed = True -> 1840 raise convert_exception( 1841 info, 1842 status.message, 1843
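Setting the Connect plumbing aside, the per-row semantics the fix restores — `substring(s, p, l)` where the 1-based start position and the length come from columns — can be sketched in plain Python. This mimics the SQL function's behavior for positive positions only, not the PySpark API itself:

```python
def sql_substring(s: str, pos: int, length: int) -> str:
    # SQL-style substring: 1-based start position and an explicit length.
    # (Negative positions, which SQL also supports, are ignored in this sketch.)
    return s[pos - 1 : pos - 1 + length]

# Row from the reproducer: ('Spark', 2, 3) -> start at the 2nd character, take 3.
print(sql_substring("Spark", 2, 3))  # par
```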
(spark) branch master updated: [SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages settings via .asf.yml
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b86e5d2ab1fb [SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages settings via .asf.yml b86e5d2ab1fb is described below commit b86e5d2ab1fb17f8dcbb5b4d50f3361494270438 Author: Kent Yao AuthorDate: Wed Sep 18 07:44:42 2024 -0700 [SPARK-49495][DOCS][FOLLOWUP] Enable GitHub Pages settings via .asf.yml ### What changes were proposed in this pull request? A followup of SPARK-49495 to enable GitHub Pages settings via [.asf.yaml](https://cwiki.apache.org/confluence/pages/viewpage.action?spaceKey=INFRA&title=git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubPages) ### Why are the changes needed? Meet the requirement for the `actions/configure-pages@v5` action ``` Run actions/configure-pages@v5 with: token: *** enablement: false env: SPARK_TESTING: 1 RELEASE_VERSION: In-Progress JAVA_HOME: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.12-7/x64 JAVA_HOME_17_X64: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.12-7/x64 pythonLocation: /opt/hostedtoolcache/Python/3.9.19/x64 PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.9.19/x64/lib/pkgconfig Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.19/x64 Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.19/x64 Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.19/x64 LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.19/x64/lib Error: Get Pages site failed. 
Please verify that the repository has Pages enabled and configured to build using GitHub Actions, or consider exploring the `enablement` parameter for this action. Error: Not Found - https://docs.github.com/rest/pages/pages#get-a-apiname-pages-site Error: HttpError: Not Found - https://docs.github.com/rest/pages/pages#get-a-apiname-pages-site ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? NA ### Was this patch authored or co-authored using generative AI tooling? no Closes #48141 from yaooqinn/SPARK-49495-FF. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .asf.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.asf.yaml b/.asf.yaml index 22042b355b2f..91a5f9b2bb1a 100644 --- a/.asf.yaml +++ b/.asf.yaml @@ -31,6 +31,8 @@ github: merge: false squash: true rebase: true + ghp_branch: master + ghp_path: /docs/_site notifications: pullrequests: revi...@spark.apache.org
svn commit: r71714 - /dev/spark/v3.5.3-rc3-bin/ /release/spark/spark-3.5.3/
Author: dongjoon Date: Wed Sep 18 04:20:19 2024 New Revision: 71714 Log: Release Apache Spark 3.5.3 Added: release/spark/spark-3.5.3/ - copied from r71713, dev/spark/v3.5.3-rc3-bin/ Removed: dev/spark/v3.5.3-rc3-bin/
(spark) branch master updated: [SPARK-49682][BUILD] Upgrade joda-time to 2.13.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4590538df095 [SPARK-49682][BUILD] Upgrade joda-time to 2.13.0 4590538df095 is described below commit 4590538df095b20c0736ecc992ed9c0dfb926c0e Author: panbingkun AuthorDate: Tue Sep 17 21:14:52 2024 -0700 [SPARK-49682][BUILD] Upgrade joda-time to 2.13.0 ### What changes were proposed in this pull request? This PR aims to upgrade joda-time from `2.12.7` to `2.13.0`. ### Why are the changes needed? The `DateTimeZone` data is updated to version `2024bgtz`. The full release notes: https://www.joda.org/joda-time/changes-report.html#a2.13.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48130 from panbingkun/SPARK-49682. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index e1ac039f2546..9871cc0bca04 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -146,7 +146,7 @@ jjwt-api/0.12.6//jjwt-api-0.12.6.jar jline/2.14.6//jline-2.14.6.jar jline/3.25.1//jline-3.25.1.jar jna/5.14.0//jna-5.14.0.jar -joda-time/2.12.7//joda-time-2.12.7.jar +joda-time/2.13.0//joda-time-2.13.0.jar jodd-core/3.5.2//jodd-core-3.5.2.jar jpam/1.1//jpam-1.1.jar json/1.8//json-1.8.jar diff --git a/pom.xml b/pom.xml index b9f28eb61925..694ea31e6f37 100644 --- a/pom.xml +++ b/pom.xml @@ -199,7 +199,7 @@ 2.11.0 3.1.9 3.0.12 -2.12.7 +2.13.0 3.5.2 3.0.0 2.2.11
(spark) branch master updated: [SPARK-49687][SQL] Delay sorting in `validateAndMaybeEvolveStateSchema`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c38844c9ecc6 [SPARK-49687][SQL] Delay sorting in `validateAndMaybeEvolveStateSchema` c38844c9ecc6 is described below commit c38844c9ecc6dd648500b2ef6ff01acbe46255f4 Author: Zhihong Yu AuthorDate: Tue Sep 17 10:58:05 2024 -0700 [SPARK-49687][SQL] Delay sorting in `validateAndMaybeEvolveStateSchema` ### What changes were proposed in this pull request? In `validateAndMaybeEvolveStateSchema`, existing schema and new schema are sorted by column family name. The sorting can be delayed until `createSchemaFile` is called. When computing `colFamiliesAddedOrRemoved`, we can use `toSet` to compare column families. ### Why are the changes needed? This would make `validateAndMaybeEvolveStateSchema` faster. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48116 from tedyu/ty-comp-chk. 
Authored-by: Zhihong Yu
Signed-off-by: Dongjoon Hyun
---
 .../streaming/state/StateSchemaCompatibilityChecker.scala | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
index 3a1793f71794..721d72b6a099 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
@@ -168,12 +168,12 @@ class StateSchemaCompatibilityChecker(
       newStateSchema: List[StateStoreColFamilySchema],
       ignoreValueSchema: Boolean,
       stateSchemaVersion: Int): Boolean = {
-    val existingStateSchemaList = getExistingKeyAndValueSchema().sortBy(_.colFamilyName)
-    val newStateSchemaList = newStateSchema.sortBy(_.colFamilyName)
+    val existingStateSchemaList = getExistingKeyAndValueSchema()
+    val newStateSchemaList = newStateSchema
     if (existingStateSchemaList.isEmpty) {
       // write the schema file if it doesn't exist
-      createSchemaFile(newStateSchemaList, stateSchemaVersion)
+      createSchemaFile(newStateSchemaList.sortBy(_.colFamilyName), stateSchemaVersion)
       true
     } else {
       // validate if the new schema is compatible with the existing schema
@@ -188,9 +188,9 @@ class StateSchemaCompatibilityChecker(
         }
       }
       val colFamiliesAddedOrRemoved =
-        newStateSchemaList.map(_.colFamilyName) != existingStateSchemaList.map(_.colFamilyName)
+        (newStateSchemaList.map(_.colFamilyName).toSet != existingSchemaMap.keySet)
       if (stateSchemaVersion == SCHEMA_FORMAT_V3 && colFamiliesAddedOrRemoved) {
-        createSchemaFile(newStateSchemaList, stateSchemaVersion)
+        createSchemaFile(newStateSchemaList.sortBy(_.colFamilyName), stateSchemaVersion)
       }
       // TODO: [SPARK-49535] Write Schema files after schema has changed for StateSchemaV3
colFamiliesAddedOrRemoved
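The `toSet` change above swaps an order-sensitive list comparison for an order-insensitive membership comparison, which is why the `sortBy` can be deferred until a schema file is actually written. A minimal shell sketch of that idea (names are illustrative, not Spark code):

```shell
# Illustrative only: two column-family name lists that differ in order but
# not in membership. Comparing them order-insensitively (here: sort first)
# reports "unchanged", so sorting can be deferred to schema-file writing.
existing="cf_b cf_a"
incoming="cf_a cf_b"
sorted_existing=$(printf '%s\n' $existing | sort)
sorted_incoming=$(printf '%s\n' $incoming | sort)
if [ "$sorted_existing" = "$sorted_incoming" ]; then
  echo "no column families added or removed"
else
  echo "column families added or removed"
fi
```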
(spark-kubernetes-operator) branch main updated: [SPARK-49657] Add multi instances e2e
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new e161205 [SPARK-49657] Add multi instances e2e e161205 is described below commit e161205c781338114ddb86acafa2d0b2e19e05af Author: Qi Tan AuthorDate: Mon Sep 16 21:27:01 2024 -0700 [SPARK-49657] Add multi instances e2e ### What changes were proposed in this pull request? Add e2e test for two instances of Spark Operator running at the same time. ### Why are the changes needed? There is one scenario, in a cluster, user deployed multi instances of operator and make them watching on several namespaces. For example, operator A in default namespace, watching on namespace spark-1... operator B in default-2 namespace, watching on namespace spark-2.. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? test locally ### Was this patch authored or co-authored using generative AI tooling? no Closes #126 from TQJADE/multi-instance. Authored-by: Qi Tan Signed-off-by: Dongjoon Hyun --- .../dynamic-config-values-2.yaml} | 43 +++ tests/e2e/watched-namespaces/chainsaw-test.yaml| 50 +- tests/e2e/watched-namespaces/spark-example.yaml| 2 + 3 files changed, 76 insertions(+), 19 deletions(-) diff --git a/tests/e2e/watched-namespaces/spark-example.yaml b/tests/e2e/helm/dynamic-config-values-2.yaml similarity index 54% copy from tests/e2e/watched-namespaces/spark-example.yaml copy to tests/e2e/helm/dynamic-config-values-2.yaml index dba59ab..aacc0f1 100644 --- a/tests/e2e/watched-namespaces/spark-example.yaml +++ b/tests/e2e/helm/dynamic-config-values-2.yaml @@ -1,4 +1,3 @@ -# # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
@@ -6,27 +5,35 @@ # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # -#http://www.apache.org/licenses/LICENSE-2.0 +# http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -# -apiVersion: spark.apache.org/v1alpha1 -kind: SparkApplication -metadata: - name: spark-job-succeeded-test - namespace: ($SPARK_APP_NAMESPACE) -spec: - mainClass: "org.apache.spark.examples.SparkPi" - jars: "local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0-preview1.jar" - sparkConf: -spark.executor.instances: "1" -spark.kubernetes.container.image: "spark:4.0.0-preview1-scala2.13-java17-ubuntu" -spark.kubernetes.authenticate.driver.serviceAccountName: "spark" - runtimeVersions: -sparkVersion: 4.0.0-preview1 -scalaVersion: "2.13" \ No newline at end of file +workloadResources: + namespaces: +overrideWatchedNamespaces: false +data: + - "spark-3" + role: +create: true + roleBinding: +create: true + clusterRole: +name: spark-workload-clusterrole-2 + +operatorConfiguration: + dynamicConfig: +enable: true +create: true +data: + spark.kubernetes.operator.watchedNamespaces: "spark-3" + +operatorRbac: + clusterRole: +name: "spark-operator-clusterrole-2" + clusterRoleBinding: +name: "spark-operator-clusterrolebinding-2" \ No newline at end of file diff --git a/tests/e2e/watched-namespaces/chainsaw-test.yaml b/tests/e2e/watched-namespaces/chainsaw-test.yaml index 82ed409..fdffa0a 100644 --- a/tests/e2e/watched-namespaces/chainsaw-test.yaml +++ b/tests/e2e/watched-namespaces/chainsaw-test.yaml @@ -71,4 +71,52 @@ spec: content: | kubectl delete sparkapplication spark-job-succeeded-test -n spark-1 --ignore-not-found=true 
kubectl delete sparkapplication spark-job-succeeded-test -n spark-2 --ignore-not-found=true -kubectl replace -f spark-operator-dynamic-config-2.yaml \ No newline at end of file +kubectl replace -f spark-operator-dynamic-config-2.yaml + - try: + - script: + content: | +echo "Installing another spark operator in default-2 namespaces, watching on namespace: spark-3" +helm install spark-kubernetes-operator -n default-2 --create-namespace -f \ +../../../build-tools/helm/spark-kube
(spark-kubernetes-operator) branch main updated: [SPARK-49658] Refactor e2e tests pipelines
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 2c6c507 [SPARK-49658] Refactor e2e tests pipelines 2c6c507 is described below commit 2c6c50757f94d639ad97445477fb1a022812dc0b Author: Qi Tan AuthorDate: Mon Sep 16 21:24:45 2024 -0700 [SPARK-49658] Refactor e2e tests pipelines ### What changes were proposed in this pull request? e2e pipelines refactor ### Why are the changes needed? * Current helm installation does not require creation of additional clusterrolebinding, serviceacount etc. Remove it. * Current e2e pipeline run 3 times(because of the dimension of test-group) run of dynamic operator installation and watched-namespaces tests for one k8s version. Reduce it to 1 run only per k8s version. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ### Was this patch authored or co-authored using generative AI tooling? no Closes #125 from TQJADE/workflow-fix. 
Authored-by: Qi Tan Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 851a09d..bb30476 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -69,12 +69,23 @@ jobs: kubernetes-version: - "1.28.0" - "1.31.0" +mode: + - dynamic + - static test-group: - spark-versions - python - state-transition -dynamic-config-test-group: - watched-namespaces +exclude: + - mode: dynamic +test-group: spark-versions + - mode: dynamic +test-group: python + - mode: dynamic +test-group: state-transition + - mode: static +test-group: watched-namespaces steps: - name: Checkout repository uses: actions/checkout@v4 @@ -101,8 +112,8 @@ jobs: kubectl get pods -A kubectl describe node - name: Run Spark K8s Operator on K8S with Dynamic Configuration Disabled +if: matrix.mode == 'static' run: | - kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts || true eval $(minikube docker-env) ./gradlew buildDockerImage ./gradlew spark-operator-api:relocateGeneratedCRD @@ -111,20 +122,24 @@ jobs: # Use remote host' s docker image minikube docker-env --unset - name: Run E2E Test with Dynamic Configuration Disabled -run: | +if: matrix.mode == 'static' +run: | chainsaw test --test-dir ./tests/e2e/${{ matrix.test-group }} --parallel 2 - name: Run Spark K8s Operator on K8S with Dynamic Configuration Enabled +if: matrix.mode == 'dynamic' run: | - helm uninstall spark-kubernetes-operator eval $(minikube docker-env) + ./gradlew buildDockerImage + ./gradlew spark-operator-api:relocateGeneratedCRD helm install spark-kubernetes-operator --create-namespace -f \ build-tools/helm/spark-kubernetes-operator/values.yaml -f \ tests/e2e/helm/dynamic-config-values.yaml \ build-tools/helm/spark-kubernetes-operator/ minikube docker-env --unset - name: 
Run E2E Test with Dynamic Configuration Enabled
+        if: matrix.mode == 'dynamic'
         run: |
-          chainsaw test --test-dir ./tests/e2e/${{ matrix.dynamic-config-test-group }} --parallel 2
+          chainsaw test --test-dir ./tests/e2e/${{ matrix.test-group }} --parallel 2
   lint:
     name: "Linter and documentation"
(spark) branch master updated: [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 370453adba17 [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments` 370453adba17 is described below commit 370453adba1730b5412750b34e87a35147d71aa2 Author: Dongjoon Hyun AuthorDate: Mon Sep 16 20:53:35 2024 -0700 [SPARK-49678][CORE] Support `spark.test.master` in `SparkSubmitArguments` ### What changes were proposed in this pull request? This PR aims to support `spark.test.master` in `SparkSubmitArguments`. ### Why are the changes needed? To allow users to control the default master setting during testing and documentation generation. First, currently, we cannot build `Python Documentation` on M3 Max (and high-core machines) without this. Only it succeeds on GitHub Action runners (4 cores) or equivalent low-core docker run. Please try the following on your Macs. **BEFORE** ``` $ build/sbt package -Phive-thriftserver $ cd python/docs $ make html ... java.lang.OutOfMemoryError: Java heap space ... 24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker ... make: *** [html] Error 2 ``` **AFTER** ``` $ build/sbt package -Phive-thriftserver $ cd python/docs $ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" make html ... build succeeded. The HTML pages are in build/html. ``` Second, in general, we can control all `SparkSubmit` (eg. Spark Shells) like the following. **BEFORE (`local[*]`)** ``` $ bin/pyspark Python 3.9.19 (main, Jun 17 2024, 15:39:29) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
WARNING: Using incubator modules: jdk.incubator.vector Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 24/09/16 13:53:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 4.0.0-SNAPSHOT /_/ Using Python version 3.9.19 (main, Jun 17 2024 15:39:29) Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[*], app id = local-1726519982935). SparkSession available as 'spark'. >>> ``` **AFTER (`local[1]`)** ``` $ JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" bin/pyspark NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1] Python 3.9.19 (main, Jun 17 2024, 15:39:29) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin Type "help", "copyright", "credits" or "license" for more information. NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1] NOTE: Picked up JDK_JAVA_OPTIONS: -Dspark.test.master=local[1] WARNING: Using incubator modules: jdk.incubator.vector Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 24/09/16 13:51:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 4.0.0-SNAPSHOT /_/ Using Python version 3.9.19 (main, Jun 17 2024 15:39:29) Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[1], app id = local-1726519863363). SparkSession available as 'spark'. 
>>>
```

### Does this PR introduce _any_ user-facing change?

No. `spark.test.master` is a new parameter.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48126 from dongjoon-hyun/SPARK-49678.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 ++-
 1 file changed
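The per-invocation scoping shown in the transcripts above relies only on `JDK_JAVA_OPTIONS` being an ordinary environment variable that the JVM reads at startup (JDK 9+). A small, testable sketch with a stand-in script (the script path and its behavior are invented for illustration, not part of Spark):

```shell
# Stand-in for a JVM launch: prefixing a single command with the variable
# scopes the override to that one invocation only; the second call below
# sees no override.
cat > /tmp/fake-jvm.sh <<'EOF'
#!/bin/sh
echo "picked up: ${JDK_JAVA_OPTIONS:-<none>}"
EOF
chmod +x /tmp/fake-jvm.sh
JDK_JAVA_OPTIONS="-Dspark.test.master=local[1]" /tmp/fake-jvm.sh
/tmp/fake-jvm.sh
```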
(spark) branch master updated: [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 294af6e31639 [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default 294af6e31639 is described below commit 294af6e31639d6f6ac51f961f319866f077b5302 Author: Dongjoon Hyun AuthorDate: Mon Sep 16 20:52:28 2024 -0700 [SPARK-49680][PYTHON] Limit `Sphinx` build parallelism to 4 by default ### What changes were proposed in this pull request? This PR aims to limit `Sphinx` build parallelism to 4 by default for the following goals. - This will preserve the same speed in GitHub Action environment. - This will prevent the exhaustive `SparkSubmit` invocation in large machines like `c6i.24xlarge`. - The user still can override by providing `SPHINXOPTS`. ### Why are the changes needed? `Sphinx` parallelism feature was added via the following on 2024-01-10. - #44680 However, unfortunately, this breaks Python API doc generation in large machines because this means the number of parallel `SparkSubmit` invocation of PySpark. In addition, given that each `PySpark` currently is launched with `local[*]`, this ends up `N * N` `pyspark.daemon`s. In other words, as of today, this default setting, `auto`, seems to work on low-core machine like `GitHub Action` runners (4 cores). For example, this breaks `Python` documentations build even on M3 Max environment and this is worse on large EC2 machines (c7i.24xlarge). You can see the failure locally like this. ``` $ build/sbt package -Phive-thriftserver $ cd python/docs $ make html ... 24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042. 24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043. 
24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042. 24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043. 24/09/16 17:04:38 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044. 24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042. 24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043. 24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044. 24/09/16 17:04:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045. ... java.lang.OutOfMemoryError: Java heap space ... 24/09/16 14:09:55 WARN PythonRunner: Incomplete task 7.0 in stage 30 (TID 177) interrupted: Attempting to kill Python Worker ... make: *** [html] Error 2 ``` ### Does this PR introduce _any_ user-facing change? No, this is a dev-only change. ### How was this patch tested? Pass the CIs and do manual tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48129 from dongjoon-hyun/SPARK-49680. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- python/docs/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python/docs/Makefile b/python/docs/Makefile index 5058c1206171..428b0d24b568 100644 --- a/python/docs/Makefile +++ b/python/docs/Makefile @@ -16,7 +16,7 @@ # Minimal makefile for Sphinx documentation # You can set these variables from the command line. 
-SPHINXOPTS  ?= "-W" "-j" "auto"
+SPHINXOPTS  ?= "-W" "-j" "4"
 SPHINXBUILD ?= sphinx-build
 SOURCEDIR   ?= source
 BUILDDIR    ?= build
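The `?=` assignment is what keeps the new default overridable, matching the commit's "the user still can override by providing `SPHINXOPTS`". A sketch with a throwaway makefile (the path is arbitrary) shows the behavior:

```shell
# `?=` sets SPHINXOPTS only if the caller did not already provide a value;
# a command-line variable assignment always wins. Demo makefile only, not
# Spark's python/docs/Makefile.
printf 'SPHINXOPTS ?= -W -j 4\nshow:\n\t@echo $(SPHINXOPTS)\n' > /tmp/sphinxopts-demo.mk
make -s -f /tmp/sphinxopts-demo.mk show                       # default parallelism
make -s -f /tmp/sphinxopts-demo.mk show SPHINXOPTS='-W -j 8'  # caller override
```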
svn commit: r71610 - in /dev/spark/v4.0.0-preview2-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/articles/sparkr-vignettes_files/ _site/api/R/articles/sparkr-vignettes_
Author: dongjoon
Date: Mon Sep 16 06:20:30 2024
New Revision: 71610

Log:
Apache Spark v4.0.0-preview2-rc1 docs

[This commit notification would consist of 4901 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r71609 - /dev/spark/v4.0.0-preview2-rc1-bin/
Author: dongjoon
Date: Mon Sep 16 04:21:15 2024
New Revision: 71609

Log:
Apache Spark v4.0.0-preview2-rc1

Added:
    dev/spark/v4.0.0-preview2-rc1-bin/
    dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.sha512

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz
==============================================================================
Binary file - no diff available.
Propchange: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc == --- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc Mon Sep 16 04:21:15 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnrf8UHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fz4mxAAi4XIEwNrYyR9BZmVDYwugMureFuN +B0+8b04SUdFO0DEIG1Lr5P3B2M1Ku8dRoKSZ0dyECWHqGzqIk+fW5Su2A6Jz7FPY +RUwLQgd2CrP1He08gpS1vvjD0sOU2pJq4pGIh0BKwFKhTneUxvUO88jNTooPkRcJ +GU8Zd68/2h5iAKl+qGn9wL8g3Vh856+TwgKoD2g/P4kB5LstoYo/l7cdooRFs/B+ +vGwNsaNWD0JCMNCKY7E1TWNqk5l8vjQqZZ3VkPhrUIFmmXYbb92af89ro4YeRGZx +rLZeKWRUcp8DukOp4Qa0vMY67sPjW1uYPpx4Qy/vmnwjQ9+hsqEq29k0zlwhOn/B +agHnsNPJM2LVabtr+H5/bZOh2Oovyb3rFenHS4sgIE7khycmtyAc9VMZynRt+hLo +9IrK4GQ4/OtgXnE9U/hq/s3DdtgyWy3pqRhWi3cFlEkWAiUrKwTHmv7V4Drk4jVV +vHawTH5RF1ZhjQsFfrX+tk4Rkws3qyj8LXdKOLh3f8eG4c2kIy8RwBETmUE6w6zN +RQvA+gCFiBoBPVnsOZ9umgFHqdwCUet5vMEle9oRp5qPZwtBksECOr43mXWLkJmn +odq4j7nFcY4o7D76lwO1OOKncIAbewbwJWXQWeAUQqkv0UhEKfgMQm0JDRkIruyr +2V085CdzD5hmN3A= +=Hbfw +-END PGP SIGNATURE- Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 == --- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 (added) +++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 Mon Sep 16 04:21:15 2024 @@ -0,0 +1 @@ +c1ed617c4eea52dd30fd933e19211dd400aa3fb412e9504fe90d5a35383aac0ae2690d8aab5b623abc7d94b73c3c544ff538fa3f74c055f528962c863f823394 SparkR_4.0.0-preview2.tar.gz Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc == --- dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc Mon Sep 16 04:21:15 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnrgEUHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fy4yg//ZtSKpiZeUCdZFfk4iupr6rbzIzJl +p4b2tdquNm37zbh/kdR7jwFfXxkXR0AEHnRHy1H4htVFkL28VijVCzin7qkbXe0w +wfA0B5XUaStCx3ri7M8OOhLsZZUVTdXFyab0qq21Dd09THTphsg8UpQ9XqGOYjlq +fRbc2BykgqPxN5fqATXXTw1PtC6ej3Eup5VdDGvcgHN0Fbw3XgDRolpTvbMvk2Vx +V0avnMd5OzugLlHujW7LErJz1ugYIJKDUjtYnu9iYdFcBLP6HFwiMEu0MptjjM9T +Tojyj6qCZJ2mBd5BKuzLxY0PrlwS/EZWkao6gRuJ+TbiudSHza4UOonATt3HqeOw +3WgIrf14fveQT6jX4KoP4aSBrWAWiZF+BhDermh0Dq3ksSyj4RK2gHWlcyIdm+6j
(spark) 01/01: Preparing Spark release v4.0.0-preview2-rc1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to tag v4.0.0-preview2-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit f0d465e09b8d89d5e56ec21f4bd7e3ecbeeb318a Author: Dongjoon Hyun AuthorDate: Mon Sep 16 03:31:26 2024 + Preparing Spark release v4.0.0-preview2-rc1 --- R/pkg/R/sparkR.R | 4 ++-- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- common/utils/pom.xml | 2 +- common/variant/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/profiler/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/yarn/pom.xml | 2 +- sql/api/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/connect/common/pom.xml | 2 +- sql/connect/server/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 46 files changed, 49 insertions(+), 49 deletions(-) diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R index 29c05b0db7c2..2b57ca77a4ed 100644 --- a/R/pkg/R/sparkR.R +++ 
b/R/pkg/R/sparkR.R @@ -461,8 +461,8 @@ sparkR.session <- function( # Check if version number of SparkSession matches version number of SparkR package jvmVersion <- callJMethod(sparkSession, "version") - # Remove -SNAPSHOT from jvm versions - jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion, fixed = TRUE) + # Remove -preview2 from jvm versions + jvmVersionStrip <- gsub("-preview2", "", jvmVersion, fixed = TRUE) rPackageVersion <- paste0(packageVersion("SparkR")) if (jvmVersionStrip != rPackageVersion) { diff --git a/assembly/pom.xml b/assembly/pom.xml index 01bd324efc11..b72cf3ef3f3b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview2 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 046648e9c2ae..322279d66e17 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview2 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index cdb5bd72158a..93bca7177ce0 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview2 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 0f7036ef
(spark) tag v4.0.0-preview2-rc1 created (now f0d465e09b8d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

  at f0d465e09b8d (commit)

This tag includes the following new commits:

     new f0d465e09b8d Preparing Spark release v4.0.0-preview2-rc1

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
svn commit: r71607 - /dev/spark/v4.0.0-preview2-rc1-bin/
Author: dongjoon
Date: Mon Sep 16 01:35:01 2024
New Revision: 71607

Log:
Remove v4.0.0-preview2-rc1-bin

Removed:
    dev/spark/v4.0.0-preview2-rc1-bin/
(spark) tag v4.0.0-preview2-rc1 deleted (was 383afc7aca30)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v4.0.0-preview2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

*** WARNING: tag v4.0.0-preview2-rc1 was deleted! ***

     was 383afc7aca30 Preparing Spark release v4.0.0-preview2-rc1

This change permanently discards the following revisions:

 discard 383afc7aca30 Preparing Spark release v4.0.0-preview2-rc1
(spark) branch master updated: [SPARK-49655][BUILD] Link `python3` to `python3.9` in `spark-rm` Docker image
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b4f4d9b7a7d4 [SPARK-49655][BUILD] Link `python3` to `python3.9` in `spark-rm` Docker image b4f4d9b7a7d4 is described below commit b4f4d9b7a7d470158af39d75824bcc501e3506da Author: Dongjoon Hyun AuthorDate: Sun Sep 15 18:17:34 2024 -0700 [SPARK-49655][BUILD] Link `python3` to `python3.9` in `spark-rm` Docker image ### What changes were proposed in this pull request? This PR aims to link `python3` to `python3.9` in `spark-rm` docker image. ### Why are the changes needed? We already link `python` to `python3.9`. https://github.com/apache/spark/blob/931ab065df3952487028316ebd49c2895d947bf2/dev/create-release/spark-rm/Dockerfile#L139 We need to link `python3` to `python3.9` to fix Spark Documentation generation failure in release script. ``` $ dev/create-release/do-release-docker.sh -d /run/user/1000/spark -s docs ... = Building documentation... Command: /opt/spark-rm/release-build.sh docs Log file: docs.log Command FAILED. Check full logs for details. from /opt/spark-rm/output/spark/docs/.local_ruby_bundle/ruby/3.0.0/gems/jekyll-4.3.3/lib/jekyll/command.rb:91:in `process_with_graceful_fail' ``` The root cause is `mkdocs` module import error during `error-conditions.html` generation. ### Does this PR introduce _any_ user-facing change? No. This is a release-script. ### How was this patch tested? Manual review. After this PR, `error docs` generation succeeds. ``` * Building error docs. * Generated: docs/_generated/error-conditions.html ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48117 from dongjoon-hyun/SPARK-49655. 
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 dev/create-release/spark-rm/Dockerfile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index e7f558b523d0..3cba72d042ed 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -137,6 +137,7 @@ RUN python3.9 -m pip list
 RUN gem install --no-document "bundler:2.4.22"
 
 RUN ln -s "$(which python3.9)" "/usr/local/bin/python"
+RUN ln -s "$(which python3.9)" "/usr/local/bin/python3"
 
 WORKDIR /opt/spark-rm/output
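The one-line fix is just a second symlink next to the existing `python` one. A sketch against a scratch directory (the paths and the fake interpreter are illustrative, not the real image layout):

```shell
# Create a fake pinned interpreter, then link both `python` and `python3`
# to it, mirroring what the Dockerfile does for python3.9 with
# `ln -s "$(which python3.9)" ...`.
mkdir -p /tmp/rm-demo-bin
printf '#!/bin/sh\necho "demo interpreter 3.9"\n' > /tmp/rm-demo-bin/python3.9
chmod +x /tmp/rm-demo-bin/python3.9
ln -sf /tmp/rm-demo-bin/python3.9 /tmp/rm-demo-bin/python
ln -sf /tmp/rm-demo-bin/python3.9 /tmp/rm-demo-bin/python3
/tmp/rm-demo-bin/python3
```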
svn commit: r71606 - /dev/spark/v4.0.0-preview2-rc1-bin/
Author: dongjoon
Date: Sun Sep 15 23:24:31 2024
New Revision: 71606

Log:
Apache Spark v4.0.0-preview2-rc1

Added:
    dev/spark/v4.0.0-preview2-rc1-bin/
    dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/pyspark_connect-4.0.0.dev2.tar.gz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-hadoop3.tgz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2-bin-without-hadoop.tgz.sha512
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz   (with props)
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.asc
    dev/spark/v4.0.0-preview2-rc1-bin/spark-4.0.0-preview2.tgz.sha512

Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz
==============================================================================
Binary file - no diff available.
Propchange: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc == --- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.asc Sun Sep 15 23:24:31 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnaEoUHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fxasg//YTaWh8pe9fzQFS88xg7y57LYn7wZ +H1Yp27Zp9bnFcVzfdbCHxVB+2vyCXRf9ssMXuepbluIaH7C6esljwU85RMd8xWK6 +AA3HyoQFfUGG+ItC4eVUeZmrq8C0sCd7f8NEt4x7WgSuvDBRmkt3cmRN42kqCPeo +rbglfmpHH4Dkh43LCecPTIgraJ44+1k7mUd5RhjEQ/21rxa2SpBAqpfhT4lL5wvl +cQzr690pmDb2+tSMhmbEfrLU3gmUDy9HnlvNUGK1/MHEUfjAcxMGeue7B+PX9eue +pOyRWYeMMhXoM+CVij4gmnJEUelBDGzLlYOtzd6A3REy5XYkh+3Jpryv3Zc+6iVH +YQbbhO7eSEf3XPQJkz/dcX80UG0mVpsHMnEyOufOdB4hrKjjnsUa4u45PJ3h+Kt+ +R6KnIv79QT/m9IGSTEdl3rCHf1WvXHwUNnOW/XltyHDCceVHemMfA/qScw7IhEDR +ZeIlz1+qbFptdDznmbrJQRu33L0Td0brSggaLFPrOjN0UV0mpmZIk8MN0cFoImmp +XE8hPKIHSd9YFLi4VD6n3cUSnBORHQIXNIWW59HCgRiBJJmMV+8Hh8Vm/LDF9UuN +1/wjCwF1KzQOtp8eC/GmmPUJROk2mfmEp7jlGVqPmYuCnzfrpZy833628ZDpoYjI +JoG3U7goRsAjq1g= +=c4Pk +-END PGP SIGNATURE- Added: dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 == --- dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 (added) +++ dev/spark/v4.0.0-preview2-rc1-bin/SparkR_4.0.0-preview2.tar.gz.sha512 Sun Sep 15 23:24:31 2024 @@ -0,0 +1 @@ +92ebfbcedd6e9f3f74b142815f0852c3f98935521863a5dca27ab76cdd90a81d7c26b5f5621b7dbb5bed1605c276b7677dec90391c13cb2095bc4095163dabe3 SparkR_4.0.0-preview2.tar.gz Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc == --- dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc (added) +++ dev/spark/v4.0.0-preview2-rc1-bin/pyspark-4.0.0.dev2.tar.gz.asc Sun Sep 15 23:24:31 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmbnaEwUHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FyjdA//f/G8gA1KHtiDgt7eH4jozIYPL2lU +5xyo8VjI+/iatzYxemsTaUdPnv8PDpdDYE/c7PuXqwXPaXUBwQFE+ReAxnFrhvLM +yLMjOGYnZoDyHOs38IMASQz5k8L4YceknCWWHmHwkeCl8N0BVMXFozWi4tlz2CWR +jeB1VA9gUUNu8OXw2WcIhtKU6CvaoOhVc3TTb16Ma2m4cViATBZChvQi6E47sGEb +mBkwIWCkX+d6NnL0LlqYo6af/CSVZMMfLLFcja4G5j1iAWsPhihvH7rRQZPHaavH +oSPZj8+7Iy0y5YQbB9f+pt66AptUftNUvJgTAqyTn1iO0LiFHldXAFcxibh3cFYz +XxZcy/mzY+48umCE9J4Wq7YvLC0RM/wXweQU7JXslAT5m1p74chf/Ax9RO9OxWu2
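Each binary artifact in the rc1 directory above ships with a `.sha512` companion file containing a `<hexdigest>  <filename>` line. A hedged sketch of verifying such a line locally (the payload and filename here are illustrative, not the real release artifact):

```python
import hashlib

def verify_sha512(data: bytes, checksum_line: str) -> bool:
    """Check data against a '<hexdigest>  <filename>' style .sha512 line."""
    expected = checksum_line.split()[0].lower()
    return hashlib.sha512(data).hexdigest() == expected

# Illustrative payload, not a real Spark release artifact.
payload = b"example artifact bytes"
line = hashlib.sha512(payload).hexdigest() + "  SparkR_4.0.0-preview2.tar.gz"
ok = verify_sha512(payload, line)
```

In practice release voters pair this with a PGP signature check against the `.asc` files shown above.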
(spark) 01/01: Preparing Spark release v4.0.0-preview2-rc1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to tag v4.0.0-preview2-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 383afc7aca302579e7b9094d5890e95bc045e49a Author: Dongjoon Hyun AuthorDate: Sun Sep 15 22:25:30 2024 + Preparing Spark release v4.0.0-preview2-rc1 --- R/pkg/R/sparkR.R | 4 ++-- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- common/utils/pom.xml | 2 +- common/variant/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/profiler/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/yarn/pom.xml | 2 +- sql/api/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/connect/common/pom.xml | 2 +- sql/connect/server/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 46 files changed, 49 insertions(+), 49 deletions(-) diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R index 29c05b0db7c2..2b57ca77a4ed 100644 --- a/R/pkg/R/sparkR.R +++ 
b/R/pkg/R/sparkR.R @@ -461,8 +461,8 @@ sparkR.session <- function( # Check if version number of SparkSession matches version number of SparkR package jvmVersion <- callJMethod(sparkSession, "version") - # Remove -SNAPSHOT from jvm versions - jvmVersionStrip <- gsub("-SNAPSHOT", "", jvmVersion, fixed = TRUE) + # Remove -preview2 from jvm versions + jvmVersionStrip <- gsub("-preview2", "", jvmVersion, fixed = TRUE) rPackageVersion <- paste0(packageVersion("SparkR")) if (jvmVersionStrip != rPackageVersion) { diff --git a/assembly/pom.xml b/assembly/pom.xml index 01bd324efc11..b72cf3ef3f3b 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview2 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 046648e9c2ae..322279d66e17 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview2 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index cdb5bd72158a..93bca7177ce0 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.13 -4.0.0-SNAPSHOT +4.0.0-preview2 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 0f7036ef
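The `sparkR.R` hunk above changes which release suffix is stripped from the JVM version string before it is compared with the SparkR package version. The same comparison can be sketched in Python (a simplification of the R `gsub(..., fixed = TRUE)` logic; the version strings are examples):

```python
def strip_version_suffix(version: str, suffix: str = "-preview2") -> str:
    """Drop a fixed release suffix, like gsub("-preview2", "", jvmVersion,
    fixed = TRUE) in sparkR.R."""
    return version.replace(suffix, "")

# The RC builds report e.g. "4.0.0-preview2" from the JVM, while the
# R package version is plain "4.0.0", so the suffix must be removed first.
jvm_version = "4.0.0-preview2"
r_package_version = "4.0.0"
versions_match = strip_version_suffix(jvm_version) == r_package_version
```

Before this commit the stripped suffix was `-SNAPSHOT`, which is why the hunk swaps the literal rather than the surrounding logic.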
(spark) tag v4.0.0-preview2-rc1 created (now 383afc7aca30)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to tag v4.0.0-preview2-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git at 383afc7aca30 (commit) This tag includes the following new commits: new 383afc7aca30 Preparing Spark release v4.0.0-preview2-rc1 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
(spark) branch master updated: [SPARK-48355][SQL][TESTS][FOLLOWUP] Disable a test case failing on non-ANSI mode
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1346531ccc6e [SPARK-48355][SQL][TESTS][FOLLOWUP] Disable a test case failing on non-ANSI mode 1346531ccc6e is described below commit 1346531ccc6ee814d5b357158a4c4aed2bf1d573 Author: Dongjoon Hyun AuthorDate: Sat Sep 14 21:10:29 2024 -0700 [SPARK-48355][SQL][TESTS][FOLLOWUP] Disable a test case failing on non-ANSI mode ### What changes were proposed in this pull request? This PR is a follow-up of https://github.com/apache/spark/pull/47672 to disable a test case failing on non-ANSI mode. ### Why are the changes needed? To recover non-ANSI CI. - https://github.com/apache/spark/actions/workflows/build_non_ansi.yml ### Does this PR introduce _any_ user-facing change? No, this is a test-only change. ### How was this patch tested? Manual review. ``` $ SPARK_ANSI_SQL_MODE=false build/sbt "sql/testOnly *.SqlScriptingInterpreterSuite" ... [info] - simple case mismatched types !!! IGNORED !!! [info] All tests passed. [success] Total time: 24 s, completed Sep 14, 2024, 7:51:15 PM ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48115 from dongjoon-hyun/SPARK-48355. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala index 3fad99eba509..bc2adec5be3d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreterSuite.scala @@ -701,7 +701,8 @@ class SqlScriptingInterpreterSuite extends QueryTest with SharedSparkSession { verifySqlScriptResult(commands, expected) } - test("simple case mismatched types") { + // This is disabled because it fails in non-ANSI mode + ignore("simple case mismatched types") { val commands = """ |BEGIN
(spark) branch master updated: Revert "[SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fa6a0786bb4b Revert "[SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend" fa6a0786bb4b is described below commit fa6a0786bb4b23a895e68a721df9ee88684c4fab Author: Dongjoon Hyun AuthorDate: Sat Sep 14 17:57:35 2024 -0700 Revert "[SPARK-49531][PYTHON][CONNECT] Support line plot with plotly backend" This reverts commit 3b8dddac65bce6f88f51e23e777d521d65fa3373. --- dev/sparktestsupport/modules.py| 4 - python/pyspark/errors/error-conditions.json| 5 - python/pyspark/sql/classic/dataframe.py| 5 - python/pyspark/sql/connect/dataframe.py| 5 - python/pyspark/sql/dataframe.py| 27 - python/pyspark/sql/plot/__init__.py| 21 python/pyspark/sql/plot/core.py| 135 - python/pyspark/sql/plot/plotly.py | 30 - .../sql/tests/connect/test_parity_frame_plot.py| 36 -- .../tests/connect/test_parity_frame_plot_plotly.py | 36 -- python/pyspark/sql/tests/plot/__init__.py | 16 --- python/pyspark/sql/tests/plot/test_frame_plot.py | 79 .../sql/tests/plot/test_frame_plot_plotly.py | 64 -- python/pyspark/sql/utils.py| 17 --- python/pyspark/testing/sqlutils.py | 7 -- .../org/apache/spark/sql/internal/SQLConf.scala| 27 - 16 files changed, 514 deletions(-) diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py index b9a4bed715f6..34fbb8450d54 100644 --- a/dev/sparktestsupport/modules.py +++ b/dev/sparktestsupport/modules.py @@ -548,8 +548,6 @@ pyspark_sql = Module( "pyspark.sql.tests.test_udtf", "pyspark.sql.tests.test_utils", "pyspark.sql.tests.test_resources", -"pyspark.sql.tests.plot.test_frame_plot", -"pyspark.sql.tests.plot.test_frame_plot_plotly", ], ) @@ -1053,8 +1051,6 @@ pyspark_connect = Module( "pyspark.sql.tests.connect.test_parity_arrow_cogrouped_map", "pyspark.sql.tests.connect.test_parity_python_datasource", 
"pyspark.sql.tests.connect.test_parity_python_streaming_datasource", -"pyspark.sql.tests.connect.test_parity_frame_plot", -"pyspark.sql.tests.connect.test_parity_frame_plot_plotly", "pyspark.sql.tests.connect.test_utils", "pyspark.sql.tests.connect.client.test_artifact", "pyspark.sql.tests.connect.client.test_artifact_localcluster", diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json index 92aeb15e21d1..4061d024a83c 100644 --- a/python/pyspark/errors/error-conditions.json +++ b/python/pyspark/errors/error-conditions.json @@ -1088,11 +1088,6 @@ "Function `` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments." ] }, - "UNSUPPORTED_PLOT_BACKEND": { -"message": [ - "`` is not supported, it should be one of the values from " -] - }, "UNSUPPORTED_SIGNATURE": { "message": [ "Unsupported signature: ." diff --git a/python/pyspark/sql/classic/dataframe.py b/python/pyspark/sql/classic/dataframe.py index d174f7774cc5..91b959162590 100644 --- a/python/pyspark/sql/classic/dataframe.py +++ b/python/pyspark/sql/classic/dataframe.py @@ -58,7 +58,6 @@ from pyspark.sql.column import Column from pyspark.sql.classic.column import _to_seq, _to_list, _to_java_column from pyspark.sql.readwriter import DataFrameWriter, DataFrameWriterV2 from pyspark.sql.merge import MergeIntoWriter -from pyspark.sql.plot import PySparkPlotAccessor from pyspark.sql.streaming import DataStreamWriter from pyspark.sql.types import ( StructType, @@ -1863,10 +1862,6 @@ class DataFrame(ParentDataFrame, PandasMapOpsMixin, PandasConversionMixin): messageParameters={"member": "queryExecution"}, ) -@property -def plot(self) -> PySparkPlotAccessor: -return PySparkPlotAccessor(self) - class DataFrameNaFunctions(ParentDataFrameNaFunctions): def __init__(self, df: ParentDataFrame): diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py index e3b1d35b2d5d..768abd655d49 100644 --- a/python/pyspark/sql/connect/dataframe.py 
+++ b/python/pyspark/sql/connect/dataframe.py @@ -83,7 +83,6 @@ from pyspark.sql.connect.expressions import ( UnresolvedStar, ) from pyspark.sql.connect.functions import builtin as F -from pyspark.
(spark) branch master updated: [SPARK-49649][DOCS] Make `docs/index.md` up-to-date for 4.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2250b35be6a2 [SPARK-49649][DOCS] Make `docs/index.md` up-to-date for 4.0.0 2250b35be6a2 is described below commit 2250b35be6a24c777d6fa82b1c6a7a10a6854895 Author: Dongjoon Hyun AuthorDate: Fri Sep 13 20:49:01 2024 -0700 [SPARK-49649][DOCS] Make `docs/index.md` up-to-date for 4.0.0 ### What changes were proposed in this pull request? This PR aims to update Spark documentation landing page (`docs/index.md`) for Apache Spark 4.0.0-preview2 release. ### Why are the changes needed? - [SPARK-45314 Drop Scala 2.12 and make Scala 2.13 by default](https://issues.apache.org/jira/browse/SPARK-45314) - #46228 - #47842 - [SPARK-45923 Spark Kubernetes Operator](https://issues.apache.org/jira/browse/SPARK-45923) ### Does this PR introduce _any_ user-facing change? No because this is a documentation-only change. ### How was this patch tested? Manual review. https://github.com/user-attachments/assets/bdbd0e61-d71a-41ca-aa1b-1b0805813a45";> https://github.com/user-attachments/assets/e13a6bba-2149-48fa-983d-c5399defdc70";> https://github.com/user-attachments/assets/721c7760-bc2e-444c-9209-174e3119c2b4";> ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48113 from dongjoon-hyun/SPARK-49649. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- docs/index.md | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/index.md b/docs/index.md index 7e57eddb6da8..fea62865e216 100644 --- a/docs/index.md +++ b/docs/index.md @@ -34,9 +34,8 @@ source, visit [Building Spark](building-spark.html). Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java. This should include JVMs on x86_64 and ARM64. 
It's easy to run locally on one machine --- all you need is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation. -Spark runs on Java 17/21, Scala 2.13, Python 3.8+, and R 3.5+. -When using the Scala API, it is necessary for applications to use the same version of Scala that Spark was compiled for. -For example, when using Scala 2.13, use Spark compiled for 2.13, and compile code/applications for Scala 2.13 as well. +Spark runs on Java 17/21, Scala 2.13, Python 3.9+, and R 3.5+ (Deprecated). +When using the Scala API, it is necessary for applications to use the same version of Scala that Spark was compiled for. Since Spark 4.0.0, it's Scala 2.13. # Running the Examples and Shell @@ -110,7 +109,7 @@ options for deployment: * [Spark Streaming](streaming-programming-guide.html): processing data streams using DStreams (old API) * [MLlib](ml-guide.html): applying machine learning algorithms * [GraphX](graphx-programming-guide.html): processing graphs -* [SparkR](sparkr.html): processing data with Spark in R +* [SparkR (Deprecated)](sparkr.html): processing data with Spark in R * [PySpark](api/python/getting_started/index.html): processing data with Spark in Python * [Spark SQL CLI](sql-distributed-sql-engine-spark-sql-cli.html): processing data with SQL on the command line @@ -128,10 +127,13 @@ options for deployment: * [Cluster Overview](cluster-overview.html): overview of concepts and components when running on a cluster * [Submitting Applications](submitting-applications.html): packaging and deploying applications * Deployment modes: - * [Amazon EC2](https://github.com/amplab/spark-ec2): scripts that let you launch a cluster on EC2 in about 5 minutes * [Standalone Deploy Mode](spark-standalone.html): launch a standalone cluster quickly without a third-party cluster manager * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN) - * [Kubernetes](running-on-kubernetes.html): deploy 
Spark on top of Kubernetes + * [Kubernetes](running-on-kubernetes.html): deploy Spark apps on top of Kubernetes directly + * [Amazon EC2](https://github.com/amplab/spark-ec2): scripts that let you launch a cluster on EC2 in about 5 minutes +* [Spark Kubernetes Operator](https://github.com/apache/spark-kubernetes-operator): + * [SparkApp](https://github.com/apache/spark-kubernetes-operator/blob/main/examples/pyspark-pi.yaml): deploy Spark apps on top of Kubernetes via [operator patterns](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) + * [SparkCluster](https://github.com/apache/spark-kubernetes-operator/blob/main/examples/cluster-with-template.yaml): deploy Spark clusters on top of Kubernetes via [operator pa
(spark) branch master updated: [SPARK-49648][DOCS] Update `Configuring Ports for Network Security` section with JWS
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new df0e34c5a1c3 [SPARK-49648][DOCS] Update `Configuring Ports for Network Security` section with JWS df0e34c5a1c3 is described below commit df0e34c5a1c30956cb16e8af5569ed72387b6fc3 Author: Dongjoon Hyun AuthorDate: Fri Sep 13 18:09:48 2024 -0700 [SPARK-49648][DOCS] Update `Configuring Ports for Network Security` section with JWS ### What changes were proposed in this pull request? This PR aims to update `Configuring Ports for Network Security` section of `Security` page with new JWS feature. ### Why are the changes needed? In addition to the existing restriction, Spark 4 can take advantage of new JWS feature. This PR informs it more clearly. https://github.com/apache/spark/blob/08a26bb56cfb48f27c68a79be1e15bc4c9e466e0/docs/security.md?plain=1#L811-L814 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual review. https://github.com/user-attachments/assets/2250e65b-cddd-4541-b42f-5284d5ce4b02";> https://github.com/user-attachments/assets/0c853380-081a-41a3-b66b-7774ec62fd3e";> ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48112 from dongjoon-hyun/SPARK-49648. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- docs/security.md | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/security.md b/docs/security.md index a8f4e4ec5389..b97abfeacf24 100644 --- a/docs/security.md +++ b/docs/security.md @@ -55,7 +55,8 @@ To enable authorization, Spark Master should have `spark.master.rest.filters=org.apache.spark.ui.JWSFilter` and `spark.org.apache.spark.ui.JWSFilter.param.secretKey=BASE64URL-ENCODED-KEY` configurations, and client should provide HTTP `Authorization` header which contains JSON Web Token signed by -the shared secret key. 
+the shared secret key. Please note that this feature requires a Spark distribution built with +`jjwt` profile. ### YARN @@ -813,6 +814,12 @@ They are generally private services, and should only be accessible within the ne organization that deploys Spark. Access to the hosts and ports used by Spark services should be limited to origin hosts that need to access the services. +However, like the REST Submission port, Spark also supports HTTP `Authorization` header +with a cryptographically signed JSON Web Token (JWT) for all UI ports. +To use it, a user needs the Spark distribution built with `jjwt` profile and to configure +`spark.ui.filters=org.apache.spark.ui.JWSFilter` and +`spark.org.apache.spark.ui.JWSFilter.param.secretKey=BASE64URL-ENCODED-KEY`. + Below are the primary ports that Spark uses for its communication and how to configure those ports.
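The doc change above describes clients sending an `Authorization` header carrying a JSON Web Token signed with a shared secret. As a generic illustration of what such a compact JWS looks like (this is a stdlib HS256 sketch, not Spark's `JWSFilter` implementation; the secret and claims are made up, and the filter's exact key handling is not shown here):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    # JWS uses base64url without '=' padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build a compact JWS (header.payload.signature) with HMAC-SHA256."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = header + b"." + body
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

# Hypothetical claim set and secret for illustration only.
token = sign_hs256({"sub": "spark-client"}, b"shared-secret-key")
# A client would then send:  Authorization: Bearer <token>
```

Anyone holding the same secret can recompute the signature over `header.payload` and compare, which is how the server side validates the header.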
(spark) branch master updated (08a26bb56cfb -> d3eb99f79e50)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 08a26bb56cfb [SPARK-48779][SQL][TESTS] Improve collation support testing - add golden files add d3eb99f79e50 [SPARK-49647][TESTS] Change SharedSparkContext so that its SparkConf loads defaults No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/SharedSparkContext.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-)
(spark-kubernetes-operator) branch main updated: [SPARK-49645] Update `e2e/python/chainsaw-test.yaml` to use non-R image
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new b2adda7 [SPARK-49645] Update `e2e/python/chainsaw-test.yaml` to use non-R image b2adda7 is described below commit b2adda7d18ca05ea4da1161d13a1fcf26d98f5d1 Author: Dongjoon Hyun AuthorDate: Fri Sep 13 09:57:52 2024 -0700 [SPARK-49645] Update `e2e/python/chainsaw-test.yaml` to use non-R image ### What changes were proposed in this pull request? This PR aims to update `e2e/python/chainsaw-test.yaml` to use non-R image. This is the only instance we have. ``` $ git grep '\-r-' tests/e2e/python/chainsaw-test.yaml: value: 'spark:3.5.2-scala2.12-java17-python3-r-ubuntu' ``` ### Why are the changes needed? New image is 36% smaller. ``` $ docker images | grep 3.5.2 spark 3.5.2-scala2.12-java17-python3-r-ubuntu 16362acf4adb 4 weeks ago 1.52GB spark 3.5.2-scala2.12-java17-python3-ubuntu a79b6b6ef9a4 4 weeks ago 985MB ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #123 from dongjoon-hyun/SPARK-49645. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- tests/e2e/python/chainsaw-test.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/e2e/python/chainsaw-test.yaml b/tests/e2e/python/chainsaw-test.yaml index ede8d73..4147d2f 100644 --- a/tests/e2e/python/chainsaw-test.yaml +++ b/tests/e2e/python/chainsaw-test.yaml @@ -27,7 +27,7 @@ spec: - name: "SCALA_VERSION" value: "2.12" - name: "IMAGE" - value: 'spark:3.5.2-scala2.12-java17-python3-r-ubuntu' + value: 'spark:3.5.2-scala2.12-java17-python3-ubuntu' steps: - name: install-spark-application try:
(spark) branch master updated: [SPARK-49234][BUILD][FOLLOWUP] Add `LICENSE-xz.txt` to `licenses-binary` folder
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f92e9489fb23 [SPARK-49234][BUILD][FOLLOWUP] Add `LICENSE-xz.txt` to `licenses-binary` folder f92e9489fb23 is described below commit f92e9489fb23a85195067cd0f0f5cd9e9d00b138 Author: Dongjoon Hyun AuthorDate: Fri Sep 13 09:45:46 2024 -0700 [SPARK-49234][BUILD][FOLLOWUP] Add `LICENSE-xz.txt` to `licenses-binary` folder ### What changes were proposed in this pull request? This PR aims to add `LICENSE-xz.txt` to `licenses-binary` folder. ### Why are the changes needed? To provide the license properly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48107 from dongjoon-hyun/SPARK-49234-2. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- licenses-binary/LICENSE-xz.txt | 11 +++ 1 file changed, 11 insertions(+) diff --git a/licenses-binary/LICENSE-xz.txt b/licenses-binary/LICENSE-xz.txt new file mode 100644 index ..4322122aecf1 --- /dev/null +++ b/licenses-binary/LICENSE-xz.txt @@ -0,0 +1,11 @@ +Permission to use, copy, modify, and/or distribute this +software for any purpose with or without fee is hereby granted. + +THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL +WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL +THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR +CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM +LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, +NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN +CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
(spark) branch master updated: [SPARK-49624][BUILD] Upgrade `aircompressor` to 2.0.2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e7cf246fb763 [SPARK-49624][BUILD] Upgrade `aircompressor` to 2.0.2 e7cf246fb763 is described below commit e7cf246fb7635ef7b95c18b7958bcadae00aa281 Author: panbingkun AuthorDate: Thu Sep 12 20:52:11 2024 -0700 [SPARK-49624][BUILD] Upgrade `aircompressor` to 2.0.2 ### What changes were proposed in this pull request? The pr aims to upgrade `aircompressor` from `0.27` to `2.0.2`. ### Why are the changes needed? https://github.com/airlift/aircompressor/releases/tag/2.0 (ps: 2.0.2 was built against `JDK 1.8`). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48098 from panbingkun/aircompressor_2. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 2db86ed229a0..e1ac039f2546 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -4,7 +4,7 @@ JTransforms/3.1//JTransforms-3.1.jar RoaringBitmap/1.2.1//RoaringBitmap-1.2.1.jar ST4/4.0.4//ST4-4.0.4.jar activation/1.1.1//activation-1.1.1.jar -aircompressor/0.27//aircompressor-0.27.jar +aircompressor/2.0.2//aircompressor-2.0.2.jar algebra_2.13/2.8.0//algebra_2.13-2.8.0.jar aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar diff --git a/pom.xml b/pom.xml index b1497c782685..b9f28eb61925 100644 --- a/pom.xml +++ b/pom.xml @@ -2634,7 +2634,7 @@ io.airlift aircompressor -0.27 +2.0.2 org.apache.orc
(spark) branch master updated: [SPARK-43354][PYTHON][TESTS] Re-enable `test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61814876b26c [SPARK-43354][PYTHON][TESTS] Re-enable `test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9 61814876b26c is described below commit 61814876b26c6fef2dc8238b1aeb0594d9a24472 Author: Dongjoon Hyun AuthorDate: Thu Sep 12 20:49:16 2024 -0700 [SPARK-43354][PYTHON][TESTS] Re-enable `test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9 ### What changes were proposed in this pull request? This PR aims to re-enable `test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9. ### Why are the changes needed? This was disabled at PyPy3.8, but we dropped Python 3.8 support and the test passed with PyPy3.9. - #46228 **BEFORE: Skipped with `Fails in PyPy Python 3.8, should enable.` message** ``` $ python/run-tests.py --python-executables pypy3 --testnames pyspark.sql.tests.test_creation Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log Will test against the following Python executables: ['pypy3'] Will test the following Python tests: ['pyspark.sql.tests.test_creation'] pypy3 python_implementation is PyPy pypy3 version is: Python 3.9.19 (a2113ea87262, Apr 21 2024, 05:41:07) [PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.1.0.2.5)] Starting test(pypy3): pyspark.sql.tests.test_creation (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/58e26724-5c3e-4451-80f8-cabdb36f0901/pypy3__pyspark.sql.tests.test_creation__n448ay57.log) Finished test(pypy3): pyspark.sql.tests.test_creation (6s) ... 3 tests were skipped Tests passed in 6 seconds Skipped tests in pyspark.sql.tests.test_creation with pypy3: test_create_dataframe_from_pandas_with_day_time_interval (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... 
skipped 'Fails in PyPy Python 3.8, should enable.' test_create_dataframe_required_pandas_not_found (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Required Pandas was found.' test_schema_inference_from_pandas_with_dict (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 10.0.0 must be installed; however, it was not found.' ``` **AFTER** ``` $ python/run-tests.py --python-executables pypy3 --testnames pyspark.sql.tests.test_creation Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log Will test against the following Python executables: ['pypy3'] Will test the following Python tests: ['pyspark.sql.tests.test_creation'] pypy3 python_implementation is PyPy pypy3 version is: Python 3.9.19 (a2113ea87262, Apr 21 2024, 05:41:07) [PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.1.0.2.5)] Starting test(pypy3): pyspark.sql.tests.test_creation (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/1f0db01f-0beb-4ee2-817f-363eb2f2804d/pypy3__pyspark.sql.tests.test_creation__2w4gy9u1.log) Finished test(pypy3): pyspark.sql.tests.test_creation (13s) ... 2 tests were skipped Tests passed in 13 seconds Skipped tests in pyspark.sql.tests.test_creation with pypy3: test_create_dataframe_required_pandas_not_found (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Required Pandas was found.' test_schema_inference_from_pandas_with_dict (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 10.0.0 must be installed; however, it was not found.' ``` ### Does this PR introduce _any_ user-facing change? No, this is a test only change. ### How was this patch tested? Manual tests with PyPy3.9. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48097 from dongjoon-hyun/SPARK-43354. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/test_creation.py | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/python/pyspark/sql/tests/test_creation.py b/python/pyspark/sql/tests/test_creation.py index dfe66cdd3edf..c6917aa234b4 100644 --- a/python/pyspark/sql/tests/test_creation.py +++ b/python/pyspark/sql/tests/test_creation.py @@ -15,7 +15,6 @@ # limitations under the License. # -import platform from decimal import Decimal import os import time @@ -111,11 +110,7 @@ class DataFrameCreationTestsMixin: os.environ["TZ"] = orig_env_tz time.tzset() -# TO
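The guard this patch removes followed the standard `unittest` skip pattern; a minimal stdlib-only sketch of that pattern (class name and test body are stand-ins, not the real PySpark test):

```python
import platform
import unittest

class DataFrameCreationSketch(unittest.TestCase):
    # Mirrors the condition the patch removed: skip the test only on PyPy.
    @unittest.skipIf(
        platform.python_implementation() == "PyPy",
        "Fails in PyPy Python 3.8, should enable.",
    )
    def test_create_dataframe_from_pandas_with_day_time_interval(self):
        # Placeholder body; the real test builds a Spark DataFrame from pandas.
        self.assertTrue(True)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataFrameCreationSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On CPython the test runs; on PyPy it is reported as skipped, which is exactly the behavior the `run-tests.py` output above shows before the guard was dropped.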
(spark) branch master updated (f69b518446e2 -> 23e61f6b1845)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f69b518446e2 [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create `pypy3.9` links add 23e61f6b1845 [SPARK-49621][SQL][TESTS] Remove the flaky `EXEC IMMEDIATE STACK OVERFLOW` test case No new revisions were added by this update. Summary of changes: .../execution/ExecuteImmediateEndToEndSuite.scala | 27 -- 1 file changed, 27 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create `pypy3.9` links
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f69b518446e2 [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create `pypy3.9` links f69b518446e2 is described below commit f69b518446e2f18fccdad3e1c23792bbee20f3f5 Author: Dongjoon Hyun AuthorDate: Thu Sep 12 20:43:33 2024 -0700 [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create `pypy3.9` links ### What changes were proposed in this pull request? This PR aims to fix two Dockerfiles to create `pypy3.9` symlinks instead of `pypy3.8`. https://github.com/apache/spark/blob/d2d293e3fb57d6c9dea084b5fe6707d67c715af3/dev/create-release/spark-rm/Dockerfile#L97 https://github.com/apache/spark/blob/d2d293e3fb57d6c9dea084b5fe6707d67c715af3/dev/infra/Dockerfile#L91 ### Why are the changes needed? Apache Spark 4.0 dropped `Python 3.8` support. We should make sure that we don't use `pypy3.8` at all. - #46228 ### Does this PR introduce _any_ user-facing change? No. This is a dev-only change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48095 from dongjoon-hyun/SPARK-49620. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/create-release/spark-rm/Dockerfile | 2 +- dev/infra/Dockerfile | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index e86b91968bf8..e7f558b523d0 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -94,7 +94,7 @@ ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library RUN add-apt-repository ppa:pypy/ppa RUN mkdir -p /usr/local/pypy/pypy3.9 && \ curl -sqL https://downloads.python.org/pypy/pypy3.9-v7.3.16-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.9 --strip-components=1 && \ -ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.8 && \ +ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.9 && \ ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3 RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.2' scipy coverage matplotlib lxml diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index ce4736299928..5939e429b2f3 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -88,7 +88,7 @@ ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library RUN add-apt-repository ppa:pypy/ppa RUN mkdir -p /usr/local/pypy/pypy3.9 && \ curl -sqL https://downloads.python.org/pypy/pypy3.9-v7.3.16-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.9 --strip-components=1 && \ -ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.8 && \ +ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3.9 && \ ln -sf /usr/local/pypy/pypy3.9/bin/pypy /usr/local/bin/pypy3 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3 RUN pypy3 -m pip install 'numpy==1.26.4' 'six==1.16.0' 'pandas==2.2.2' scipy coverage matplotlib lxml
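The effect of the one-line Dockerfile fix can be sketched with stdlib `os.symlink`: the bundled interpreter stays where it is and only the versioned link name changes to match it. All paths below are scratch stand-ins, not the real `/usr/local` layout:

```python
import os
import tempfile

# Recreate the corrected layout from the Dockerfiles in a scratch directory.
root = tempfile.mkdtemp()
pypy_bin = os.path.join(root, "pypy3.9", "bin")
link_dir = os.path.join(root, "bin")
os.makedirs(pypy_bin)
os.makedirs(link_dir)

interpreter = os.path.join(pypy_bin, "pypy")
open(interpreter, "w").close()  # stand-in for the bundled PyPy binary

# The fix: the versioned link is now named pypy3.9 (it was pypy3.8),
# matching the PyPy 3.9 distribution it points at.
os.symlink(interpreter, os.path.join(link_dir, "pypy3.9"))
os.symlink(interpreter, os.path.join(link_dir, "pypy3"))

resolved = os.path.realpath(os.path.join(link_dir, "pypy3.9"))
```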
(spark-kubernetes-operator) branch main updated: [SPARK-49625] Add `SparkCluster` state transition test
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 07ff073 [SPARK-49625] Add `SparkCluster` state transition test 07ff073 is described below commit 07ff073b358e4f3b70c66c629f17d79625e534d3 Author: Qi Tan AuthorDate: Thu Sep 12 20:42:23 2024 -0700 [SPARK-49625] Add `SparkCluster` state transition test ### What changes were proposed in this pull request? Add e2e for Spark Cluster Happy State Transition Test ### Why are the changes needed? To simulate real user experiences ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ### Was this patch authored or co-authored using generative AI tooling? no Closes #122 from TQJADE/spark-cluster-state-transtition. Authored-by: Qi Tan Signed-off-by: Dongjoon Hyun --- .../spark-cluster-state-transition.yaml| 27 tests/e2e/state-transition/chainsaw-test.yaml | 37 +++--- .../spark-cluster-example-succeeded.yaml | 33 +++ 3 files changed, 93 insertions(+), 4 deletions(-) diff --git a/tests/e2e/assertions/spark-cluster/spark-cluster-state-transition.yaml b/tests/e2e/assertions/spark-cluster/spark-cluster-state-transition.yaml new file mode 100644 index 000..4194583 --- /dev/null +++ b/tests/e2e/assertions/spark-cluster/spark-cluster-state-transition.yaml @@ -0,0 +1,27 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +apiVersion: spark.apache.org/v1alpha1 +kind: SparkCluster +metadata: + name: spark-cluster-succeeded-test + namespace: ($SPARK_APP_NAMESPACE) +status: + stateTransitionHistory: +(*.currentStateSummary): + - "Submitted" + - "RunningHealthy" diff --git a/tests/e2e/state-transition/chainsaw-test.yaml b/tests/e2e/state-transition/chainsaw-test.yaml index 46f5c1f..6e95b69 100644 --- a/tests/e2e/state-transition/chainsaw-test.yaml +++ b/tests/e2e/state-transition/chainsaw-test.yaml @@ -18,22 +18,26 @@ apiVersion: chainsaw.kyverno.io/v1alpha1 kind: Test metadata: - name: spark-operator-spark-application-state-transition-validation + name: spark-operator-state-transition-validation spec: scenarios: - bindings: - name: TEST_NAME value: succeeded - - name: FILE_NAME + - name: APPLICATION_FILE_NAME value: spark-example-succeeded.yaml + - name: CLUSTER_FILE_NAME +value: spark-cluster-example-succeeded.yaml - name: SPARK_APPLICATION_NAME value: spark-job-succeeded-test + - name: SPARK_CLUSTER_NAME +value: spark-cluster-succeeded-test steps: - try: - script: env: - name: FILE_NAME -value: ($FILE_NAME) +value: ($APPLICATION_FILE_NAME) content: kubectl apply -f $FILE_NAME - assert: bindings: @@ -53,4 +57,29 @@ spec: value: ($SPARK_APPLICATION_NAME) timeout: 30s content: | - kubectl delete sparkapplication $SPARK_APPLICATION_NAME \ No newline at end of file + kubectl delete sparkapplication $SPARK_APPLICATION_NAME + - try: +- script: +env: + - name: FILE_NAME +value: ($CLUSTER_FILE_NAME) +content: kubectl apply -f $FILE_NAME +- assert: +bindings: + - name: 
SPARK_APP_NAMESPACE +value: default +timeout: 60s +file: "../assertions/spark-cluster/spark-cluster-state-transition.yaml" +catch: +- describe: +apiVersion: spark.apache.org/v1alpha1 +kind: SparkCluster +namespace: default +finally: + - script: + env: +- name: SPARK_CLUSTER_NAME + value: ($SPARK_CLUSTER_NAME) + timeout: 30s + content: | +kubectl delete sparkcluster $SPARK_CLUSTER_NAME diff --git a/tests/e2e/state-transition/spark-cluster-example-succeeded.yaml b/tests/e2e/sta
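One way to read the chainsaw assertion above is that the cluster's `stateTransitionHistory` must pass through `Submitted` and then `RunningHealthy`, in that order, possibly with other states in between. That ordered-subsequence check can be restated in plain Python (the helper itself is hypothetical; only the state names come from the YAML):

```python
def passes_through(history, expected):
    """True if `expected` appears in order (not necessarily contiguously) in `history`."""
    it = iter(history)
    # `state in it` advances the iterator, so order is enforced.
    return all(state in it for state in expected)

# Hypothetical observed history for a happy-path SparkCluster run.
observed = ["Submitted", "RunningHealthy"]
ok = passes_through(observed, ["Submitted", "RunningHealthy"])
```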
(spark-kubernetes-operator) branch main updated: [SPARK-49623] Refactor prefix `appResources` to `workloadResources`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 9400fe3 [SPARK-49623] Refactor prefix `appResources` to `workloadResources` 9400fe3 is described below commit 9400fe3bdaac2207a930d8ec0d25e90f0486b030 Author: zhou-jiang AuthorDate: Thu Sep 12 17:11:38 2024 -0700 [SPARK-49623] Refactor prefix `appResources` to `workloadResources` ### What changes were proposed in this pull request? This PR refactors previous `appResources` to `workloadResources` for helm chart / template / values to avoid confusion. ### Why are the changes needed? In operator helm chart, prefix / group `appResources` was introduced to indicate the resources are created for Spark workloads whereas other resources are serving the operator deployment itself. As we extend the support to cover `SparkCluster`, the naming becomes confusing as the resources in fact serve both `SparkApp` and `SparkCluster`. Thus, this PR aims to refactor the naming prefix to cover the new resource in a more general fashion. ### Does this PR introduce _any_ user-facing change? No - not yet released ### How was this patch tested? CI + Helm chart test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #121 from jiangzho/helm. 
Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .../templates/_helpers.tpl | 12 +- .../templates/app-rbac.yaml| 176 - .../templates/operator-rbac.yaml | 10 +- .../templates/tests/test-rbac.yaml | 5 +- .../templates/workload-rbac.yaml | 176 + .../helm/spark-kubernetes-operator/values.yaml | 43 ++--- tests/e2e/helm/dynamic-config-values.yaml | 4 +- 7 files changed, 214 insertions(+), 212 deletions(-) diff --git a/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl b/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl index e8ee901..587c833 100644 --- a/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl +++ b/build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl @@ -94,11 +94,11 @@ Create the path of the operator image to use {{- end }} {{/* -List of Spark app namespaces. If not provied in values, use the same namespace as operator +List of Spark workload namespaces. If not provied in values, use the same namespace as operator */}} -{{- define "spark-operator.appNamespacesStr" -}} -{{- if index (.Values.appResources.namespaces) "data" }} -{{- $ns_list := join "," .Values.appResources.namespaces.data }} +{{- define "spark-operator.workloadNamespacesStr" -}} +{{- if index (.Values.workloadResources.namespaces) "data" }} +{{- $ns_list := join "," .Values.workloadResources.namespaces.data }} {{- printf "%s" $ns_list }} {{- else }} {{- printf "%s" .Release.Namespace }} @@ -113,8 +113,8 @@ Default property overrides spark.kubernetes.operator.namespace={{ .Release.Namespace }} spark.kubernetes.operator.name={{- include "spark-operator.name" . }} spark.kubernetes.operator.dynamicConfig.enabled={{ .Values.operatorConfiguration.dynamicConfig.enable }} -{{- if .Values.appResources.namespaces.overrideWatchedNamespaces }} -spark.kubernetes.operator.watchedNamespaces={{ include "spark-operator.appNamespacesStr" . 
| trim }} +{{- if .Values.workloadResources.namespaces.overrideWatchedNamespaces }} +spark.kubernetes.operator.watchedNamespaces={{ include "spark-operator.workloadNamespacesStr" . | trim }} {{- end }} {{- end }} diff --git a/build-tools/helm/spark-kubernetes-operator/templates/app-rbac.yaml b/build-tools/helm/spark-kubernetes-operator/templates/app-rbac.yaml deleted file mode 100644 index 1ae62d9..000 --- a/build-tools/helm/spark-kubernetes-operator/templates/app-rbac.yaml +++ /dev/null @@ -1,176 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See th
(spark) branch master updated: [SPARK-49081][SQL][DOCS] Add data source options docs of `Protobuf`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 875def391665 [SPARK-49081][SQL][DOCS] Add data source options docs of `Protobuf` 875def391665 is described below commit 875def39166549f9de54b141f5397cb3f74a918e Author: Wei Guo AuthorDate: Thu Sep 12 16:26:33 2024 -0700 [SPARK-49081][SQL][DOCS] Add data source options docs of `Protobuf` ### What changes were proposed in this pull request? This PR aims to add data source options docs for the `Protobuf` data source. Other data sources such as `csv` and `json` have corresponding options documents. The document section appears as follows: https://github.com/user-attachments/assets/6f40a69b-1350-4b6b-9a1e-d780fcabb9f1 https://github.com/user-attachments/assets/80402560-474b-4608-be51-0a98d9324109 ### Why are the changes needed? To help Spark users better understand and use the options of the `Protobuf` data source. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA and local manual check with `SKIP_API=1 bundle exec jekyll build --watch`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47570 from wayneguow/pb_docs. 
Authored-by: Wei Guo Signed-off-by: Dongjoon Hyun --- .../spark/sql/protobuf/utils/ProtobufOptions.scala | 12 ++-- docs/sql-data-sources-protobuf.md | 67 +- 2 files changed, 72 insertions(+), 7 deletions(-) diff --git a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala index 6644bce98293..e85097a272f2 100644 --- a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala +++ b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala @@ -43,8 +43,8 @@ private[sql] class ProtobufOptions( /** * Adds support for recursive fields. If this option is is not specified, recursive fields are - * not permitted. Setting it to 0 drops the recursive fields, 1 allows it to be recursed once, - * and 2 allows it to be recursed twice and so on, up to 10. Values larger than 10 are not + * not permitted. Setting it to 1 drops the recursive fields, 0 allows it to be recursed once, + * and 3 allows it to be recursed twice and so on, up to 10. Values larger than 10 are not * allowed in order avoid inadvertently creating very large schemas. If a Protobuf message * has depth beyond this limit, the Spark struct returned is truncated after the recursion limit. * @@ -52,8 +52,8 @@ private[sql] class ProtobufOptions( * `message Person { string name = 1; Person friend = 2; }` * The following lists the schema with different values for this setting. * 1: `struct` - * 2: `struct>` - * 3: `struct>>` + * 2: `struct>` + * 3: `struct>>` * and so on. 
*/ val recursiveFieldMaxDepth: Int = parameters.getOrElse("recursive.fields.max.depth", "-1").toInt @@ -181,7 +181,7 @@ private[sql] class ProtobufOptions( val upcastUnsignedInts: Boolean = parameters.getOrElse("upcast.unsigned.ints", false.toString).toBoolean - // Whether to unwrap the struct representation for well known primitve wrapper types when + // Whether to unwrap the struct representation for well known primitive wrapper types when // deserializing. By default, the wrapper types for primitives (i.e. google.protobuf.Int32Value, // google.protobuf.Int64Value, etc.) will get deserialized as structs. We allow the option to // deserialize them as their respective primitives. @@ -221,7 +221,7 @@ private[sql] class ProtobufOptions( // By default, in the spark schema field a will be dropped, which result in schema // b struct // If retain.empty.message.types=true, field a will be retained by inserting a dummy column. - // b struct, name: string> + // b struct, name: string> val retainEmptyMessage: Boolean = parameters.getOrElse("retain.empty.message.types", false.toString).toBoolean } diff --git a/docs/sql-data-sources-protobuf.md b/docs/sql-data-sources-protobuf.md index 34cb1d4997d2..4dd6579f92cd 100644 --- a/docs/sql-data-sources-protobuf.md +++ b/docs/sql-data-sources-protobuf.md @@ -434,4 +434,69 @@ message Person { ``` - \ No newline at end of file + + +## Data Source Option + +Data source options of Protobuf can be set via: +* the built-in functions below + * `from_protobuf` + * `to_protobuf` + + + Property NameDefaultMeaningScope + +
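For `message Person { string name = 1; Person friend = 2; }`, the `recursive.fields.max.depth` option truncates the inferred Spark schema after a fixed number of recursions (the example schema strings in the diff above were mangled by the mail archiver, which stripped their angle brackets). A plain-Python sketch of that truncation, independent of Spark — the exact mapping of option values to depths here is illustrative, not the confirmed Spark semantics:

```python
def person_schema(depth):
    """Schema string for Person{name, friend: Person}, keeping `depth` levels."""
    if depth <= 0:
        return None  # recursive field dropped entirely
    inner = person_schema(depth - 1)
    if inner is None:
        return "struct<name: string>"
    return f"struct<name: string, friend: {inner}>"

one_level = person_schema(1)
two_levels = person_schema(2)
```

Each extra level nests one more `friend` struct, which is why the docs cap the depth at 10: unbounded recursion would otherwise produce an arbitrarily large schema.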
(spark-kubernetes-operator) branch main updated: [SPARK-49619] Upgrade Gradle to 8.10.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 27a033c [SPARK-49619] Upgrade Gradle to 8.10.1 27a033c is described below commit 27a033ca3191c450509d5c9d16e2596512495b63 Author: Dongjoon Hyun AuthorDate: Thu Sep 12 13:57:54 2024 -0700 [SPARK-49619] Upgrade Gradle to 8.10.1 ### What changes were proposed in this pull request? This PR aims to upgrade `Gradle` to 8.10.1. ### Why are the changes needed? To bring the following bug fixes. - https://github.com/gradle/gradle/releases/tag/v8.10.1 - https://github.com/gradle/gradle/issues/30239 Gradle 8.10 Significantly Slower Due to Dependency Resolution - https://github.com/gradle/gradle/issues/30272 Broken equals() contract for LifecycleAwareProject - https://github.com/gradle/gradle/issues/30385 Gradle should not validate isolated projects when isolated projects is disabled ### Does this PR introduce _any_ user-facing change? No. This is a dev-only change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #120 from dongjoon-hyun/SPARK-49619. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- build-tools/docker/Dockerfile| 2 +- gradle/wrapper/gradle-wrapper.properties | 4 ++-- gradlew | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/build-tools/docker/Dockerfile b/build-tools/docker/Dockerfile index c6fabde..66dc6a3 100644 --- a/build-tools/docker/Dockerfile +++ b/build-tools/docker/Dockerfile @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # -FROM gradle:8.10.0-jdk17-jammy AS builder +FROM gradle:8.10.1-jdk17-jammy AS builder WORKDIR /app COPY . . 
RUN ./gradlew clean build -x check diff --git a/gradle/wrapper/gradle-wrapper.properties b/gradle/wrapper/gradle-wrapper.properties index 3d2e79e..b44d3e9 100644 --- a/gradle/wrapper/gradle-wrapper.properties +++ b/gradle/wrapper/gradle-wrapper.properties @@ -17,8 +17,8 @@ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists -distributionSha256Sum=682b4df7fe5accdca84a4d1ef6a3a6ab096b3efd5edf7de2bd8c758d95a93703 -distributionUrl=https\://services.gradle.org/distributions/gradle-8.10-all.zip +distributionSha256Sum=fdfca5dbc2834f0ece5020465737538e5ba679deeff5ab6c09621d67f8bb1a15 +distributionUrl=https\://services.gradle.org/distributions/gradle-8.10.1-all.zip networkTimeout=1 zipStoreBase=GRADLE_USER_HOME zipStorePath=wrapper/dists diff --git a/gradlew b/gradlew index 482b49a..2ccd3c3 100755 --- a/gradlew +++ b/gradlew @@ -87,11 +87,11 @@ APP_BASE_NAME=${0##*/} APP_HOME=$( cd "${APP_HOME:-./}" > /dev/null && pwd -P ) || exit if [ ! -e $APP_HOME/gradle/wrapper/gradle-wrapper.jar -a "$(command -v curl)" ]; then -curl -o $APP_HOME/gradle/wrapper/gradle-wrapper.jar https://raw.githubusercontent.com/gradle/gradle/v8.10.0/gradle/wrapper/gradle-wrapper.jar +curl -o $APP_HOME/gradle/wrapper/gradle-wrapper.jar https://raw.githubusercontent.com/gradle/gradle/v8.10.1/gradle/wrapper/gradle-wrapper.jar fi # If the file still doesn't exist, let's try `wget` and cross our fingers if [ ! -e $APP_HOME/gradle/wrapper/gradle-wrapper.jar -a "$(command -v wget)" ]; then -wget -O $APP_HOME/gradle/wrapper/gradle-wrapper.jar https://raw.githubusercontent.com/gradle/gradle/v8.10.1/gradle/wrapper/gradle-wrapper.jar fi # Use the maximum available, or set MAX_FD != -1 to use that value.
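The `distributionSha256Sum` bumped in this diff is what the Gradle wrapper verifies before unpacking a downloaded distribution. The check amounts to a SHA-256 comparison, sketched here with a stand-in payload rather than the real archive:

```python
import hashlib

# The pinned checksum for gradle-8.10.1-all.zip, copied from the diff above.
PINNED = "fdfca5dbc2834f0ece5020465737538e5ba679deeff5ab6c09621d67f8bb1a15"

def matches_pinned_sum(payload: bytes, expected_sha256: str) -> bool:
    """Return True only when the payload hashes to the pinned checksum."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256

# A stand-in payload must not match the pinned sum; a payload checked
# against its own digest must.
fake = b"not the real gradle distribution"
rejected = not matches_pinned_sum(fake, PINNED)
accepted = matches_pinned_sum(fake, hashlib.sha256(fake).hexdigest())
```

Pinning both the URL and the checksum means a tampered or corrupted download fails the wrapper bootstrap instead of being silently unpacked.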
(spark) branch master updated (317eddb7390c -> d2d293e3fb57)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f69b518446e2 [SPARK-49620][INFRA] Fix `spark-rm` and `infra` docker files to create `pypy3.9` links add 23e61f6b1845 [SPARK-49621][SQL][TESTS] Remove the flaky `EXEC IMMEDIATE STACK OVERFLOW` test case No new revisions were added by this update. Summary of changes: .../execution/ExecuteImmediateEndToEndSuite.scala | 27 -- 1 file changed, 27 deletions(-)
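The SPARK-49598 change merged above lets users attach labels to OnDemand PVCs via the existing per-volume Spark conf keys. A toy parser for collecting such labels per volume — note the `...label.<key>` option shape here is an assumption for illustration, not the confirmed Spark syntax (see `docs/running-on-kubernetes.md` in the change summary for the real keys):

```python
def parse_pvc_labels(conf, role="executor"):
    """Group hypothetical per-volume label options by volume name.

    Assumed key shape (illustrative only):
    spark.kubernetes.<role>.volumes.persistentVolumeClaim.<volume>.label.<key>
    """
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim."
    labels = {}
    for key, value in conf.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split(".")
        if len(parts) == 3 and parts[1] == "label":
            volume, _, label_key = parts
            labels.setdefault(volume, {})[label_key] = value
    return labels

sample_conf = {
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.label.team": "spark",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.label.env": "ci",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path": "/data",
}
parsed = parse_pvc_labels(sample_conf)
```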
(spark) branch master updated (98f0d9f32322 -> 317eddb7390c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 98f0d9f32322 [SPARK-49605][SQL] Fix the prompt when `ascendingOrder` is `DataTypeMismatch` in `SortArray` add 317eddb7390c [SPARK-49606][PS][DOCS] Improve documentation of Pandas on Spark plotting API No new revisions were added by this update. Summary of changes: python/pyspark/pandas/plot/core.py | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-44811][BUILD] Upgrade Guava to 33.2.1-jre
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1f24b2d72ed6 [SPARK-44811][BUILD] Upgrade Guava to 33.2.1-jre 1f24b2d72ed6 is described below commit 1f24b2d72ed6821a6cc6d1d22683d2f3ba2326a2 Author: Cheng Pan AuthorDate: Thu Sep 12 09:26:56 2024 -0700 [SPARK-44811][BUILD] Upgrade Guava to 33.2.1-jre ### What changes were proposed in this pull request? This PR upgrades Spark's built-in Guava from 14 to 33.2.1-jre. Currently, Spark uses Guava 14 because the previous built-in Hive 2.3.9 is incompatible with new Guava versions. HIVE-27560 (https://github.com/apache/hive/pull/4542) makes Hive 2.3.10 compatible with Guava 14+ (thanks to LuciferYang) ### Why are the changes needed? It's a long-standing issue, see prior discussions at https://github.com/apache/spark/pull/35584, https://github.com/apache/spark/pull/36231, and https://github.com/apache/spark/pull/33989 ### Does this PR introduce _any_ user-facing change? Yes, some user-facing error messages changed. ### How was this patch tested? GA passed. Closes #42493 from pan3793/guava. 
Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- assembly/pom.xml | 2 +- core/pom.xml | 1 + dev/deps/spark-deps-hadoop-3-hive-2.3 | 7 ++- pom.xml| 3 ++- project/SparkBuild.scala | 2 +- .../spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala | 2 +- .../src/test/resources/sql-tests/results/ansi/interval.sql.out | 4 ++-- sql/core/src/test/resources/sql-tests/results/interval.sql.out | 4 ++-- 8 files changed, 16 insertions(+), 9 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 4b074a88dab4..01bd324efc11 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -123,7 +123,7 @@ com.google.guava diff --git a/core/pom.xml b/core/pom.xml index 53d5ad71cebf..19f58940ed94 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -558,6 +558,7 @@ org.eclipse.jetty:jetty-util org.eclipse.jetty:jetty-server com.google.guava:guava + com.google.guava:failureaccess com.google.protobuf:* diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c89c92815d45..2db86ed229a0 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -33,6 +33,7 @@ breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar bundle/2.24.6//bundle-2.24.6.jar cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar +checker-qual/3.42.0//checker-qual-3.42.0.jar chill-java/0.10.0//chill-java-0.10.0.jar chill_2.13/0.10.0//chill_2.13-0.10.0.jar commons-cli/1.9.0//commons-cli-1.9.0.jar @@ -62,12 +63,14 @@ derby/10.16.1.1//derby-10.16.1.1.jar derbyshared/10.16.1.1//derbyshared-10.16.1.1.jar derbytools/10.16.1.1//derbytools-10.16.1.1.jar dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar +error_prone_annotations/2.26.1//error_prone_annotations-2.26.1.jar esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar +failureaccess/1.0.2//failureaccess-1.0.2.jar flatbuffers-java/24.3.25//flatbuffers-java-24.3.25.jar 
gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar gmetric4j/1.0.10//gmetric4j-1.0.10.jar gson/2.11.0//gson-2.11.0.jar -guava/14.0.1//guava-14.0.1.jar +guava/33.2.1-jre//guava-33.2.1-jre.jar hadoop-aliyun/3.4.0//hadoop-aliyun-3.4.0.jar hadoop-annotations/3.4.0//hadoop-annotations-3.4.0.jar hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar @@ -101,6 +104,7 @@ icu4j/75.1//icu4j-75.1.jar ini4j/0.5.4//ini4j-0.5.4.jar istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar ivy/2.5.2//ivy-2.5.2.jar +j2objc-annotations/3.0.0//j2objc-annotations-3.0.0.jar jackson-annotations/2.17.2//jackson-annotations-2.17.2.jar jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar jackson-core/2.17.2//jackson-core-2.17.2.jar @@ -184,6 +188,7 @@ lapack/3.0.3//lapack-3.0.3.jar leveldbjni-all/1.8//leveldbjni-all-1.8.jar libfb303/0.9.3//libfb303-0.9.3.jar libthrift/0.16.0//libthrift-0.16.0.jar +listenablefuture/.0-empty-to-avoid-conflict-with-guava//listenablefuture-.0-empty-to-avoid-conflict-with-guava.jar log4j-1.2-api/2.22.1//log4j-1.2-api-2.22.1.jar log4j-api/2.22.1//log4j-api-2.22.1.jar log4j-core/2.22.1//log4j-core-2.22.1.jar diff --git a/pom.xml b/pom.xml index 6f5c9b63f86d..b1497c782685 100644 --- a/pom.xml +++ b/pom.xml
(spark) branch branch-3.4 updated: [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new ba05a6bcd972 [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions ba05a6bcd972 is described below commit ba05a6bcd972ed4d5d1ee7a31f1c770ed7bfaed7 Author: Bruce Robbins AuthorDate: Thu Sep 12 08:11:03 2024 -0700 [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions ### What changes were proposed in this pull request? Before this PR, `RewriteDistinctAggregates` could potentially replace literals in the aggregate expressions with output attributes from the `Expand` operator. This can occur when a group-by expression is a literal that happens by chance to match a literal used in an aggregate expression. E.g.: ``` create or replace temp view v1(a, b, c) as values (1, 1.001d, 2), (2, 3.001d, 4), (2, 3.001, 4); cache table v1; select round(sum(b), 6) as sum1, count(distinct a) as count1, count(distinct c) as count2 from ( select 6 as gb, * from v1 ) group by a, gb; ``` In the optimized plan, you can see that the literal 6 in the `round` function invocation has been patched with an output attribute (6#163) from the `Expand` operator: ``` == Optimized Logical Plan == 'Aggregate [a#123, 6#163], [round(first(sum(__auto_generated_subquery_name.b)#167, true) FILTER (WHERE (gid#162 = 0)), 6#163) AS sum1#114, count(__auto_generated_subquery_name.a#164) FILTER (WHERE (gid#162 = 1)) AS count1#115L, count(__auto_generated_subquery_name.c#165) FILTER (WHERE (gid#162 = 2)) AS count2#116L] +- Aggregate [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162], [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162, sum(__auto_generated_subquery_name.b#166) AS 
sum(__auto_generated_subquery_name.b)#167] +- Expand [[a#123, 6, null, null, 0, b#124], [a#123, 6, a#123, null, 1, null], [a#123, 6, null, c#125, 2, null]], [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162, __auto_generated_subquery_name.b#166] +- InMemoryRelation [a#123, b#124, c#125], StorageLevel(disk, memory, deserialized, 1 replicas) +- LocalTableScan [a#6, b#7, c#8] ``` This is because the literal 6 was used in the group-by expressions (referred to as gb in the query, and renamed 6#163 in the `Expand` operator's output attributes). After this PR, foldable expressions in the aggregate expressions are kept as-is. ### Why are the changes needed? Some expressions require a foldable argument. In the above example, the `round` function requires a foldable expression as the scale argument. Because the scale argument is patched with an attribute, `RoundBase#checkInputDataTypes` returns an error, which leaves the `Aggregate` operator unresolved: ``` [INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000 org.apache.spark.sql.catalyst.analysis.UnresolvedException: [INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000 at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:255) at org.apache.spark.sql.catalyst.types.DataTypeUtils$.$anonfun$fromAttributes$1(DataTypeUtils.scala:241) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.catalyst.types.DataTypeUtils$.fromAttributes(DataTypeUtils.scala:241) at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:428) at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:428) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:474) ... ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. 
### Was this patch authored or co-authored using generative AI tooling? No. Closes #47876 from bersprockets/group_by_lit_issue. Authored-by: Bruce Robbins Signed-off-by: Dongjoon Hyun (cherry picked from commit 1a0791d006e25898b67cc17e1420f053a39091b9) Signed-off-by: Dongjoon Hyun --- .../optimizer/RewriteDistinctAggregates.scala | 3 ++- .../optimizer/RewriteDistinctAggregatesSuite.scala | 18 +- .../apache/spark/sql/DataFrameAggregateSuite.scala | 21 + 3 files changed, 40 insertions(+),
(spark) branch branch-3.5 updated: [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 560efed3b00f [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions 560efed3b00f is described below commit 560efed3b00f4ac9be4356714c664bf0e9341c0b Author: Bruce Robbins AuthorDate: Thu Sep 12 08:11:03 2024 -0700 [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions ### What changes were proposed in this pull request? Before this PR, `RewriteDistinctAggregates` could potentially replace literals in the aggregate expressions with output attributes from the `Expand` operator. This can occur when a group-by expression is a literal that happens by chance to match a literal used in an aggregate expression. E.g.: ``` create or replace temp view v1(a, b, c) as values (1, 1.001d, 2), (2, 3.001d, 4), (2, 3.001, 4); cache table v1; select round(sum(b), 6) as sum1, count(distinct a) as count1, count(distinct c) as count2 from ( select 6 as gb, * from v1 ) group by a, gb; ``` In the optimized plan, you can see that the literal 6 in the `round` function invocation has been patched with an output attribute (6#163) from the `Expand` operator: ``` == Optimized Logical Plan == 'Aggregate [a#123, 6#163], [round(first(sum(__auto_generated_subquery_name.b)#167, true) FILTER (WHERE (gid#162 = 0)), 6#163) AS sum1#114, count(__auto_generated_subquery_name.a#164) FILTER (WHERE (gid#162 = 1)) AS count1#115L, count(__auto_generated_subquery_name.c#165) FILTER (WHERE (gid#162 = 2)) AS count2#116L] +- Aggregate [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162], [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162, sum(__auto_generated_subquery_name.b#166) AS 
sum(__auto_generated_subquery_name.b)#167] +- Expand [[a#123, 6, null, null, 0, b#124], [a#123, 6, a#123, null, 1, null], [a#123, 6, null, c#125, 2, null]], [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162, __auto_generated_subquery_name.b#166] +- InMemoryRelation [a#123, b#124, c#125], StorageLevel(disk, memory, deserialized, 1 replicas) +- LocalTableScan [a#6, b#7, c#8] ``` This is because the literal 6 was used in the group-by expressions (referred to as gb in the query, and renamed 6#163 in the `Expand` operator's output attributes). After this PR, foldable expressions in the aggregate expressions are kept as-is. ### Why are the changes needed? Some expressions require a foldable argument. In the above example, the `round` function requires a foldable expression as the scale argument. Because the scale argument is patched with an attribute, `RoundBase#checkInputDataTypes` returns an error, which leaves the `Aggregate` operator unresolved: ``` [INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000 org.apache.spark.sql.catalyst.analysis.UnresolvedException: [INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000 at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:255) at org.apache.spark.sql.catalyst.types.DataTypeUtils$.$anonfun$fromAttributes$1(DataTypeUtils.scala:241) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.catalyst.types.DataTypeUtils$.fromAttributes(DataTypeUtils.scala:241) at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:428) at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:428) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:474) ... ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. 
### Was this patch authored or co-authored using generative AI tooling? No. Closes #47876 from bersprockets/group_by_lit_issue. Authored-by: Bruce Robbins Signed-off-by: Dongjoon Hyun (cherry picked from commit 1a0791d006e25898b67cc17e1420f053a39091b9) Signed-off-by: Dongjoon Hyun --- .../optimizer/RewriteDistinctAggregates.scala | 3 ++- .../optimizer/RewriteDistinctAggregatesSuite.scala | 18 +- .../apache/spark/sql/DataFrameAggregateSuite.scala | 21 + 3 files changed, 40 insertions(+),
(spark) branch master updated: [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1a0791d006e2 [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions 1a0791d006e2 is described below commit 1a0791d006e25898b67cc17e1420f053a39091b9 Author: Bruce Robbins AuthorDate: Thu Sep 12 08:11:03 2024 -0700 [SPARK-49261][SQL] Don't replace literals in aggregate expressions with group-by expressions ### What changes were proposed in this pull request? Before this PR, `RewriteDistinctAggregates` could potentially replace literals in the aggregate expressions with output attributes from the `Expand` operator. This can occur when a group-by expression is a literal that happens by chance to match a literal used in an aggregate expression. E.g.: ``` create or replace temp view v1(a, b, c) as values (1, 1.001d, 2), (2, 3.001d, 4), (2, 3.001, 4); cache table v1; select round(sum(b), 6) as sum1, count(distinct a) as count1, count(distinct c) as count2 from ( select 6 as gb, * from v1 ) group by a, gb; ``` In the optimized plan, you can see that the literal 6 in the `round` function invocation has been patched with an output attribute (6#163) from the `Expand` operator: ``` == Optimized Logical Plan == 'Aggregate [a#123, 6#163], [round(first(sum(__auto_generated_subquery_name.b)#167, true) FILTER (WHERE (gid#162 = 0)), 6#163) AS sum1#114, count(__auto_generated_subquery_name.a#164) FILTER (WHERE (gid#162 = 1)) AS count1#115L, count(__auto_generated_subquery_name.c#165) FILTER (WHERE (gid#162 = 2)) AS count2#116L] +- Aggregate [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162], [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162, sum(__auto_generated_subquery_name.b#166) AS 
sum(__auto_generated_subquery_name.b)#167] +- Expand [[a#123, 6, null, null, 0, b#124], [a#123, 6, a#123, null, 1, null], [a#123, 6, null, c#125, 2, null]], [a#123, 6#163, __auto_generated_subquery_name.a#164, __auto_generated_subquery_name.c#165, gid#162, __auto_generated_subquery_name.b#166] +- InMemoryRelation [a#123, b#124, c#125], StorageLevel(disk, memory, deserialized, 1 replicas) +- LocalTableScan [a#6, b#7, c#8] ``` This is because the literal 6 was used in the group-by expressions (referred to as gb in the query, and renamed 6#163 in the `Expand` operator's output attributes). After this PR, foldable expressions in the aggregate expressions are kept as-is. ### Why are the changes needed? Some expressions require a foldable argument. In the above example, the `round` function requires a foldable expression as the scale argument. Because the scale argument is patched with an attribute, `RoundBase#checkInputDataTypes` returns an error, which leaves the `Aggregate` operator unresolved: ``` [INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000 org.apache.spark.sql.catalyst.analysis.UnresolvedException: [INTERNAL_ERROR] Invalid call to dataType on unresolved object SQLSTATE: XX000 at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:255) at org.apache.spark.sql.catalyst.types.DataTypeUtils$.$anonfun$fromAttributes$1(DataTypeUtils.scala:241) at scala.collection.immutable.List.map(List.scala:247) at scala.collection.immutable.List.map(List.scala:79) at org.apache.spark.sql.catalyst.types.DataTypeUtils$.fromAttributes(DataTypeUtils.scala:241) at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:428) at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:428) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:474) ... ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. 
### Was this patch authored or co-authored using generative AI tooling? No. Closes #47876 from bersprockets/group_by_lit_issue. Authored-by: Bruce Robbins Signed-off-by: Dongjoon Hyun --- .../optimizer/RewriteDistinctAggregates.scala | 3 ++- .../optimizer/RewriteDistinctAggregatesSuite.scala | 18 +- .../apache/spark/sql/DataFrameAggregateSuite.scala | 21 + 3 files changed, 40 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggr
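The collision described in this commit can be sketched in plain Python. This is illustrative only, not Catalyst code: the tuple-based "expression tree", the `patch` helper, and the attribute names are assumptions made for demonstration. It shows why a rewrite that maps every subexpression equal to a group-by expression to its `Expand` output attribute also captures literals (here, the scale argument of `round`), and why skipping foldable expressions fixes it.

```python
# Hypothetical expression nodes (NOT Catalyst): literals and attributes as tuples.
Lit = lambda v: ("lit", v)
Attr = lambda name: ("attr", name)

def is_foldable(expr):
    # In this sketch only bare literals are foldable.
    return expr[0] == "lit"

def patch(expr, mapping, skip_foldable):
    # Replace any subexpression found in `mapping` with its Expand attribute.
    if expr in mapping and not (skip_foldable and is_foldable(expr)):
        return mapping[expr]
    if expr[0] == "call":  # ("call", fn, arg1, arg2, ...)
        return ("call", expr[1]) + tuple(
            patch(a, mapping, skip_foldable) for a in expr[2:])
    return expr

# Group-by expressions: column `a` and the literal 6 (aliased as gb),
# each mapped to an Expand output attribute, as in the optimized plan above.
mapping = {Attr("a"): Attr("a#123"), Lit(6): Attr("6#163")}

# Aggregate expression round(sum(b), 6) -- the 6 is a scale literal that
# only coincidentally matches the group-by literal.
agg = ("call", "round", ("call", "sum", Attr("b")), Lit(6))

buggy = patch(agg, mapping, skip_foldable=False)  # scale becomes Attr("6#163")
fixed = patch(agg, mapping, skip_foldable=True)   # scale stays Lit(6)
```

In the buggy variant, `round` receives an attribute instead of a foldable scale argument, which mirrors how `RoundBase#checkInputDataTypes` fails in the stack trace above.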
(spark) branch master updated: [SPARK-49578][SQL][TESTS][FOLLOWUP] Regenerate Java 21 golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 19aad9ee36ed [SPARK-49578][SQL][TESTS][FOLLOWUP] Regenerate Java 21 golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` 19aad9ee36ed is described below commit 19aad9ee36edad0906b8223074351bfb76237c0a Author: yangjie01 AuthorDate: Thu Sep 12 07:17:28 2024 -0700 [SPARK-49578][SQL][TESTS][FOLLOWUP] Regenerate Java 21 golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` ### What changes were proposed in this pull request? This pr regenerate Java 21 golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` to fix Java 21 daily test. ### Why are the changes needed? Fix Java 21 daily test: - https://github.com/apache/spark/actions/runs/10823897095/job/30030200710 ``` [info] - postgreSQL/float4.sql *** FAILED *** (1 second, 100 milliseconds) [info] postgreSQL/float4.sql [info] Expected "...arameters" : { [info] "[ansiConfig" : "\"spark.sql.ansi.enabled\"", [info] "]expression" : "'N A ...", but got "...arameters" : { [info] "[]expression" : "'N A ..." Result did not match for query #11 [info] SELECT float('N A N') (SQLQueryTestSuite.scala:663) ... [info] - postgreSQL/int8.sql *** FAILED *** (2 seconds, 474 milliseconds) [info] postgreSQL/int8.sql [info] Expected "...arameters" : { [info] "[ansiConfig" : "\"spark.sql.ansi.enabled\"", [info] "]sourceType" : "\"BIG...", but got "...arameters" : { [info] "[]sourceType" : "\"BIG..." Result did not match for query #66 [info] SELECT CAST(q1 AS int) FROM int8_tbl WHERE q2 <> 456 (SQLQueryTestSuite.scala:663) ... 
[info] *** 2 TESTS FAILED ***
[error] Failed: Total 3559, Failed 2, Errors 0, Passed 3557, Ignored 4
[error] Failed tests:
[error] org.apache.spark.sql.SQLQueryTestSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked: `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"` with Java 21; all tests passed

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48089 from LuciferYang/SPARK-49578-FOLLOWUP.

Authored-by: yangjie01
Signed-off-by: Dongjoon Hyun
---
 .../resources/sql-tests/results/postgreSQL/float4.sql.out.java21 | 7 -------
 .../resources/sql-tests/results/postgreSQL/int8.sql.out.java21   | 4 ----
 2 files changed, 11 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21 b/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
index 6126411071bc..3c2189c39963 100644
--- a/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
+++ b/sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out.java21
@@ -97,7 +97,6 @@ org.apache.spark.SparkNumberFormatException
   "errorClass" : "CAST_INVALID_INPUT",
   "sqlState" : "22018",
   "messageParameters" : {
-    "ansiConfig" : "\"spark.sql.ansi.enabled\"",
     "expression" : "'N A N'",
     "sourceType" : "\"STRING\"",
     "targetType" : "\"FLOAT\""
@@ -122,7 +121,6 @@ org.apache.spark.SparkNumberFormatException
   "errorClass" : "CAST_INVALID_INPUT",
   "sqlState" : "22018",
   "messageParameters" : {
-    "ansiConfig" : "\"spark.sql.ansi.enabled\"",
     "expression" : "'NaN x'",
     "sourceType" : "\"STRING\"",
     "targetType" : "\"FLOAT\""
@@ -147,7 +145,6 @@ org.apache.spark.SparkNumberFormatException
   "errorClass" : "CAST_INVALID_INPUT",
   "sqlState" : "22018",
   "messageParameters" : {
-    "ansiConfig" : "\"spark.sql.ansi.enabled\"",
     "expression" : "' INFINITYx'",
     "sourceType" : "\"STRING\"",
     "targetType" : "\"FLOAT\""
@@ -196,7 +193,6 @@ org.apache.spark.SparkNumberFormatException
   "errorCl
(spark) branch master updated (591a60df788a -> 07f5b2c1c5ff)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 591a60df788a [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}` add 07f5b2c1c5ff [SPARK-49155][SQL][SS] Use more appropriate parameter type to construct `GenericArrayData` No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaRecordToRowConverter.scala| 2 +- .../sql/catalyst/expressions/aggregate/HistogramNumeric.scala| 9 + .../spark/sql/catalyst/expressions/aggregate/collect.scala | 2 +- .../spark/sql/catalyst/expressions/collectionOperations.scala| 2 +- .../apache/spark/sql/catalyst/expressions/jsonExpressions.scala | 2 +- .../org/apache/spark/sql/catalyst/expressions/xml/xpath.scala| 2 +- .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala | 2 +- .../org/apache/spark/sql/execution/command/CommandUtils.scala| 2 +- 8 files changed, 12 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 6ffa94d6d3be [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client 6ffa94d6d3be is described below commit 6ffa94d6d3be278910f282b35cc9cb4cd1dd2887 Author: Xinrong Meng AuthorDate: Wed Sep 11 08:52:33 2024 -0700 [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client Fix DataFrame.unpivot/melt in Spark Connect Scala Client by correctly assigning the name for the variable column. The original code used `setValueColumnName` for both the variable and value columns. This fix is necessary to ensure the correct behavior of the unpivot/melt operation. Yes. Variable and value columns can be set correctly as shown below. ```scala scala> val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long") df: org.apache.spark.sql.package.DataFrame = [id: int, int: int ... 1 more field] scala> df.show() +---+---++ | id|int|long| +---+---++ | 1| 11| 12| | 2| 21| 22| +---+---++ ``` FROM (current master) ```scala scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", "value").show() +---++-+ | id||value| +---++-+ | 1| int| 11| | 1|long| 12| | 2| int| 21| | 2|long| 22| +---++-+ ``` TO ```scala scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", "value").show() +---++-+ | id|variable|value| +---++-+ | 1| int| 11| | 1|long| 12| | 2| int| 21| | 2|long| 22| +---++-+ ``` Existing tests. No. Closes #48069 from xinrong-meng/fix_unpivot. 
Authored-by: Xinrong Meng Signed-off-by: Dongjoon Hyun (cherry picked from commit e63b5601c1bd74b2b0054d48f944424d12b79835) Signed-off-by: Dongjoon Hyun (cherry picked from commit 96eebebaf5146144ff900c8081dfa5c5960b3bb2) Signed-off-by: Dongjoon Hyun --- .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +- .../query-tests/explain-results/melt_no_values.explain| 2 +- .../query-tests/explain-results/melt_values.explain | 2 +- .../query-tests/explain-results/unpivot_no_values.explain | 2 +- .../query-tests/explain-results/unpivot_values.explain| 2 +- .../resources/query-tests/queries/melt_no_values.json | 1 + .../query-tests/queries/melt_no_values.proto.bin | Bin 71 -> 77 bytes .../test/resources/query-tests/queries/melt_values.json | 1 + .../resources/query-tests/queries/melt_values.proto.bin | Bin 73 -> 79 bytes .../resources/query-tests/queries/unpivot_no_values.json | 1 + .../query-tests/queries/unpivot_no_values.proto.bin | Bin 64 -> 70 bytes .../resources/query-tests/queries/unpivot_values.json | 1 + .../query-tests/queries/unpivot_values.proto.bin | Bin 80 -> 86 bytes 13 files changed, 9 insertions(+), 5 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala index ca90afa14cf3..3fd93f09b9af 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -1142,7 +1142,7 @@ class Dataset[T] private[sql] ( val unpivot = builder.getUnpivotBuilder .setInput(plan.getRoot) .addAllIds(ids.toSeq.map(_.expr).asJava) - .setValueColumnName(variableColumnName) + .setVariableColumnName(variableColumnName) .setValueColumnName(valueColumnName) valuesOption.foreach { values => unpivot.getValuesBuilder diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain 
b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain index f61fc30a3a52..053937d84ec8 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain @@ -1,2 +1,2 @@ -Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, #0, value#0] +Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, name#0, value#0] +- LocalRelation , [id#0L, a#0, b#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_values.explain b/connector/conne
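The unpivot/melt semantics fixed by this commit can be sketched in plain Python (illustrative only; this is not the Spark Connect client, and the `unpivot` helper here is an assumption for demonstration). Each output row keeps the id columns and gains two new columns: the variable column holds the unpivoted column's name and the value column holds its value. The bug above set only the value column name on the proto builder, so the variable column name was lost.

```python
def unpivot(rows, ids, values, variable_col, value_col):
    """Melt `values` columns into (variable, value) pairs, keeping `ids`."""
    out = []
    for row in rows:
        for col in values:
            rec = {k: row[k] for k in ids}
            rec[variable_col] = col      # the name the fix restores
            rec[value_col] = row[col]
            out.append(rec)
    return out

# Mirrors the example DataFrame in the commit message.
rows = [{"id": 1, "int": 11, "long": 12},
        {"id": 2, "int": 21, "long": 22}]
result = unpivot(rows, ["id"], ["int", "long"], "variable", "value")
# result[0] -> {'id': 1, 'variable': 'int', 'value': 11}
```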
(spark) branch master updated: [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 591a60df788a [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}` 591a60df788a is described below commit 591a60df788ae72226375f2d3e85c203200b4b93 Author: Dongjoon Hyun AuthorDate: Wed Sep 11 13:03:25 2024 -0700 [SPARK-49602][BUILD] Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}` ### What changes were proposed in this pull request? This PR aims to fix `assembly/pom.xml` to use `{project.version}` instead of `{version}`. The original change was introduced recently by - #47402 ### Why are the changes needed? **BEFORE** ``` $ mvn clean | head -n9 [INFO] Scanning for projects... [WARNING] [WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-assembly_2.13:pom:4.0.0-SNAPSHOT [WARNING] The expression ${version} is deprecated. Please use ${project.version} instead. [WARNING] [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build. [WARNING] [WARNING] For this reason, future Maven versions might no longer support building such malformed projects. [WARNING] ``` **AFTER** ``` $ mvn clean | head -n9 [INFO] Scanning for projects... [INFO] [INFO] Detecting the operating system and CPU architecture [INFO] [INFO] os.detected.name: osx [INFO] os.detected.arch: aarch_64 [INFO] os.detected.version: 15.0 [INFO] os.detected.version.major: 15 [INFO] os.detected.version.minor: 0 ``` ### Does this PR introduce _any_ user-facing change? No, this is a dev-only change for building distribution. ### How was this patch tested? Manual test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48081 from dongjoon-hyun/SPARK-49602. 
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 assembly/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8b21f7e808ce..4b074a88dab4 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -200,7 +200,7 @@
 cp
-${basedir}/../connector/connect/client/jvm/target/spark-connect-client-jvm_${scala.binary.version}-${version}.jar
+${basedir}/../connector/connect/client/jvm/target/spark-connect-client-jvm_${scala.binary.version}-${project.version}.jar
 ${basedir}/target/scala-${scala.binary.version}/jars/connect-repl
(spark) branch master updated: [SPARK-49310][BUILD] Upgrade `Parquet` to 1.14.2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a9502d42a500 [SPARK-49310][BUILD] Upgrade `Parquet` to 1.14.2 a9502d42a500 is described below commit a9502d42a5002bac9bee74fef8427a28cf1ddf86 Author: Fokko AuthorDate: Wed Sep 11 11:53:11 2024 -0700 [SPARK-49310][BUILD] Upgrade `Parquet` to 1.14.2 ### What changes were proposed in this pull request? This PR aims to upgrade Parquet to 1.14.2. ### Why are the changes needed? To bring the latest bug fixes. - https://mvnrepository.com/artifact/org.apache.parquet/parquet-common/1.14.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47807 from Fokko/fd-parquet. Lead-authored-by: Fokko Co-authored-by: Dongjoon Hyun Co-authored-by: Fokko Driesprong Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++-- pom.xml | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 69123f91fcaf..c89c92815d45 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -236,12 +236,12 @@ orc-shims/2.0.2//orc-shims-2.0.2.jar oro/2.0.8//oro-2.0.8.jar osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar paranamer/2.8//paranamer-2.8.jar -parquet-column/1.14.1//parquet-column-1.14.1.jar -parquet-common/1.14.1//parquet-common-1.14.1.jar -parquet-encoding/1.14.1//parquet-encoding-1.14.1.jar -parquet-format-structures/1.14.1//parquet-format-structures-1.14.1.jar -parquet-hadoop/1.14.1//parquet-hadoop-1.14.1.jar -parquet-jackson/1.14.1//parquet-jackson-1.14.1.jar +parquet-column/1.14.2//parquet-column-1.14.2.jar 
+parquet-common/1.14.2//parquet-common-1.14.2.jar
+parquet-encoding/1.14.2//parquet-encoding-1.14.2.jar
+parquet-format-structures/1.14.2//parquet-format-structures-1.14.2.jar
+parquet-hadoop/1.14.2//parquet-hadoop-1.14.2.jar
+parquet-jackson/1.14.2//parquet-jackson-1.14.2.jar
 pickle/1.5//pickle-1.5.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar

diff --git a/pom.xml b/pom.xml
index 4b769b1f7fee..6f5c9b63f86d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -137,7 +137,7 @@
 3.8.0
 10.16.1.1
-1.14.1
+1.14.2
 2.0.2
 shaded-protobuf
 11.0.23
(spark) branch master updated: [SPARK-49085][CONNECT][BUILD][FOLLOWUP] Remove the erroneous `type` definition for `spark-protobuf` from `sql/connect/server/pom.xml`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c5fd509ad3c0 [SPARK-49085][CONNECT][BUILD][FOLLOWUP] Remove the erroneous `type` definition for `spark-protobuf` from `sql/connect/server/pom.xml` c5fd509ad3c0 is described below commit c5fd509ad3c0781f8511986f3c4e52692bc783c4 Author: yangjie01 AuthorDate: Wed Sep 11 10:57:35 2024 -0700 [SPARK-49085][CONNECT][BUILD][FOLLOWUP] Remove the erroneous `type` definition for `spark-protobuf` from `sql/connect/server/pom.xml` ### What changes were proposed in this pull request? This pr corrects the erroneous changes made in https://github.com/apache/spark/pull/48051/files, remove the erroneous `type` definition for `spark-protobuf` from `sql/connect/server/pom.xml`. ### Why are the changes needed? When maven testing the connect server module, what we need is `spark-protobuf_2.13-4.0.0-SNAPSHOT.jar` rather than `spark-protobuf_2.13-4.0.0-SNAPSHOT-tests.jar`. And https://github.com/apache/spark/pull/48051/files caused the failure of the Maven daily test: - https://github.com/apache/spark/actions/runs/10812252676/job/30002163824 ``` - from_protobuf_messageClassName_descFilePath *** FAILED *** org.apache.spark.sql.AnalysisException: [PROTOBUF_NOT_LOADED_SQL_FUNCTIONS_UNUSABLE] Cannot call the FROM_PROTOBUF SQL function because the Protobuf data source is not loaded. Please restart your job or session with the 'spark-protobuf' package loaded, such as by using the --packages argument on the command line, and then retry your query or command again. 
SQLSTATE: 22KD3 at org.apache.spark.sql.errors.QueryCompilationErrors$.protobufNotLoadedSqlFunctionsUnusable(QueryCompilationErrors.scala:4096) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.liftedTree1$1(toFromProtobufSqlFunctions.scala:184) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement$lzycompute(toFromProtobufSqlFunctions.scala:178) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement(toFromProtobufSqlFunctions.scala:157) at org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType(Expression.scala:417) at org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType$(Expression.scala:417) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.dataType(toFromProtobufSqlFunctions.scala:86) at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:195) at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:74) at scala.collection.immutable.List.map(List.scala:247) ... ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions - Manual check: ``` build/mvn clean install -DskipTests -Phive build/mvn test -pl sql/connect/server ``` **Before** ``` ... - from_protobuf_messageClassName_descFilePath_options *** FAILED *** org.apache.spark.sql.AnalysisException: [PROTOBUF_NOT_LOADED_SQL_FUNCTIONS_UNUSABLE] Cannot call the FROM_PROTOBUF SQL function because the Protobuf data source is not loaded. Please restart your job or session with the 'spark-protobuf' package loaded, such as by using the --packages argument on the command line, and then retry your query or command again. 
SQLSTATE: 22KD3 at org.apache.spark.sql.errors.QueryCompilationErrors$.protobufNotLoadedSqlFunctionsUnusable(QueryCompilationErrors.scala:4096) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.liftedTree1$1(toFromProtobufSqlFunctions.scala:184) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement$lzycompute(toFromProtobufSqlFunctions.scala:178) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.replacement(toFromProtobufSqlFunctions.scala:157) at org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType(Expression.scala:417) at org.apache.spark.sql.catalyst.expressions.RuntimeReplaceable.dataType$(Expression.scala:417) at org.apache.spark.sql.catalyst.expressions.FromProtobuf.dataType(toFromProtobufSqlFunctions.scala:86) at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:195) at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:74) at scala.collection.immutable.List.map(List.scala:247) ... Run completed in 2 minutes, 19 seconds. Total number of tests run: 948 Suites: completed 27, aborted 0 Tests: succeeded 942, failed 6, canceled 0, ignored 0, pending 0 *** 6 TESTS FAILED ***
(spark) branch master updated: [SPARK-49600][PYTHON] Remove `Python 3.6 and older`-related logic from `try_simplify_traceback`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new aee9f60b3966 [SPARK-49600][PYTHON] Remove `Python 3.6 and older`-related logic from `try_simplify_traceback` aee9f60b3966 is described below commit aee9f60b39669c7f32152a7f754e611de8af2592 Author: Dongjoon Hyun AuthorDate: Wed Sep 11 10:33:45 2024 -0700 [SPARK-49600][PYTHON] Remove `Python 3.6 and older`-related logic from `try_simplify_traceback` ### What changes were proposed in this pull request? Apache Spark 4.0.0 supports only Python 3.9+. - #46228 ### Why are the changes needed? To simplify and clarify the logic. I manually confirmed that this is the last logic about `sys.version_info` and `(3, 7)`. ``` $ git grep 'sys.version_info' | grep '(3, 7)' python/pyspark/util.py:if sys.version_info[:2] < (3, 7): python/pyspark/util.py:if "pypy" not in platform.python_implementation().lower() and sys.version_info[:2] >= (3, 7): ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48078 from dongjoon-hyun/SPARK-49600. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- python/pyspark/util.py | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/python/pyspark/util.py b/python/pyspark/util.py index 205e3d957a41..cca44435efe6 100644 --- a/python/pyspark/util.py +++ b/python/pyspark/util.py @@ -262,10 +262,6 @@ def try_simplify_traceback(tb: TracebackType) -> Optional[TracebackType]: if "pypy" in platform.python_implementation().lower(): # Traceback modification is not supported with PyPy in PySpark. return None -if sys.version_info[:2] < (3, 7): -# Traceback creation is not supported Python < 3.7. -# See https://bugs.python.org/issue30579. 
-        return None

 import pyspark
@@ -791,7 +787,7 @@ def is_remote_only() -> bool:

 if __name__ == "__main__":
-    if "pypy" not in platform.python_implementation().lower() and sys.version_info[:2] >= (3, 7):
+    if "pypy" not in platform.python_implementation().lower() and sys.version_info[:2] >= (3, 9):
         import doctest
         import pyspark.util
         from pyspark.core.context import SparkContext
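The simplification above follows a general pattern: code paths guarded by `sys.version_info` comparisons become dead code once the minimum supported Python version rises past them. A minimal sketch (the `traceback_supported` helper and `MIN_SUPPORTED` constant are assumptions for illustration, not PySpark APIs):

```python
import sys

# Spark 4.0 supports only Python 3.9+, so guards for older versions are dead code.
MIN_SUPPORTED = (3, 9)

def traceback_supported(version_info=sys.version_info):
    # Before the change: a separate branch returned early for Python < 3.7
    # (traceback creation was unsupported there, see bpo-30579).
    # After the change: that branch is unreachable and removed; only the
    # minimum-version gate remains.
    return version_info[:2] >= MIN_SUPPORTED

# traceback_supported((3, 9)) -> True; traceback_supported((3, 6)) -> False
```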
(spark) branch branch-3.5 updated: [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 96eebebaf514 [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client
96eebebaf514 is described below

commit 96eebebaf5146144ff900c8081dfa5c5960b3bb2
Author: Xinrong Meng
AuthorDate: Wed Sep 11 08:52:33 2024 -0700

    [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client

    Fix `DataFrame.unpivot/melt` in the Spark Connect Scala Client by correctly assigning the name for the variable column. The original code used `setValueColumnName` for both the variable and value columns. This fix is necessary to ensure the correct behavior of the unpivot/melt operation.

    Yes. Variable and value columns can be set correctly as shown below.

    ```scala
    scala> val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long")
    df: org.apache.spark.sql.package.DataFrame = [id: int, int: int ... 1 more field]

    scala> df.show()
    +---+---+----+
    | id|int|long|
    +---+---+----+
    |  1| 11|  12|
    |  2| 21|  22|
    +---+---+----+
    ```

    FROM (current master)

    ```scala
    scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", "value").show()
    +---+----+-----+
    | id|    |value|
    +---+----+-----+
    |  1| int|   11|
    |  1|long|   12|
    |  2| int|   21|
    |  2|long|   22|
    +---+----+-----+
    ```

    TO

    ```scala
    scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", "value").show()
    +---+--------+-----+
    | id|variable|value|
    +---+--------+-----+
    |  1|     int|   11|
    |  1|    long|   12|
    |  2|     int|   21|
    |  2|    long|   22|
    +---+--------+-----+
    ```

    Existing tests.

    No.

    Closes #48069 from xinrong-meng/fix_unpivot.
Authored-by: Xinrong Meng Signed-off-by: Dongjoon Hyun (cherry picked from commit e63b5601c1bd74b2b0054d48f944424d12b79835) Signed-off-by: Dongjoon Hyun --- .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +- .../query-tests/explain-results/melt_no_values.explain| 2 +- .../query-tests/explain-results/melt_values.explain | 2 +- .../query-tests/explain-results/unpivot_no_values.explain | 2 +- .../query-tests/explain-results/unpivot_values.explain| 2 +- .../resources/query-tests/queries/melt_no_values.json | 1 + .../query-tests/queries/melt_no_values.proto.bin | Bin 71 -> 77 bytes .../test/resources/query-tests/queries/melt_values.json | 1 + .../resources/query-tests/queries/melt_values.proto.bin | Bin 73 -> 79 bytes .../resources/query-tests/queries/unpivot_no_values.json | 1 + .../query-tests/queries/unpivot_no_values.proto.bin | Bin 64 -> 70 bytes .../resources/query-tests/queries/unpivot_values.json | 1 + .../query-tests/queries/unpivot_values.proto.bin | Bin 80 -> 86 bytes 13 files changed, 9 insertions(+), 5 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala index bdaa4e28ba89..865596a669a0 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -1291,7 +1291,7 @@ class Dataset[T] private[sql] ( val unpivot = builder.getUnpivotBuilder .setInput(plan.getRoot) .addAllIds(ids.toSeq.map(_.expr).asJava) - .setValueColumnName(variableColumnName) + .setVariableColumnName(variableColumnName) .setValueColumnName(valueColumnName) valuesOption.foreach { values => unpivot.getValuesBuilder diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain index 
f61fc30a3a52..053937d84ec8 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain @@ -1,2 +1,2 @@ -Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, #0, value#0] +Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, name#0, value#0] +- LocalRelation , [id#0L, a#0, b#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/melt_values.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/melt_values.explain index b5742d976dee..5a
(spark) branch master updated: [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e63b5601c1bd [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client
e63b5601c1bd is described below

commit e63b5601c1bd74b2b0054d48f944424d12b79835
Author: Xinrong Meng
AuthorDate: Wed Sep 11 08:52:33 2024 -0700

    [SPARK-49595][CONNECT][SQL] Fix `DataFrame.unpivot/melt` in Spark Connect Scala Client

    ### What changes were proposed in this pull request?

    Fix `DataFrame.unpivot/melt` in the Spark Connect Scala Client by correctly assigning the name for the variable column. The original code used `setValueColumnName` for both the variable and value columns.

    ### Why are the changes needed?

    This fix is necessary to ensure the correct behavior of the unpivot/melt operation.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Variable and value columns can be set correctly as shown below.

    ```scala
    scala> val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long")
    df: org.apache.spark.sql.package.DataFrame = [id: int, int: int ... 1 more field]

    scala> df.show()
    +---+---+----+
    | id|int|long|
    +---+---+----+
    |  1| 11|  12|
    |  2| 21|  22|
    +---+---+----+
    ```

    FROM (current master)

    ```scala
    scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", "value").show()
    +---+----+-----+
    | id|    |value|
    +---+----+-----+
    |  1| int|   11|
    |  1|long|   12|
    |  2| int|   21|
    |  2|long|   22|
    +---+----+-----+
    ```

    TO

    ```scala
    scala> df.unpivot(Array($"id"), Array($"int", $"long"), "variable", "value").show()
    +---+--------+-----+
    | id|variable|value|
    +---+--------+-----+
    |  1|     int|   11|
    |  1|    long|   12|
    |  2|     int|   21|
    |  2|    long|   22|
    +---+--------+-----+
    ```

    ### How was this patch tested?

    Existing tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #48069 from xinrong-meng/fix_unpivot.
Authored-by: Xinrong Meng Signed-off-by: Dongjoon Hyun --- .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala | 2 +- .../query-tests/explain-results/melt_no_values.explain| 2 +- .../query-tests/explain-results/melt_values.explain | 2 +- .../query-tests/explain-results/unpivot_no_values.explain | 2 +- .../query-tests/explain-results/unpivot_values.explain| 2 +- .../resources/query-tests/queries/melt_no_values.json | 1 + .../query-tests/queries/melt_no_values.proto.bin | Bin 71 -> 77 bytes .../test/resources/query-tests/queries/melt_values.json | 1 + .../resources/query-tests/queries/melt_values.proto.bin | Bin 73 -> 79 bytes .../resources/query-tests/queries/unpivot_no_values.json | 1 + .../query-tests/queries/unpivot_no_values.proto.bin | Bin 64 -> 70 bytes .../resources/query-tests/queries/unpivot_values.json | 1 + .../query-tests/queries/unpivot_values.proto.bin | Bin 80 -> 86 bytes 13 files changed, 9 insertions(+), 5 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala index f5606215be89..519193ebd9c7 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -481,7 +481,7 @@ class Dataset[T] private[sql] ( val unpivot = builder.getUnpivotBuilder .setInput(plan.getRoot) .addAllIds(ids.toImmutableArraySeq.map(_.expr).asJava) - .setValueColumnName(variableColumnName) + .setVariableColumnName(variableColumnName) .setValueColumnName(valueColumnName) valuesOption.foreach { values => unpivot.getValuesBuilder diff --git a/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain b/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain index f61fc30a3a52..053937d84ec8 100644 --- 
a/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain +++ b/sql/connect/common/src/test/resources/query-tests/explain-results/melt_no_values.explain @@ -1,2 +1,2 @@ -Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, #0, value#0] +Expand [[id#0L, a#0, b, b#0]], [id#0L, a#0, name#0, value#0] +- LocalRelation , [id#0L, a#0, b#0] diff --git a/sql/connect/common/src/test/resources/query-tests/explain-results/melt_values.explai
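The unpivot/melt operation the commit fixes turns wide rows into long `(id, variable, value)` rows. A pure-Python sketch of those semantics (not the Spark API — the helper name and signature here are made up for illustration), reproducing the corrected output from the commit message:

```python
def unpivot(rows, ids, values, variable_col="variable", value_col="value"):
    """Melt wide rows (a list of dicts) into long form.

    Each input row yields one output row per entry in `values`, carrying
    the id columns plus (variable, value) for that entry -- the same
    shape Spark's DataFrame.unpivot produces once the variable column
    name is wired up correctly.
    """
    out = []
    for row in rows:
        for col in values:
            melted = {k: row[k] for k in ids}
            melted[variable_col] = col
            melted[value_col] = row[col]
            out.append(melted)
    return out

wide = [{"id": 1, "int": 11, "long": 12},
        {"id": 2, "int": 21, "long": 22}]
long_rows = unpivot(wide, ids=["id"], values=["int", "long"])
# long_rows[0] -> {"id": 1, "variable": "int", "value": 11}
```

The bug in the original client was purely in naming: both `setValueColumnName` calls targeted the value column, so `variable_col` effectively arrived at the server empty, which is why the "FROM" table above shows a blank column header.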
(spark-kubernetes-operator) branch main updated: [SPARK-49527] Add `ConfOptionDocGenerator` to generate Spark Operator Config Property Doc
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new e842e3b [SPARK-49527] Add `ConfOptionDocGenerator` to generate Spark Operator Config Property Doc e842e3b is described below commit e842e3bf8f08c5999d050aac059414424def5fa7 Author: zhou-jiang AuthorDate: Wed Sep 11 08:48:18 2024 -0700 [SPARK-49527] Add `ConfOptionDocGenerator` to generate Spark Operator Config Property Doc ### What changes were proposed in this pull request? This PR adds a module `docs-utils` to automatically generate config properties doc page from source. ### Why are the changes needed? This helps to keep config property docs up-to-date for properties by adding it as a gradle task. ### Does this PR introduce _any_ user-facing change? No (doc only, not released) ### How was this patch tested? Pass the CIs ### Was this patch authored or co-authored using generative AI tooling? No Closes #118 from jiangzho/doc_utils. Lead-authored-by: zhou-jiang Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../docs-utils/build.gradle| 26 ++- .../k8s/operator/utils/ConfOptionDocGenerator.java | 87 ++ .../apache/spark/k8s/operator/utils/DocTable.java | 69 + settings.gradle| 2 + .../spark/k8s/operator/config/ConfigOption.java| 4 +- 5 files changed, 182 insertions(+), 6 deletions(-) diff --git a/settings.gradle b/build-tools/docs-utils/build.gradle similarity index 57% copy from settings.gradle copy to build-tools/docs-utils/build.gradle index 8b2b816..2cdde29 100644 --- a/settings.gradle +++ b/build-tools/docs-utils/build.gradle @@ -16,7 +16,25 @@ * specific language governing permissions and limitations * under the License. 
*/ -rootProject.name = 'apache-spark-kubernetes-operator' -include 'spark-operator-api' -include 'spark-submission-worker' -include 'spark-operator' + +ext { +javaMainClass = "org.apache.spark.k8s.operator.utils.ConfOptionDocGenerator" +docsPath = System.getProperty("user.dir") + "/docs" +} + +dependencies { +implementation project(":spark-operator") +implementation(libs.log4j.core) +implementation(libs.log4j.slf4j.impl) +compileOnly(libs.lombok) +annotationProcessor(libs.lombok) +} + +test { +useJUnitPlatform() +} + +tasks.register('generateConfPropsDoc', Exec) { +description = "Generate config properties doc for operator" +commandLine "java", "-classpath", sourceSets.main.runtimeClasspath.getAsPath(), javaMainClass, docsPath +} diff --git a/build-tools/docs-utils/src/main/java/org/apache/spark/k8s/operator/utils/ConfOptionDocGenerator.java b/build-tools/docs-utils/src/main/java/org/apache/spark/k8s/operator/utils/ConfOptionDocGenerator.java new file mode 100644 index 000..c0f2f49 --- /dev/null +++ b/build-tools/docs-utils/src/main/java/org/apache/spark/k8s/operator/utils/ConfOptionDocGenerator.java @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.spark.k8s.operator.utils; + +import java.io.File; +import java.io.IOException; +import java.io.PrintWriter; +import java.lang.reflect.Field; +import java.util.List; + +import lombok.extern.slf4j.Slf4j; + +import org.apache.spark.k8s.operator.config.ConfigOption; +import org.apache.spark.k8s.operator.config.SparkOperatorConf; + +@Slf4j +public class ConfOptionDocGenerator { + public static final String CONF_FILE_NAME = "config_properties.md"; + public static final String DEFAULT_DOCS_PATH = "docs"; + public static final String GENERATED_FILE_HEADER = + "This doc is automatically generated by gradle task, manual updates would be overridden."; + + public void generate(String docsPath) throws
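The generator above reflects over the static `ConfigOption` fields of `SparkOperatorConf` and renders them as a markdown table. A rough Python analogue of that reflection-plus-table approach (the class, field names, and config keys below are invented for illustration; they are not the operator's real configuration):

```python
from dataclasses import dataclass

@dataclass
class ConfigOption:
    key: str
    default: str
    description: str

class OperatorConf:
    # Static config fields, discovered by reflection like the Java generator.
    RECONCILE_INTERVAL = ConfigOption(
        "spark.operator.reconcile.interval", "60s", "Reconciliation period.")
    MAX_RETRIES = ConfigOption(
        "spark.operator.max.retries", "3", "Retry attempts before giving up.")

def generate_markdown(conf_class):
    """Collect every ConfigOption attribute of conf_class and emit a markdown table."""
    lines = ["| Key | Default | Description |",
             "| --- | --- | --- |"]
    for name in sorted(vars(conf_class)):
        opt = getattr(conf_class, name)
        if isinstance(opt, ConfigOption):
            lines.append(f"| `{opt.key}` | {opt.default} | {opt.description} |")
    return "\n".join(lines)
```

Driving the doc build from the build system (the `generateConfPropsDoc` Gradle task in the diff) keeps the generated page from drifting out of sync with the source, since regeneration overwrites any manual edits — exactly what the generated-file header warns about.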
(spark) branch master updated (cc6d6f17bdee -> 70482f6f82b1)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cc6d6f17bdee [SPARK-49519][SQL] Merge options of table and relation when constructing FileScanBuilder add 70482f6f82b1 [SPARK-49599][BUILD] Upgrade snappy-java to 1.1.10.7 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)