(spark) branch master updated: [SPARK-47701][SQL][TESTS] Postgres: Add test for Composite and Range types
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9a20794b252d [SPARK-47701][SQL][TESTS] Postgres: Add test for Composite and Range types
9a20794b252d is described below

commit 9a20794b252d207b6864f656e7fab85007911537
Author: Kent Yao
AuthorDate: Tue Apr 2 22:43:05 2024 -0700

    [SPARK-47701][SQL][TESTS] Postgres: Add test for Composite and Range types

    ### What changes were proposed in this pull request?

    Add tests for Composite and Range types of Postgres.

    ### Why are the changes needed?

    Test improvements.

    ### Does this PR introduce _any_ user-facing change?

    No, test-only.

    ### How was this patch tested?

    New tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45827 from yaooqinn/SPARK-47701.

    Authored-by: Kent Yao
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index 9015434cedd0..f70bd8091204 100644
--- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -178,6 +178,15 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     conn.prepareStatement("CREATE TABLE test_bit_array (c1 bit(1)[], c2 bit(5)[])").executeUpdate()
     conn.prepareStatement("INSERT INTO test_bit_array VALUES (ARRAY[B'1', B'0'], " +
       "ARRAY[B'1', B'00010'])").executeUpdate()
+
+    conn.prepareStatement(
+      """
+        |CREATE TYPE complex AS (
+        |  b bool,
+        |  d double precision
+        |)""".stripMargin).executeUpdate()
+    conn.prepareStatement("CREATE TABLE complex_table (c1 complex)").executeUpdate()
+    conn.prepareStatement("INSERT INTO complex_table VALUES (ROW(true, 1.0))").executeUpdate()
   }

   test("Type mapping for various types") {
@@ -516,4 +525,21 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       },
       errorClass = null)
   }
+
+  test("SPARK-47701: Reading complex type") {
+    val df = spark.read.jdbc(jdbcUrl, "complex_table", new Properties)
+    checkAnswer(df, Row("(t,1)"))
+    val df2 = spark.read.format("jdbc")
+      .option("url", jdbcUrl)
+      .option("query", "SELECT (c1).b, (c1).d FROM complex_table").load()
+    checkAnswer(df2, Row(true, 1.0d))
+  }
+
+  test("SPARK-47701: Range Types") {
+    val df = spark.read.format("jdbc")
+      .option("url", jdbcUrl)
+      .option("query", "SELECT '[3,7)'::int4range")
+      .load()
+    checkAnswer(df, Row("[3,7)"))
+  }
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
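The tests above show that Spark reads an unmapped Postgres composite value back as its text literal (`Row("(t,1)")`) and an `int4range` as the canonical string `"[3,7)"`. A Spark-free sketch of how a client might decode those text literals — the helper names are my own for illustration, not part of Spark or the JDBC driver, and quoting/nesting in composite literals is not handled:

```python
def parse_composite(literal):
    """Decode a simple Postgres composite literal like "(t,1)" into Python values.

    Postgres renders booleans as t/f inside composites, and numbers as bare text
    (a double 1.0 comes back as "1"). Illustrative helper only.
    """
    out = []
    for field in literal.strip("()").split(","):
        if field == "t":
            out.append(True)
        elif field == "f":
            out.append(False)
        else:
            try:
                out.append(int(field))
            except ValueError:
                out.append(float(field))
    return tuple(out)


def parse_int4range(literal):
    """Decode a canonical int4range literal like "[3,7)" into an
    (inclusive start, exclusive end) pair. Postgres canonicalizes
    integer ranges to the [lo,hi) form."""
    lo, hi = literal.strip("[)").split(",")
    return int(lo), int(hi)
```

For example, `parse_composite("(t,1)")` yields `(True, 1)` and `parse_int4range("[3,7)")` yields `(3, 7)`, mirroring the string values the integration tests assert on.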
(spark) branch master updated: [SPARK-45733][PYTHON][TESTS][FOLLOWUP] Skip `pyspark.sql.tests.connect.client.test_client` if not should_test_connect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 360a3f9023d0 [SPARK-45733][PYTHON][TESTS][FOLLOWUP] Skip `pyspark.sql.tests.connect.client.test_client` if not should_test_connect 360a3f9023d0 is described below commit 360a3f9023d08812e3f3c44af9cdac644c5d67b2 Author: Dongjoon Hyun AuthorDate: Tue Apr 2 22:30:08 2024 -0700 [SPARK-45733][PYTHON][TESTS][FOLLOWUP] Skip `pyspark.sql.tests.connect.client.test_client` if not should_test_connect ### What changes were proposed in this pull request? This is a follow-up of the following. - https://github.com/apache/spark/pull/43591 ### Why are the changes needed? This test requires `pandas` which is an optional dependency in Apache Spark. ``` $ python/run-tests --modules=pyspark-connect --parallelism=1 --python-executables=python3.10 --testnames 'pyspark.sql.tests.connect.client.test_client' Running PySpark tests. 
Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log Will test against the following Python executables: ['python3.10'] Will test the following Python tests: ['pyspark.sql.tests.connect.client.test_client'] python3.10 python_implementation is CPython python3.10 version is: Python 3.10.13 Starting test(python3.10): pyspark.sql.tests.connect.client.test_client (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/216a8716-3a1f-4cf9-9c7c-63087f29f892/python3.10__pyspark.sql.tests.connect.client.test_client__tydue4ck.log) Traceback (most recent call last): File "/Users/dongjoon/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/Users/dongjoon/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/connect/client/test_client.py", line 137, in <module> class TestPolicy(DefaultPolicy): NameError: name 'DefaultPolicy' is not defined ``` ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Pass the CIs and manually test without `pandas`. ``` $ pip3 uninstall pandas $ python/run-tests --modules=pyspark-connect --parallelism=1 --python-executables=python3.10 --testnames 'pyspark.sql.tests.connect.client.test_client' Running PySpark tests. 
Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log Will test against the following Python executables: ['python3.10'] Will test the following Python tests: ['pyspark.sql.tests.connect.client.test_client'] python3.10 python_implementation is CPython python3.10 version is: Python 3.10.13 Starting test(python3.10): pyspark.sql.tests.connect.client.test_client (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/acf07ed5-938a-4272-87e1-47e3bf8b988e/python3.10__pyspark.sql.tests.connect.client.test_client__sfdosnek.log) Finished test(python3.10): pyspark.sql.tests.connect.client.test_client (0s) ... 13 tests were skipped Tests passed in 0 seconds Skipped tests in pyspark.sql.tests.connect.client.test_client with python3.10: test_basic_flow (pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase) ... skip (0.002s) test_fail_and_retry_during_execute (pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase) ... skip (0.000s) test_fail_and_retry_during_reattach (pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase) ... skip (0.000s) test_fail_during_execute (pyspark.sql.tests.connect.client.test_client.SparkConnectClientReattachTestCase) ... skip (0.000s) test_channel_builder (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip (0.000s) test_channel_builder_with_session (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip (0.000s) test_interrupt_all (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip (0.000s) test_is_closed (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip (0.000s) test_properties (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip (0.000s) test_retry (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... 
skip (0.000s) test_retry_client_unit (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip (0.000s) test_user_agent_default (pyspark.sql.tests.connect.client.test_client.SparkConnectClientTestCase) ... skip
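The fix described above guards a test module on an optional dependency so the module still imports cleanly (and its tests are skipped) when `pandas` is absent. A minimal, hedged sketch of that idiom — the `have_pandas` probe and message below are illustrative stand-ins for the guards PySpark's test utilities provide, not the actual PySpark helpers:

```python
import importlib.util
import unittest

# Probe for the optional dependency instead of importing it at module top
# level, so loading this module never raises when pandas is missing.
have_pandas = importlib.util.find_spec("pandas") is not None
pandas_requirement_message = "pandas is required for this test"


@unittest.skipIf(not have_pandas, pandas_requirement_message)
class OptionalDepTestCase(unittest.TestCase):
    def test_something_using_pandas(self):
        import pandas as pd  # safe: only executed when pandas is installed
        self.assertEqual(len(pd.DataFrame({"a": [1, 2]})), 2)
```

Running this module without pandas reports the test as skipped rather than failing with a `NameError` at import time — the exact failure mode the follow-up patch removes.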
Re: [PR] Add Spark Connect page [spark-website]
dongjoon-hyun commented on code in PR #511: URL: https://github.com/apache/spark-website/pull/511#discussion_r1548892591 ## spark-connect/index.md: ## @@ -0,0 +1,151 @@ +--- +layout: global +type: "page singular" +title: Spark Connect +description: Spark Connect makes remote Spark development easier. +subproject: Spark Connect +--- + +This post explains the Spark Connect architecture, the benefits of Spark Connect, and how to upgrade to Spark Connect. + +Let’s start by exploring the architecture of Spark Connect at a high level. + +## High-level Spark Connect architecture + +Spark Connect is a protocol that specifies how a client application can communicate with a remote Spark Server. Clients that implement the Spark Connect protocol can connect and make requests to remote Spark Servers, very similar to how client applications can connect to databases using a JDBC driver - a query `spark.table("some_table").limit(5)` should simply return the result. This architecture gives end users a great developer experience. + +Here’s how Spark Connect works at a high level: + +1. A connection is established between the Client and Spark Server +2. The Client converts a DataFrame query to an unresolved logical plan +3. The unresolved logical plan is encoded and sent to the Spark Server +4. The Spark Server runs the query +5. The Spark Server sends the results back to the Client + + + +Let’s go through these steps in more detail to get a better understanding of the inner workings of Spark Connect. + +**Establishing a connection between the Client and Spark Server** + +The network communication for Spark Connect uses the [gRPC framework](https://grpc.io/). + +gRPC is performant and language agnostic. Spark Connect uses language-agnostic technologies, so it’s portable. + +**Converting a DataFrame query to an unresolved logical plan** + +The Client parses DataFrame queries and converts them to unresolved logical plans. 
+ +Suppose you have the following DataFrame query: `spark.table("some_table").limit(5)`. + +Here’s the unresolved logical plan for the query: + +``` +== Parsed Logical Plan == +GlobalLimit 5 ++- LocalLimit 5 + +- SubqueryAlias spark_catalog.default.some_table + +- Relation spark_catalog.default.some_table[character#15,franchise#16] parquet +``` + +The Client is responsible for creating the unresolved logical plan and passing it to the Spark Server for execution. + +**Sending the unresolved logical plan to the Spark Server** + +The unresolved logical plan must be serialized so it can be sent over a network. Spark Connect uses Protocol Buffers, which are “language-neutral, platform-neutral extensible mechanisms for serializing structured data”. + +The Client and the Spark Server must be able to communicate with a language-neutral format like Protocol Buffers because they might be using different programming languages or different software versions. + +Now let’s look at how the Spark Server executes the query. + +**Executing the query on the Spark Server** + +The Spark Server receives the unresolved logical plan (once the Protocol Buffer is deserialized) and executes it just like any other query. + +Spark performs many optimizations to an unresolved logical plan before executing the query. All of these optimizations happen on the Spark Server. + +Spark Connect lets you leverage Spark’s powerful query optimization capabilities, even with Clients that don’t depend on Spark or the JVM. + +**Sending the results back to the Client** + +The Spark Server sends the results back to the Client after executing the query. + +The results are sent to the client as Apache Arrow record batches. A record batch includes many rows of data. + +The record batch is streamed to the client, which means it is sent in partial chunks, not all at once. Streaming the results from the Spark Server to the Client prevents memory issues caused by an excessively large request. 
+ +Here’s a recap of how Spark Connect works in image form: + + + +## Benefits of Spark Connect + +Let’s now turn our attention to the benefits of the Spark Connect architecture. + +**Spark Connect workloads are easier to maintain** + +With the Spark JVM architecture, the client and Spark Driver must run identical software versions. They need the same Java, Scala, and other dependency versions. Suppose you develop a Spark project on your local machine, package it as a JAR file, and deploy it in the cloud to run on a production dataset. You need to build the JAR file on your local machine with the same dependencies used in the cloud. If you compile the JAR file with Scala 2.13, you must also provision the cluster with a Spark JAR compiled with Scala 2.13. + +Suppose you are building your JAR with Scala 2.12, and your cloud provider releases a new runtime built with Scala 2.13. For Spark JVM, you need to update your project locally, which may be challenging. For example, when you update your project
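The five architecture steps in the quoted page can be sketched end to end as a toy round-trip. Everything here is a stand-in, purely for illustration: a dict stands in for the real unresolved logical plan, JSON for Protocol Buffers, and plain list chunks for Arrow record batches:

```python
import json

# Step 2: the client describes spark.table("some_table").limit(5)
# as an unresolved plan (a dict stands in for the protobuf message).
plan = {"op": "limit", "n": 5, "child": {"op": "table", "name": "some_table"}}

# Step 3: serialize the plan for the wire (JSON stands in for protobuf).
wire_bytes = json.dumps(plan).encode()


# Steps 4-5: the "server" decodes the plan, resolves the table name
# against its catalog, executes, and streams results back in chunks
# (chunks stand in for Arrow record batches).
def serve(payload, catalog, batch_size=2):
    p = json.loads(payload.decode())
    assert p["op"] == "limit"
    rows = catalog[p["child"]["name"]][: p["n"]]
    for i in range(0, len(rows), batch_size):
        yield rows[i : i + batch_size]


catalog = {"some_table": [("r%d" % i,) for i in range(10)]}
batches = list(serve(wire_bytes, catalog))
rows = [r for b in batches for r in b]
# rows now holds the first five rows, delivered in partial chunks
# rather than one monolithic response.
```

The point of the sketch is the division of labor: the client only builds and serializes a description of the query; all resolution, optimization, and execution happen server-side, and results stream back incrementally.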
Re: [PR] Add Spark Connect page [spark-website]
dongjoon-hyun commented on code in PR #511: URL: https://github.com/apache/spark-website/pull/511#discussion_r1548892050 ## spark-connect/index.md: ## @@ -0,0 +1,151 @@
(spark) branch master updated (2c2a2adc3275 -> 344f640b2f35)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 2c2a2adc3275 [SPARK-47655][SS] Integrate timer with Initial State handling for state-v2 add 344f640b2f35 [SPARK-47454][PYTHON][TESTS][FOLLOWUP] Skip `test_create_dataframe_from_pandas_with_day_time_interval` if pandas is not available No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/test_creation.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (49b7b6b9fe6b -> 2c2a2adc3275)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 49b7b6b9fe6b [SPARK-47691][SQL] Postgres: Support multi dimensional array on the write side add 2c2a2adc3275 [SPARK-47655][SS] Integrate timer with Initial State handling for state-v2 No new revisions were added by this update. Summary of changes: .../spark/sql/streaming/StatefulProcessor.scala| 7 +- .../streaming/TransformWithStateExec.scala | 3 +- .../TransformWithStateInitialStateSuite.scala | 124 - .../sql/streaming/TransformWithStateSuite.scala| 60 ++ 4 files changed, 164 insertions(+), 30 deletions(-)
(spark) branch master updated (55b5ff6f45fd -> 49b7b6b9fe6b)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 55b5ff6f45fd [SPARK-47669][SQL][CONNECT][PYTHON] Add `Column.try_cast` add 49b7b6b9fe6b [SPARK-47691][SQL] Postgres: Support multi dimensional array on the write side No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 23 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 23 +- .../apache/spark/sql/jdbc/PostgresDialect.scala| 2 +- 3 files changed, 42 insertions(+), 6 deletions(-)
Re: [PR] Add Spark Connect page [spark-website]
MrPowers commented on PR #511: URL: https://github.com/apache/spark-website/pull/511#issuecomment-2033318574 Thank you for the review @dongjoon-hyun. This pull request was merged, so I will address your comments in another PR. I really appreciate your feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(spark) branch master updated: [SPARK-47669][SQL][CONNECT][PYTHON] Add `Column.try_cast`
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 55b5ff6f45fd [SPARK-47669][SQL][CONNECT][PYTHON] Add `Column.try_cast` 55b5ff6f45fd is described below commit 55b5ff6f45fde0048cfd4d8d1a41d6e7f65fd121 Author: Ruifeng Zheng AuthorDate: Wed Apr 3 08:15:08 2024 +0800 [SPARK-47669][SQL][CONNECT][PYTHON] Add `Column.try_cast` ### What changes were proposed in this pull request? Add `try_cast` function in Column APIs ### Why are the changes needed? for functionality parity ### Does this PR introduce _any_ user-facing change? yes ``` >>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame( ... [(2, "123"), (5, "Bob"), (3, None)], ["age", "name"]) >>> df.select(df.name.try_cast("double")).show() +-----+ | name| +-----+ |123.0| | NULL| | NULL| +-----+ ``` ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45796 from zhengruifeng/connect_try_cast. 
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../main/scala/org/apache/spark/sql/Column.scala | 35 .../apache/spark/sql/PlanGenerationTestSuite.scala | 4 + .../main/protobuf/spark/connect/expressions.proto | 10 +++ .../explain-results/column_try_cast.explain| 2 + .../query-tests/queries/column_try_cast.json | 29 +++ .../query-tests/queries/column_try_cast.proto.bin | Bin 0 -> 173 bytes .../sql/connect/planner/SparkConnectPlanner.scala | 18 ++-- .../docs/source/reference/pyspark.sql/column.rst | 1 + python/pyspark/sql/column.py | 62 + python/pyspark/sql/connect/column.py | 17 python/pyspark/sql/connect/expressions.py | 14 +++ .../pyspark/sql/connect/proto/expressions_pb2.py | 96 +++-- .../pyspark/sql/connect/proto/expressions_pb2.pyi | 28 ++ .../main/scala/org/apache/spark/sql/Column.scala | 37 14 files changed, 301 insertions(+), 52 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala index 4cb99541ccf0..dec699f4f1a8 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala @@ -1090,6 +1090,41 @@ class Column private[sql] (@DeveloperApi val expr: proto.Expression) extends Log */ def cast(to: String): Column = cast(DataTypeParser.parseDataType(to)) + /** + * Casts the column to a different data type and the result is null on failure. + * {{{ + * // Casts colA to IntegerType. 
+ * import org.apache.spark.sql.types.IntegerType + * df.select(df("colA").try_cast(IntegerType)) + * + * // equivalent to + * df.select(df("colA").try_cast("int")) + * }}} + * + * @group expr_ops + * @since 4.0.0 + */ + def try_cast(to: DataType): Column = Column { builder => +builder.getCastBuilder + .setExpr(expr) + .setType(DataTypeProtoConverter.toConnectProtoType(to)) + .setEvalMode(proto.Expression.Cast.EvalMode.EVAL_MODE_TRY) + } + + /** + * Casts the column to a different data type and the result is null on failure. + * {{{ + * // Casts colA to integer. + * df.select(df("colA").try_cast("int")) + * }}} + * + * @group expr_ops + * @since 4.0.0 + */ + def try_cast(to: String): Column = { +try_cast(DataTypeParser.parseDataType(to)) + } + /** * Returns a sort expression based on the descending order of the column. * {{{ diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala index 46789057ed3c..5fde8b04735b 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala @@ -871,6 +871,10 @@ class PlanGenerationTestSuite fn.col("a").cast("long") } + columnTest("try_cast") { +fn.col("a").try_cast("long") + } + orderColumnTest("desc") { fn.col("b").desc } diff --git a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto index c636bf68..726ae5dd1c21 100644 ---
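The `try_cast` semantics added above — a cast that yields null on failure instead of raising — can be illustrated without Spark. A pure-Python sketch of the behavior (an illustrative stand-in for what `Column.try_cast` exposes, not the PySpark API itself):

```python
def try_cast(value, target):
    """Cast like ANSI CAST, but return None on failure instead of raising.

    Illustrative stand-in for the try_cast behavior: nulls propagate,
    uncastable values become None rather than errors.
    """
    if value is None:
        return None
    try:
        return target(value)
    except (TypeError, ValueError):
        return None


rows = [(2, "123"), (5, "Bob"), (3, None)]
# Mirrors df.select(df.name.try_cast("double")) from the commit message:
casted = [try_cast(name, float) for _, name in rows]
# casted == [123.0, None, None]
```

This matches the `123.0 / NULL / NULL` column shown in the commit's doctest: `"123"` casts cleanly to a double, `"Bob"` cannot and becomes null, and an input null stays null.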
Re: [PR] Add Spark Connect page [spark-website]
MrPowers commented on code in PR #511: URL: https://github.com/apache/spark-website/pull/511#discussion_r1548753924 ## spark-connect/index.md: ## @@ -0,0 +1,151 @@ +--- +layout: global +type: "page singular" +title: Spark Connect +description: Spark Connect makes remote Spark development easier. +subproject: Spark Connect +--- + +This post explains the Spark Connect architecture, the benefits of Spark Connect, and how to upgrade to Spark Connect. Review Comment: No, I didn't copy this from anywhere, but happy to reword it. Any suggestions? Perhaps "This page" would be better.
Re: [PR] Add Spark Connect page [spark-website]
dongjoon-hyun commented on code in PR #511: URL: https://github.com/apache/spark-website/pull/511#discussion_r1548741671 ## spark-connect/index.md: ## @@ -0,0 +1,151 @@ +--- +layout: global +type: "page singular" +title: Spark Connect +description: Spark Connect makes remote Spark development easier. +subproject: Spark Connect +--- + +This post explains the Spark Connect architecture, the benefits of Spark Connect, and how to upgrade to Spark Connect. Review Comment: `This post` sounds a little weird in Apache Spark website. Is this copied from your blog?
Re: [PR] Add Spark Connect page [spark-website]
srowen commented on PR #511: URL: https://github.com/apache/spark-website/pull/511#issuecomment-2033256929 Merged to asf-site
Re: [PR] Add Spark Connect page [spark-website]
srowen closed pull request #511: Add Spark Connect page URL: https://github.com/apache/spark-website/pull/511
(spark) branch master updated: [SPARK-47697][INFRA] Add Scala style check for invalid MDC usage
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 27fdf96842ea [SPARK-47697][INFRA] Add Scala style check for invalid MDC usage 27fdf96842ea is described below commit 27fdf96842ea5a98ea5835bbf78b33b26db1fd3b Author: Gengliang Wang AuthorDate: Tue Apr 2 15:45:23 2024 -0700 [SPARK-47697][INFRA] Add Scala style check for invalid MDC usage ### What changes were proposed in this pull request? Add Scala style check for invalid MDC usage to avoid invalid MDC usage `s"Task ${MDC(TASK_ID, taskId)} failed"`, which should be `log"Task ${MDC(TASK_ID, taskId)} failed"`. ### Why are the changes needed? This makes development and PR review of the structured logging migration easier. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Manual test, verified it will throw errors on invalid MDC usage. ### Was this patch authored or co-authored using generative AI tooling? No Closes #45823 from gengliangwang/style. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun --- scalastyle-config.xml | 5 + 1 file changed, 5 insertions(+) diff --git a/scalastyle-config.xml b/scalastyle-config.xml index 50bd5a33ccb2..cd5a576c086f 100644 --- a/scalastyle-config.xml +++ b/scalastyle-config.xml @@ -150,6 +150,11 @@ This file is divided into 3 sections: // scalastyle:on println]]> + +s".*\$\{MDC\( + + + spark(.sqlContext)?.sparkContext.hadoopConfiguration
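The style rule added above rejects `s"..."` interpolators whose body uses `${MDC(...)}`, steering authors to the `log"..."` interpolator instead. The regex quoted in the diff, `s".*\$\{MDC\(`, can be exercised directly; this is only a sketch of what the rule matches — the real check runs inside Scalastyle's regex checker, not Python:

```python
import re

# The pattern from the scalastyle rule: an s-interpolator containing MDC.
mdc_rule = re.compile(r's".*\$\{MDC\(')

bad = 's"Task ${MDC(TASK_ID, taskId)} failed"'    # invalid: s-interpolator
good = 'log"Task ${MDC(TASK_ID, taskId)} failed"' # valid: log-interpolator

assert mdc_rule.search(bad) is not None   # flagged by the rule
assert mdc_rule.search(good) is None      # passes the rule
```

The `good` line never matches because the pattern requires a literal `s"` before `${MDC(`, which the `log"` interpolator does not contain.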
Re: [PR] Add Spark Connect page [spark-website]
MrPowers commented on PR #511: URL: https://github.com/apache/spark-website/pull/511#issuecomment-2033168687 @srowen - I rebased, good call. I could also break this up into two separate commits if that would help. Sorry for sending the massive PR.
Re: [PR] Add Spark Connect page [spark-website]
srowen commented on PR #511: URL: https://github.com/apache/spark-website/pull/511#issuecomment-2033120138 For good measure, rebase this? I think this also includes your change that was merged months ago at https://github.com/apache/spark-website/commit/a6ce63fb9c82dc8f25f42f377b487c0de2aff826 , so I suspect this branch is stale. It might merge as-is, but rebasing would make it a little easier to review.
Re: [PR] Add Spark Connect page [spark-website]
MrPowers commented on PR #511: URL: https://github.com/apache/spark-website/pull/511#issuecomment-2033080398 Here's the main file that should be reviewed: https://github.com/apache/spark-website/pull/511/files#diff-65090e623c6aabb88b162b7e17e2a81dc359e84234e9fac9561340f7acd05431
[PR] Add Spark Connect page [spark-website]
MrPowers opened a new pull request, #511: URL: https://github.com/apache/spark-website/pull/511 This PR adds a Spark Connect page. Sorry for the huge PR. This PR adds Spark Connect to the dropdown, which apparently modifies a bunch of HTML files. Let me know if this is OK! https://github.com/apache/spark-website/assets/2722395/e1279758-e216-4123-acbc-1b244d9c63f6
(spark) branch master updated: [SPARK-47699][BUILD] Upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d11d9cf729ab [SPARK-47699][BUILD] Upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0 d11d9cf729ab is described below commit d11d9cf729ab699c68770337d35043ebf58195cf Author: Dongjoon Hyun AuthorDate: Tue Apr 2 13:31:18 2024 -0700 [SPARK-47699][BUILD] Upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0 ### What changes were proposed in this pull request? This PR aims to upgrade `gcs-connector` to 2.2.21 and add a note for 3.0.0. ### Why are the changes needed? This PR aims to upgrade `gcs-connector` to bring the latest bug fixes. However, due to the following, we stick with 2.2.21 rather than moving to 3.0.0: - https://github.com/GoogleCloudDataproc/hadoop-connectors/issues/1114 - `gcs-connector` 2.2.21 has shaded Guava 32.1.2-jre. - https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/15c8ee41a15d6735442f36333f1d67792c93b9cf/pom.xml#L100 - `gcs-connector` 3.0.0 has shaded Guava 31.1-jre. - https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/667bf17291dbaa96a60f06df58c7a528bc4a8f79/pom.xml#L97 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually. ``` $ dev/make-distribution.sh -Phadoop-cloud $ cd dist $ export KEYFILE=~/.ssh/apache-spark.json $ export EMAIL=$(jq -r '.client_email' < $KEYFILE) $ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE) $ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)" $ bin/spark-shell \ -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \ -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \ -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY" Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0-SNAPSHOT /_/ Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.2) Type in expressions to have them evaluated. Type :help for more information. {"ts":"2024-04-02T13:08:31.513-0700","level":"WARN","msg":"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable","logger":"org.apache.hadoop.util.NativeCodeLoader"} Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[*], app id = local-1712088511841). Spark session available as 'spark'. scala> spark.read.text("gs://apache-spark-bucket/README.md").count() val res0: Long = 124 scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc") scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show() +--+--++ | name|favorite_color|favorite_numbers| +--+--++ |Alyssa| NULL| [3, 9, 15, 20]| | Ben| red| []| +--+--++ ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45824 from dongjoon-hyun/SPARK-47699. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index a564ec9f044a..c6913ceeff13 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -66,7 +66,7 @@ eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar -gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar +gcs-connector/hadoop3-2.2.21/shaded/gcs-connector-hadoop3-2.2.21-shaded.jar gmetric4j/1.0.10//gmetric4j-1.0.10.jar gson/2.2.4//gson-2.2.4.jar guava/14.0.1//guava-14.0.1.jar diff --git a/pom.xml b/pom.xml index b70d091796a5..ca949a05c81c 100644 --- a/pom.xml +++ b/pom.xml @@ -163,7 +163,8 @@ 2.24.6 0.12.8 -hadoop3-2.2.20 + +hadoop3-2.2.21 4.5.14 4.4.16
(spark) branch master updated (db0975cb2a1c -> e6144e4d75b7)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from db0975cb2a1c [SPARK-47602][CORE] Resource managers: Migrate logError with variables to structured logging framework add e6144e4d75b7 [SPARK-47695][BUILD] Upgrade AWS SDK v2 to 2.24.6 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- hadoop-cloud/pom.xml | 11 +++ pom.xml | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
(spark) branch master updated (8d4e9647c971 -> db0975cb2a1c)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8d4e9647c971 [SPARK-47684][SQL] Postgres: Map length unspecified bpchar to StringType add db0975cb2a1c [SPARK-47602][CORE] Resource managers: Migrate logError with variables to structured logging framework No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/internal/LogKey.scala | 14 ++- .../scala/org/apache/spark/internal/Logging.scala | 7 ++-- .../scala/org/apache/spark/util/MDCSuite.scala | 49 ++ .../apache/spark/util/PatternLoggingSuite.scala| 6 ++- .../apache/spark/util/StructuredLoggingSuite.scala | 40 -- .../cluster/k8s/ExecutorPodsAllocator.scala| 8 ++-- .../k8s/integrationtest/DepsTestsSuite.scala | 3 +- .../spark/deploy/yarn/ApplicationMaster.scala | 11 +++-- .../org/apache/spark/deploy/yarn/Client.scala | 5 ++- .../apache/spark/deploy/yarn/YarnAllocator.scala | 6 ++- .../cluster/YarnClientSchedulerBackend.scala | 7 ++-- 11 files changed, 133 insertions(+), 23 deletions(-) create mode 100644 common/utils/src/test/scala/org/apache/spark/util/MDCSuite.scala
(spark) branch master updated (ba98b7ae53f8 -> 8d4e9647c971)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ba98b7ae53f8 [SPARK-47689][SQL] Do not wrap query execution error during data writing add 8d4e9647c971 [SPARK-47684][SQL] Postgres: Map length unspecified bpchar to StringType No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 32 +++--- .../apache/spark/sql/jdbc/PostgresDialect.scala| 4 +++ 2 files changed, 14 insertions(+), 22 deletions(-)
(spark) branch branch-3.4 updated: [SPARK-47666][SQL][3.4] Fix NPE when reading mysql bit array as LongType
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 9f8eb54c7566 [SPARK-47666][SQL][3.4] Fix NPE when reading mysql bit array as LongType 9f8eb54c7566 is described below commit 9f8eb54c7566a2f99396b615d11f57c458e30936 Author: Kent Yao AuthorDate: Tue Apr 2 20:48:15 2024 +0800 [SPARK-47666][SQL][3.4] Fix NPE when reading mysql bit array as LongType ### What changes were proposed in this pull request? This PR fixes NPE when reading mysql bit array as LongType ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45793 from yaooqinn/PR_TOOL_PICK_PR_45790_BRANCH-3.4. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 10 +- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 18 ++ 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index d0fcbfb7aaa8..883bb0ce7bab 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -61,6 +61,9 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " + "42.75, 1.0002)").executeUpdate() +conn.prepareStatement("INSERT INTO numbers VALUES (null, null, " + + "null, null, null, null, null, null, null)").executeUpdate() + conn.prepareStatement("CREATE TABLE 
dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() conn.prepareStatement("INSERT INTO dates VALUES ('1991-11-09', '13:31:24', " @@ -100,7 +103,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { test("Numeric types") { val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties) val rows = df.collect() -assert(rows.length == 1) +assert(rows.length == 2) val types = rows(0).toSeq.map(x => x.getClass.toString) assert(types.length == 9) assert(types(0).equals("class java.lang.Boolean")) @@ -205,6 +208,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { """.stripMargin.replaceAll("\n", " ")) assert(sql("select x, y from queryOption").collect.toSet == expectedResult) } + + test("SPARK-47666: Check nulls for result set getters") { +val nulls = spark.read.jdbc(jdbcUrl, "numbers", new Properties).tail(1).head +assert(nulls === Row(null, null, null, null, null, null, null, null, null)) + } } /** diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 5c76c6b1095b..90778c3053dd 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -426,14 +426,16 @@ object JdbcUtils extends Logging with SQLConfHelper { case LongType if metadata.contains("binarylong") => (rs: ResultSet, row: InternalRow, pos: Int) => -val bytes = rs.getBytes(pos + 1) -var ans = 0L -var j = 0 -while (j < bytes.length) { - ans = 256 * ans + (255 & bytes(j)) - j = j + 1 -} -row.setLong(pos, ans) +val l = nullSafeConvert[Array[Byte]](rs.getBytes(pos + 1), bytes => { + var ans = 0L + var j = 0 + while (j < bytes.length) { +ans = 256 * ans + (255 & bytes(j)) +j = j + 1 + } + ans +}) +row.update(pos, l) case LongType => (rs: ResultSet, row: InternalRow, pos: Int) 
=>
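The rewritten getter above does two things: it null-checks the byte array before touching it (the source of the NPE), and it folds the bytes big-endian into a long. A Python sketch of the same conversion logic (function name is illustrative, not a Spark API):

```python
def bytes_to_long(raw):
    """Null-safe big-endian fold of a BIT(n) byte array into an integer,
    mirroring the nullSafeConvert + fold added to JdbcUtils above."""
    if raw is None:  # the old code dereferenced the array unconditionally -> NPE
        return None
    ans = 0
    for b in raw:
        ans = 256 * ans + (255 & b)  # shift previous bytes left, append this one
    return ans

print(bytes_to_long(None))            # None (previously crashed)
print(bytes_to_long(bytes([1, 0])))   # 256
print(bytes_to_long(bytes([0b101])))  # 5
```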
(spark) branch master updated: [SPARK-47689][SQL] Do not wrap query execution error during data writing
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ba98b7ae53f8 [SPARK-47689][SQL] Do not wrap query execution error during data writing ba98b7ae53f8 is described below commit ba98b7ae53f8b106cb0cfd8f0b3fe1a8068c1ce6 Author: Wenchen Fan AuthorDate: Tue Apr 2 20:34:32 2024 +0800 [SPARK-47689][SQL] Do not wrap query execution error during data writing ### What changes were proposed in this pull request? It's quite confusing to report `TASK_WRITE_FAILED` error when the error was caused by input query execution. This PR updates the error wrapping code to not wrap with `TASK_WRITE_FAILED` if the error was from input query execution. ### Why are the changes needed? better error reporting ### Does this PR introduce _any_ user-facing change? yes, now people won't see `TASK_WRITE_FAILED` error if the error was from input query. ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45797 from cloud-fan/write-error. 
Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../execution/datasources/FileFormatWriter.scala | 29 ++- .../apache/spark/sql/CharVarcharTestSuite.scala| 256 - .../sql/errors/QueryExecutionAnsiErrorsSuite.scala | 16 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 40 ++-- .../spark/sql/HiveCharVarcharTestSuite.scala | 20 +- 5 files changed, 89 insertions(+), 272 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala index 9dbadbd97ec7..1df63aa14b4b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala @@ -41,7 +41,7 @@ import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.execution.{ProjectExec, SortExec, SparkPlan, SQLExecution, UnsafeExternalRowSorter} import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.util.{SerializableConfiguration, Utils} +import org.apache.spark.util.{NextIterator, SerializableConfiguration, Utils} import org.apache.spark.util.ArrayImplicits._ @@ -401,9 +401,10 @@ object FileFormatWriter extends Logging { } try { + val queryFailureCapturedIterator = new QueryFailureCapturedIterator(iterator) Utils.tryWithSafeFinallyAndFailureCallbacks(block = { // Execute the task to write rows out and commit the task. 
-dataWriter.writeWithIterator(iterator) +dataWriter.writeWithIterator(queryFailureCapturedIterator) dataWriter.commit() })(catchBlock = { // If there is an error, abort the task @@ -413,6 +414,8 @@ object FileFormatWriter extends Logging { dataWriter.close() }) } catch { + case e: QueryFailureDuringWrite => +throw e.queryFailure case e: FetchFailedException => throw e case f: FileAlreadyExistsException if SQLConf.get.fastFailFileFormatOutput => @@ -452,3 +455,25 @@ object FileFormatWriter extends Logging { } } } + +// A exception wrapper to indicate that the error was thrown when executing the query, not writing +// the data +private class QueryFailureDuringWrite(val queryFailure: Throwable) extends Throwable + +// An iterator wrapper to rethrow any error from the given iterator with `QueryFailureDuringWrite`. +private class QueryFailureCapturedIterator(data: Iterator[InternalRow]) + extends NextIterator[InternalRow] { + + override protected def getNext(): InternalRow = try { +if (data.hasNext) { + data.next() +} else { + finished = true + null +} + } catch { +case t: Throwable => throw new QueryFailureDuringWrite(t) + } + + override protected def close(): Unit = {} +} diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala index 7e845e69c772..013177425da7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala @@ -17,7 +17,7 @@ package org.apache.spark.sql -import org.apache.spark.{SparkConf, SparkException, SparkRuntimeException} +import org.apache.spark.{SparkConf, SparkRuntimeException} import org.apache.spark.sql.catalyst.expressions.Attribute import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import
(spark) branch master updated (03f4e45cd7e9 -> b1b1fde42bd8)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 03f4e45cd7e9 [SPARK-47685][SQL] Restore the support for `Stream` type in `Dataset#groupBy` add b1b1fde42bd8 Revert "[SPARK-47684][SQL] Postgres: Map length unspecified bpchar to StringType" No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 34 +++--- .../apache/spark/sql/jdbc/PostgresDialect.scala| 4 --- 2 files changed, 23 insertions(+), 15 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-47666][SQL][3.5] Fix NPE when reading mysql bit array as LongType
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 2dda441bec01 [SPARK-47666][SQL][3.5] Fix NPE when reading mysql bit array as LongType 2dda441bec01 is described below commit 2dda441bec018b1386248b54ccfef46defa5a07a Author: Kent Yao AuthorDate: Tue Apr 2 18:16:35 2024 +0800 [SPARK-47666][SQL][3.5] Fix NPE when reading mysql bit array as LongType ### What changes were proposed in this pull request? This PR fixes NPE when reading mysql bit array as LongType ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45792 from yaooqinn/PR_TOOL_PICK_PR_45790_BRANCH-3.5. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 10 +- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 18 ++ 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index 68d88fbc552a..bc7302163d9a 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -62,6 +62,9 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " + "42.75, 1.0002, -128, 255)").executeUpdate() +conn.prepareStatement("INSERT INTO numbers VALUES (null, null, " + + "null, null, null, null, null, null, null, null, null)").executeUpdate() + 
conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() conn.prepareStatement("INSERT INTO dates VALUES ('1991-11-09', '13:31:24', " @@ -101,7 +104,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { test("Numeric types") { val df = sqlContext.read.jdbc(jdbcUrl, "numbers", new Properties) val rows = df.collect() -assert(rows.length == 1) +assert(rows.length == 2) val types = rows(0).toSeq.map(x => x.getClass.toString) assert(types.length == 11) assert(types(0).equals("class java.lang.Boolean")) @@ -212,6 +215,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { """.stripMargin.replaceAll("\n", " ")) assert(sql("select x, y from queryOption").collect.toSet == expectedResult) } + + test("SPARK-47666: Check nulls for result set getters") { +val nulls = spark.read.jdbc(jdbcUrl, "numbers", new Properties).tail(1).head +assert(nulls === Row(null, null, null, null, null, null, null, null, null, null, null)) + } } /** diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 3521a50cd2dd..7b5c4cfc9b6e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -430,14 +430,16 @@ object JdbcUtils extends Logging with SQLConfHelper { case LongType if metadata.contains("binarylong") => (rs: ResultSet, row: InternalRow, pos: Int) => -val bytes = rs.getBytes(pos + 1) -var ans = 0L -var j = 0 -while (j < bytes.length) { - ans = 256 * ans + (255 & bytes(j)) - j = j + 1 -} -row.setLong(pos, ans) +val l = nullSafeConvert[Array[Byte]](rs.getBytes(pos + 1), bytes => { + var ans = 0L + var j = 0 + while (j < bytes.length) { +ans = 256 * ans + (255 & bytes(j)) +j = j + 1 + } + ans +}) +row.update(pos, l) case LongType 
=> (rs: ResultSet, row: InternalRow, pos: Int) =>
(spark) branch master updated (a598f654066d -> 03f4e45cd7e9)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a598f654066d [SPARK-47664][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add more tests add 03f4e45cd7e9 [SPARK-47685][SQL] Restore the support for `Stream` type in `Dataset#groupBy` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 4 +++- .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 +++- 2 files changed, 10 insertions(+), 2 deletions(-)
(spark) branch master updated: [SPARK-47664][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add more tests
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a598f654066d [SPARK-47664][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add more tests a598f654066d is described below commit a598f654066d79bb886c15db082f665eeaa77a55 Author: Ruifeng Zheng AuthorDate: Tue Apr 2 16:33:32 2024 +0800 [SPARK-47664][PYTHON][CONNECT][TESTS][FOLLOW-UP] Add more tests ### What changes were proposed in this pull request? Add more tests ### Why are the changes needed? for test coverage, to address https://github.com/apache/spark/pull/45788#discussion_r1546663788 ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #45809 from zhengruifeng/col_name_val_test. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../sql/tests/connect/test_connect_basic.py| 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py index 786b9e2896c2..3b8e8165b4bf 100755 --- a/python/pyspark/sql/tests/connect/test_connect_basic.py +++ b/python/pyspark/sql/tests/connect/test_connect_basic.py @@ -1219,6 +1219,11 @@ class SparkConnectBasicTests(SparkConnectSQLTestCase): self.assert_eq(parse_attr_name("`a"), None) self.assert_eq(parse_attr_name("a`"), None) +self.assert_eq(parse_attr_name("`a`.b"), ["a", "b"]) +self.assert_eq(parse_attr_name("`a`.`b`"), ["a", "b"]) +self.assert_eq(parse_attr_name("`a```.b"), ["a`", "b"]) +self.assert_eq(parse_attr_name("`a``.b"), None) + self.assert_eq(parse_attr_name("a.b.c"), ["a", "b", "c"]) self.assert_eq(parse_attr_name("`a`.`b`.`c`"), ["a", "b", "c"]) self.assert_eq(parse_attr_name("a.`b`.c"), ["a", "b", "c"]) @@ -1284,7 +1289,6 @@ class 
SparkConnectBasicTests(SparkConnectSQLTestCase): self.assertTrue(verify_col_name("m.`s`.id", cdf.schema)) self.assertTrue(verify_col_name("`m`.`s`.`id`", cdf.schema)) self.assertFalse(verify_col_name("m.`s.id`", cdf.schema)) -self.assertFalse(verify_col_name("m.`s.id`", cdf.schema)) self.assertTrue(verify_col_name("a", cdf.schema)) self.assertTrue(verify_col_name("`a`", cdf.schema)) @@ -1294,6 +1298,47 @@ class SparkConnectBasicTests(SparkConnectSQLTestCase): self.assertTrue(verify_col_name("`a`.`v`", cdf.schema)) self.assertFalse(verify_col_name("`a`.`x`", cdf.schema)) +cdf = ( +self.connect.range(10) +.withColumn("v", CF.lit(123)) +.withColumn("s.s", CF.struct("id", "v")) +.withColumn("m`", CF.struct("`s.s`", "v")) +) + +# root +# |-- id: long (nullable = false) +# |-- v: string (nullable = false) +# |-- s.s: struct (nullable = false) +# ||-- id: long (nullable = false) +# ||-- v: string (nullable = false) +# |-- m`: struct (nullable = false) +# ||-- s.s: struct (nullable = false) +# |||-- id: long (nullable = false) +# |||-- v: string (nullable = false) +# ||-- v: string (nullable = false) + +self.assertFalse(verify_col_name("s", cdf.schema)) +self.assertFalse(verify_col_name("`s`", cdf.schema)) +self.assertFalse(verify_col_name("s.s", cdf.schema)) +self.assertFalse(verify_col_name("s.`s`", cdf.schema)) +self.assertFalse(verify_col_name("`s`.s", cdf.schema)) +self.assertTrue(verify_col_name("`s.s`", cdf.schema)) + +self.assertFalse(verify_col_name("m", cdf.schema)) +self.assertFalse(verify_col_name("`m`", cdf.schema)) +self.assertTrue(verify_col_name("`m```", cdf.schema)) + +self.assertFalse(verify_col_name("`m```.s", cdf.schema)) +self.assertFalse(verify_col_name("`m```.`s`", cdf.schema)) +self.assertFalse(verify_col_name("`m```.s.s", cdf.schema)) +self.assertFalse(verify_col_name("`m```.s.`s`", cdf.schema)) +self.assertTrue(verify_col_name("`m```.`s.s`", cdf.schema)) + +self.assertFalse(verify_col_name("`m```.s.s.v", cdf.schema)) 
+self.assertFalse(verify_col_name("`m```.s.`s`.v", cdf.schema)) +self.assertTrue(verify_col_name("`m```.`s.s`.v", cdf.schema)) +self.assertTrue(verify_col_name("`m```.`s.s`.`v`", cdf.schema)) + if __name__ == "__main__": from pyspark.sql.tests.connect.test_connect_basic import * # noqa: F401
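The tests above exercise Spark Connect's client-side parsing of multi-part column names, where backticks quote a name part and a doubled backtick inside quotes escapes a literal one. A simplified Python re-implementation of such a parser (this is a sketch of the grammar the tests describe, not the actual pyspark function):

```python
def parse_attr_name(name):
    """Split names like a.b.`c.d` into parts; `` inside quotes is an escaped
    backtick. Returns the list of parts, or None if the name is malformed."""
    parts, buf, i, n = [], [], 0, len(name)
    while i < n:
        if name[i] == '`':                            # quoted part
            i += 1
            while i < n:
                if name[i] == '`':
                    if i + 1 < n and name[i + 1] == '`':
                        buf.append('`'); i += 2       # escaped backtick
                    else:
                        break                         # closing quote
                else:
                    buf.append(name[i]); i += 1
            if i >= n:
                return None                           # unterminated quote
            i += 1                                    # skip closing backtick
            if i < n and name[i] != '.':
                return None                           # junk after closing quote
        else:                                         # unquoted part
            while i < n and name[i] not in '.`':
                buf.append(name[i]); i += 1
            if i < n and name[i] == '`':
                return None                           # stray backtick
        parts.append(''.join(buf)); buf = []
        if i < n:
            i += 1                                    # consume the dot
            if i == n:
                return None                           # trailing dot
    return parts

print(parse_attr_name('a.b.c'))    # ['a', 'b', 'c']
print(parse_attr_name('`a```.b'))  # ['a`', 'b']
print(parse_attr_name('`a``.b'))   # None
print(parse_attr_name('a`'))       # None
```

These outputs match the expectations asserted in the test diff above.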
(spark) branch master updated (e2e6e09a24db -> 22771a6a87e7)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from e2e6e09a24db [SPARK-47634][SQL] Add legacy support for disabling map key normalization add 22771a6a87e7 [SPARK-47684][SQL] Postgres: Map length unspecified bpchar to StringType No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 34 +++--- .../apache/spark/sql/jdbc/PostgresDialect.scala| 4 +++ 2 files changed, 15 insertions(+), 23 deletions(-)
(spark) branch master updated: [SPARK-47634][SQL] Add legacy support for disabling map key normalization
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e2e6e09a24db [SPARK-47634][SQL] Add legacy support for disabling map key normalization
e2e6e09a24db is described below

commit e2e6e09a24dbb828b55ee35f60da22c3df4ac2c5
Author: Stevo Mitric
AuthorDate: Tue Apr 2 16:28:27 2024 +0800

    [SPARK-47634][SQL] Add legacy support for disabling map key normalization

    ### What changes were proposed in this pull request?
    Added a `DISABLE_MAP_KEY_NORMALIZATION` option in `SQLConf` to allow for legacy creation of a map without key normalization (keys `0.0` and `-0.0`) in `ArrayBasedMapBuilder`.

    ### Why are the changes needed?
    As a legacy fallback option.

    ### Does this PR introduce _any_ user-facing change?
    New `DISABLE_MAP_KEY_NORMALIZATION` config option.

    ### How was this patch tested?
    New UT proposed in this PR

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45760 from stevomitric/stevomitric/normalize-conf.

    Authored-by: Stevo Mitric
    Signed-off-by: Wenchen Fan
---
 docs/sql-migration-guide.md                                  |  1 +
 .../spark/sql/catalyst/util/ArrayBasedMapBuilder.scala       | 12 +++-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala   | 10 ++
 .../spark/sql/catalyst/util/ArrayBasedMapBuilderSuite.scala  | 10 ++
 4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index b22665487c7b..13d6702c4cf9 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -24,6 +24,7 @@ license: |
 
 ## Upgrading from Spark SQL 3.5 to 4.0
 
+- Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first normalize keys -0.0 to 0.0. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.
 - Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807`(`Long.MaxValue`).
 - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`.
 - Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions.

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
index d13c3c6026a2..a2d41ebf04e1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
@@ -53,11 +53,13 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
 
   private val mapKeyDedupPolicy = SQLConf.get.getConf(SQLConf.MAP_KEY_DEDUP_POLICY)
 
-  private lazy val keyNormalizer: Any => Any = keyType match {
-    case FloatType => NormalizeFloatingNumbers.FLOAT_NORMALIZER
-    case DoubleType => NormalizeFloatingNumbers.DOUBLE_NORMALIZER
-    case _ => identity
-  }
+  private lazy val keyNormalizer: Any => Any =
+    (SQLConf.get.getConf(SQLConf.DISABLE_MAP_KEY_NORMALIZATION), keyType) match {
+      case (false, FloatType) => NormalizeFloatingNumbers.FLOAT_NORMALIZER
+      case (false, DoubleType) => NormalizeFloatingNumbers.DOUBLE_NORMALIZER
+      case _ => identity
+    }
 
   def put(key: Any, value: Any): Unit = {
     if (key == null) {

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index fc7425ce2bea..af4498274620 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3247,6 +3247,16 @@ object SQLConf {
       .stringConf
       .createWithDefault("")
 
+  val DISABLE_MAP_KEY_NORMALIZATION =
+    buildConf("spark.sql.legacy.disableMapKeyNormalization")
+      .internal()
+      .doc("Disables key normalization when creating a map with `ArrayBasedMapBuilder`. When " +
+        "set to `true` it will prevent key normalization when building a map, which will " +
+        "allow for values such as `-0.0` and `0.0` to be present as distinct
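The `-0.0`/`0.0` distinction this config toggles comes down to a JVM subtlety: IEEE 754 comparison treats the two zeros as equal, but boxed `Double` equality (and hashing) distinguishes their bit patterns, so an un-normalized map can carry both as separate keys. A minimal Java illustration of that underlying behavior, independent of Spark:

```java
import java.util.HashMap;
import java.util.Map;

public class NegativeZeroKeys {
    public static void main(String[] args) {
        // Primitive comparison follows IEEE 754: the two zeros are equal.
        System.out.println(0.0 == -0.0); // true

        // Boxed comparison uses the bit pattern, so they differ.
        System.out.println(Double.valueOf(0.0).equals(Double.valueOf(-0.0))); // false

        // Consequently a map keyed by boxed doubles keeps both entries --
        // the pre-normalization behavior the legacy flag restores.
        Map<Double, String> m = new HashMap<>();
        m.put(0.0, "positive zero");
        m.put(-0.0, "negative zero");
        System.out.println(m.size()); // 2
    }
}
```

With normalization on (the new default), `-0.0` is rewritten to `0.0` before insertion, so the second `put` would overwrite the first instead of adding a key.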
(spark) branch master updated: [SPARK-47663][CORE][TESTS] add end to end test for task limiting according to different cpu and gpu configurations
This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new eb9b12692601 [SPARK-47663][CORE][TESTS] add end to end test for task limiting according to different cpu and gpu configurations
eb9b12692601 is described below

commit eb9b126926016e0156b1d40b1d7ed33d4705d2bb
Author: Bobby Wang
AuthorDate: Tue Apr 2 15:30:10 2024 +0800

    [SPARK-47663][CORE][TESTS] add end to end test for task limiting according to different cpu and gpu configurations

    ### What changes were proposed in this pull request?
    Add an end-to-end unit test to ensure that the number of tasks is calculated correctly according to the different task CPU amount and task GPU amount.

    ### Why are the changes needed?
    To increase the test coverage. More details can be found at https://github.com/apache/spark/pull/45528#discussion_r1545905575

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    The CI can pass.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45794 from wbo4958/end2end-test.

    Authored-by: Bobby Wang
    Signed-off-by: Weichen Xu
---
 .../CoarseGrainedSchedulerBackendSuite.scala | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
index 6e94f9abe67b..a75f470deec3 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
@@ -179,6 +179,53 @@ class CoarseGrainedSchedulerBackendSuite extends SparkFunSuite with LocalSparkCo
     }
   }
 
+  // Every item corresponds to (CPU resources per task, GPU resources per task,
+  // and the GPU addresses assigned to all tasks).
+  Seq(
+    (1, 1, Array(Array("0"), Array("1"), Array("2"), Array("3"))),
+    (1, 2, Array(Array("0", "1"), Array("2", "3"))),
+    (1, 4, Array(Array("0", "1", "2", "3"))),
+    (2, 1, Array(Array("0"), Array("1"))),
+    (4, 1, Array(Array("0"))),
+    (4, 2, Array(Array("0", "1"))),
+    (2, 2, Array(Array("0", "1"), Array("2", "3"))),
+    (4, 4, Array(Array("0", "1", "2", "3"))),
+    (1, 3, Array(Array("0", "1", "2"))),
+    (3, 1, Array(Array("0")))
+  ).foreach { case (taskCpus, taskGpus, expectedGpuAddresses) =>
+    test(s"SPARK-47663 end to end test validating if task cpus:${taskCpus} and " +
+      s"task gpus: ${taskGpus} works") {
+      withTempDir { dir =>
+        val discoveryScript = createTempScriptWithExpectedOutput(
+          dir, "gpuDiscoveryScript", """{"name": "gpu","addresses":["0", "1", "2", "3"]}""")
+        val conf = new SparkConf()
+          .set(CPUS_PER_TASK, taskCpus)
+          .setMaster("local-cluster[1, 4, 1024]")
+          .setAppName("test")
+          .set(WORKER_GPU_ID.amountConf, "4")
+          .set(WORKER_GPU_ID.discoveryScriptConf, discoveryScript)
+          .set(EXECUTOR_GPU_ID.amountConf, "4")
+          .set(TASK_GPU_ID.amountConf, taskGpus.toString)
+
+        sc = new SparkContext(conf)
+        eventually(timeout(executorUpTimeout)) {
+          // Ensure all executors have been launched.
+          assert(sc.getExecutorIds().length == 1)
+        }
+
+        val numPartitions = Seq(4 / taskCpus, 4 / taskGpus).min
+        val ret = sc.parallelize(1 to 20, numPartitions).mapPartitions { _ =>
+          val tc = TaskContext.get()
+          assert(tc.cpus() == taskCpus)
+          val gpus = tc.resources()("gpu").addresses
+          Iterator.single(gpus)
+        }.collect()
+
+        assert(ret === expectedGpuAddresses)
+      }
+    }
+  }
+
   // Here we just have test for one happy case instead of all cases: other cases are covered in
   // FsHistoryProviderSuite.
   test("custom log url for Spark UI is applied") {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
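All the parameterized cases in the suite above reduce to one rule: on a 4-core, 4-GPU executor, the number of concurrent tasks is the smaller of the CPU budget and the GPU budget, the same `Seq(4 / taskCpus, 4 / taskGpus).min` the test uses for `numPartitions`. A Java sketch of that arithmetic (`taskSlots` is an illustrative name, not a Spark API):

```java
public class TaskSlots {
    // Concurrent task slots per executor: the scarcer resource wins.
    // Integer division floors, e.g. 4 cores / 3 cpus-per-task = 1 slot.
    static int taskSlots(int executorCores, int taskCpus, int executorGpus, int taskGpus) {
        return Math.min(executorCores / taskCpus, executorGpus / taskGpus);
    }

    public static void main(String[] args) {
        // Mirrors the suite's 4-core, 4-GPU executor cases.
        System.out.println(taskSlots(4, 1, 4, 1)); // 4 -> one task per GPU address
        System.out.println(taskSlots(4, 1, 4, 2)); // 2 -> two GPUs per task
        System.out.println(taskSlots(4, 3, 4, 1)); // 1 -> CPUs are the bottleneck
    }
}
```

For example, the `(3, 1, Array(Array("0")))` case runs a single task because only one 3-CPU task fits on 4 cores, even though 4 GPUs are available.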
(spark) branch master updated: [SPARK-47686][SQL][TESTS] Use `=!=` instead of `!==` in `JoinHintSuite`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 00162b82fe3c [SPARK-47686][SQL][TESTS] Use `=!=` instead of `!==` in `JoinHintSuite`
00162b82fe3c is described below

commit 00162b82fe3c48e26303394d1a91d026fe8d9b4c
Author: yangjie01
AuthorDate: Mon Apr 1 23:45:00 2024 -0700

    [SPARK-47686][SQL][TESTS] Use `=!=` instead of `!==` in `JoinHintSuite`

    ### What changes were proposed in this pull request?
    This PR uses `=!=` instead of `!==` in `JoinHintSuite`. `!==` has been a deprecated API since 2.0.0, and its test already exists in `DeprecatedAPISuite`.

    ### Why are the changes needed?
    Clean up deprecated API usage.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #45812 from LuciferYang/SPARK-47686.

    Authored-by: yangjie01
    Signed-off-by: Dongjoon Hyun
---
 sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
index 66746263ff90..53e47f428c3a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
@@ -695,11 +695,11 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
     val hintAppender = new LogAppender(s"join hint check for equi-join")
     withLogAppender(hintAppender, level = Some(Level.WARN)) {
       assertBroadcastNLJoin(
-        df1.hint("SHUFFLE_HASH").join(df2, $"a1" !== $"b1"), BuildRight)
+        df1.hint("SHUFFLE_HASH").join(df2, $"a1" =!= $"b1"), BuildRight)
     }
     withLogAppender(hintAppender, level = Some(Level.WARN)) {
       assertBroadcastNLJoin(
-        df1.join(df2.hint("MERGE"), $"a1" !== $"b1"), BuildRight)
+        df1.join(df2.hint("MERGE"), $"a1" =!= $"b1"), BuildRight)
     }
     val logs = hintAppender.loggingEvents.map(_.getMessage.getFormattedMessage)
       .filter(_.contains("is not supported in the query:"))