[spark] branch branch-3.5 updated: [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 94661758c30 [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid
94661758c30 is described below

commit 94661758c3072a279a29d0c493ce419af0414d3a
Author: Kent Yao
AuthorDate: Mon Sep 25 14:23:46 2023 +0800

    [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid

    ### What changes were proposed in this pull request?

    This PR fixes the `/api/v1/applications/{appId}/sql/{executionId}` API when the executionId is invalid. Before this change, we get `no such app: $appId`; after it, we get `unknown query execution id: $executionId`.

    ### Why are the changes needed?

    Bugfix.

    ### Does this PR introduce _any_ user-facing change?

    No, bugfix.

    ### How was this patch tested?

    New test.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43073 from yaooqinn/SPARK-45291.

    Authored-by: Kent Yao
    Signed-off-by: Kent Yao
    (cherry picked from commit 5d422155f1dae09f1631375d09e2f3c8dffba9a5)
    Signed-off-by: Kent Yao
---
 .../scala/org/apache/spark/status/api/v1/sql/SqlResource.scala | 3 +--
 .../status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala  | 9 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
index 3c96f612da6..fa5bea5f9bb 100644
--- a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
@@ -56,10 +56,9 @@ private[v1] class SqlResource extends BaseAppResource {
       planDescription: Boolean): ExecutionData = {
     withUI { ui =>
       val sqlStore = new SQLAppStatusStore(ui.store.store)
-      val graph = sqlStore.planGraph(execId)
       sqlStore
         .execution(execId)
-        .map(prepareExecutionData(_, graph, details, planDescription))
+        .map(prepareExecutionData(_, sqlStore.planGraph(execId), details, planDescription))
         .getOrElse(throw new NotFoundException("unknown query execution id: " + execId))
     }
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
index 658f79fc289..c63c748953f 100644
--- a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.status.api.v1.sql
 
 import java.net.URL
 import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletResponse
 
 import org.json4s.DefaultFormats
 import org.json4s.jackson.JsonMethods
@@ -148,4 +149,12 @@ class SqlResourceWithActualMetricsSuite
     }
   }
 
+  test("SPARK-45291: Use unknown query execution id instead of no such app when id is invalid") {
+    val url = new URL(spark.sparkContext.ui.get.webUrl +
+      s"/api/v1/applications/${spark.sparkContext.applicationId}/sql/${Long.MaxValue}")
+    val (code, resultOpt, error) = getContentAndCode(url)
+    assert(code === HttpServletResponse.SC_NOT_FOUND)
+    assert(resultOpt.isEmpty)
+    assert(error.get === s"unknown query execution id: ${Long.MaxValue}")
+  }
 }
[spark] branch master updated: [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5d422155f1d [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid
5d422155f1d is described below

commit 5d422155f1dae09f1631375d09e2f3c8dffba9a5
Author: Kent Yao
AuthorDate: Mon Sep 25 14:23:46 2023 +0800

    [SPARK-45291][SQL][REST] Use unknown query execution id instead of no such app when id is invalid

    ### What changes were proposed in this pull request?

    This PR fixes the `/api/v1/applications/{appId}/sql/{executionId}` API when the executionId is invalid. Before this change, we get `no such app: $appId`; after it, we get `unknown query execution id: $executionId`.

    ### Why are the changes needed?

    Bugfix.

    ### Does this PR introduce _any_ user-facing change?

    No, bugfix.

    ### How was this patch tested?

    New test.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43073 from yaooqinn/SPARK-45291.

    Authored-by: Kent Yao
    Signed-off-by: Kent Yao
---
 .../scala/org/apache/spark/status/api/v1/sql/SqlResource.scala | 3 +--
 .../status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala  | 9 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
index 3c96f612da6..fa5bea5f9bb 100644
--- a/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
@@ -56,10 +56,9 @@ private[v1] class SqlResource extends BaseAppResource {
       planDescription: Boolean): ExecutionData = {
     withUI { ui =>
       val sqlStore = new SQLAppStatusStore(ui.store.store)
-      val graph = sqlStore.planGraph(execId)
       sqlStore
         .execution(execId)
-        .map(prepareExecutionData(_, graph, details, planDescription))
+        .map(prepareExecutionData(_, sqlStore.planGraph(execId), details, planDescription))
         .getOrElse(throw new NotFoundException("unknown query execution id: " + execId))
     }
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
index 658f79fc289..c63c748953f 100644
--- a/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.status.api.v1.sql
 
 import java.net.URL
 import java.text.SimpleDateFormat
+import javax.servlet.http.HttpServletResponse
 
 import org.json4s.DefaultFormats
 import org.json4s.jackson.JsonMethods
@@ -148,4 +149,12 @@ class SqlResourceWithActualMetricsSuite
     }
   }
 
+  test("SPARK-45291: Use unknown query execution id instead of no such app when id is invalid") {
+    val url = new URL(spark.sparkContext.ui.get.webUrl +
+      s"/api/v1/applications/${spark.sparkContext.applicationId}/sql/${Long.MaxValue}")
+    val (code, resultOpt, error) = getContentAndCode(url)
+    assert(code === HttpServletResponse.SC_NOT_FOUND)
+    assert(resultOpt.isEmpty)
+    assert(error.get === s"unknown query execution id: ${Long.MaxValue}")
+  }
 }
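Outside the Scala test above, the corrected endpoint can also be probed by hand. The following is an illustrative sketch only, not part of the commit: it assumes a running Spark application with its UI on the default http://localhost:4040, and the application id below is a placeholder (read the real one from `/api/v1/applications`).

```python
# Sketch: probe the SQL REST API with an execution id that cannot exist and
# confirm the 404 message introduced by SPARK-45291. Standard library only.
import json
import urllib.error
import urllib.request

app_id = "local-1695600000000"  # hypothetical application id
url = f"http://localhost:4040/api/v1/applications/{app_id}/sql/{2**63 - 1}"

try:
    with urllib.request.urlopen(url) as resp:
        print(json.load(resp))
except urllib.error.HTTPError as e:
    # After the fix: 404 with "unknown query execution id: ...";
    # before the fix it incorrectly reported "no such app: ...".
    print(e.code, e.read().decode())
```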
[spark] branch master updated: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fb2bee37c96 [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0
fb2bee37c96 is described below

commit fb2bee37c964bf2164fc89a0a55085dd0c840b56
Author: zhyhimont
AuthorDate: Mon Sep 25 15:22:32 2023 +0900

    [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

    ### What changes were proposed in this pull request?

    Support `isocalendar` from the pandas 2.0.0.

    ### Why are the changes needed?

    When pandas 2.0.0 is released, we should match the behavior in pandas API on Spark.

    ### Does this PR introduce _any_ user-facing change?

    Added the new method `DatetimeIndex.isocalendar` and removed the two deprecated properties `DatetimeIndex.week` and `DatetimeIndex.weekofyear`:

    ```
    dfs = ps.from_pandas(pd.date_range(start='2019-12-29', freq='D', periods=4).to_series())
    dfs.dt.isocalendar()
                year  week  day
    2019-12-29  2019    52    7
    2019-12-30  2020     1    1
    2019-12-31  2020     1    2
    2020-01-01  2020     1    3

    dfs.dt.isocalendar().week
    2019-12-29    52
    2019-12-30     1
    2019-12-31     1
    2020-01-01     1
    ```

    ### How was this patch tested?

    UT was updated.

    Closes #40420 from dzhigimont/SPARK-42617_ZH.

    Lead-authored-by: zhyhimont
    Co-authored-by: Zhyhimont Dmitry
    Co-authored-by: Dmitry Zhyhimont
    Co-authored-by: Zhyhimont Dmitry
    Signed-off-by: Hyukjin Kwon
---
 .../source/reference/pyspark.pandas/indexing.rst    |  3 +-
 .../source/reference/pyspark.pandas/series.rst      |  3 +-
 python/pyspark/pandas/datetimes.py                  | 70 --
 python/pyspark/pandas/indexes/base.py               |  4 +-
 python/pyspark/pandas/indexes/datetimes.py          | 49 +--
 python/pyspark/pandas/namespace.py                  |  3 +-
 .../pyspark/pandas/tests/indexes/test_datetime.py   | 28 ++---
 .../pandas/tests/indexes/test_datetime_property.py  | 19 +-
 .../pyspark/pandas/tests/test_series_datetime.py    | 17 +-
 9 files changed, 100 insertions(+), 96 deletions(-)

diff --git a/python/docs/source/reference/pyspark.pandas/indexing.rst b/python/docs/source/reference/pyspark.pandas/indexing.rst
index 70d463c052a..d6be57ee9c8 100644
--- a/python/docs/source/reference/pyspark.pandas/indexing.rst
+++ b/python/docs/source/reference/pyspark.pandas/indexing.rst
@@ -338,8 +338,7 @@ Time/date components
    DatetimeIndex.minute
    DatetimeIndex.second
    DatetimeIndex.microsecond
-   DatetimeIndex.week
-   DatetimeIndex.weekofyear
+   DatetimeIndex.isocalendar
    DatetimeIndex.dayofweek
    DatetimeIndex.day_of_week
    DatetimeIndex.weekday
diff --git a/python/docs/source/reference/pyspark.pandas/series.rst b/python/docs/source/reference/pyspark.pandas/series.rst
index 552acec096f..7b658d45d4b 100644
--- a/python/docs/source/reference/pyspark.pandas/series.rst
+++ b/python/docs/source/reference/pyspark.pandas/series.rst
@@ -313,8 +313,7 @@ Datetime Properties
    Series.dt.minute
    Series.dt.second
    Series.dt.microsecond
-   Series.dt.week
-   Series.dt.weekofyear
+   Series.dt.isocalendar
    Series.dt.dayofweek
    Series.dt.weekday
    Series.dt.dayofyear
diff --git a/python/pyspark/pandas/datetimes.py b/python/pyspark/pandas/datetimes.py
index b0649cf5761..4b6e23fae7a 100644
--- a/python/pyspark/pandas/datetimes.py
+++ b/python/pyspark/pandas/datetimes.py
@@ -18,7 +18,6 @@
 """
 Date/Time related functions on pandas-on-Spark Series
 """
-import warnings
 from typing import Any, Optional, Union, no_type_check
 
 import numpy as np
@@ -27,7 +26,9 @@ from pandas.tseries.offsets import DateOffset
 
 import pyspark.pandas as ps
 import pyspark.sql.functions as F
-from pyspark.sql.types import DateType, TimestampType, TimestampNTZType, LongType, IntegerType
+from pyspark.sql.types import DateType, TimestampType, TimestampNTZType, IntegerType
+from pyspark.pandas import DataFrame
+from pyspark.pandas.config import option_context
 
 
 class DatetimeMethods:
@@ -116,26 +117,59 @@ class DatetimeMethods:
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-
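The archived diff above is cut off at this point. As a usage note on migrating away from the removed properties, here is a minimal sketch (assuming pandas and pyspark.pandas are installed and a build containing this change; the data mirrors the example in the commit message):

```python
# Minimal usage sketch: replacing the removed `dt.week` with the new
# DataFrame-returning `dt.isocalendar()` accessor in pandas API on Spark.
import pandas as pd
import pyspark.pandas as ps

s = ps.from_pandas(pd.date_range(start="2019-12-29", freq="D", periods=4).to_series())

# Before this change: s.dt.week (removed).
# After: take the `week` column of the isocalendar frame.
week = s.dt.isocalendar().week
print(week.to_pandas())
```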
[GitHub] [spark-website] panbingkun commented on pull request #474: [SPARK-44820][DOCS] Switch languages consistently across docs for all code snippets
panbingkun commented on PR #474:
URL: https://github.com/apache/spark-website/pull/474#issuecomment-1732813770

   > @panbingkun yes let's update the spark website (this repo) to fix this UI issue for published docs.

   Okay, let me fix it.
[spark] branch master updated (f81f51467b8 -> bb0d287114f)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from f81f51467b8 [SPARK-45257][CORE][FOLLOWUP] Correct the from version in migration guide
     add bb0d287114f [SPARK-45294][PYTHON][DOCS] Use JDK 17 in Binder integration for PySpark live notebooks

No new revisions were added by this update.

Summary of changes:
 binder/apt.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-45257][CORE][FOLLOWUP] Correct the from version in migration guide
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f81f51467b8 [SPARK-45257][CORE][FOLLOWUP] Correct the from version in migration guide
f81f51467b8 is described below

commit f81f51467b85779086873860d5bac0d5429c9a29
Author: Cheng Pan
AuthorDate: Mon Sep 25 09:37:01 2023 +0800

    [SPARK-45257][CORE][FOLLOWUP] Correct the from version in migration guide

    ### What changes were proposed in this pull request?

    Correct the "Upgrading from" version in the core migration guide.

    ### Why are the changes needed?

    Address comments on https://github.com/apache/spark/commit/8d599972872225e336467700715b1d4771624efe#r128053622

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Review.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43072 from pan3793/SPARK-45257-followup.

    Authored-by: Cheng Pan
    Signed-off-by: Kent Yao
---
 docs/core-migration-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 765c3494f66..2464d774240 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -22,7 +22,7 @@ license: |
 * Table of contents
 {:toc}
 
-## Upgrading from Core 3.4 to 4.0
+## Upgrading from Core 3.5 to 4.0
 
 - Since Spark 4.0, Spark will compress event logs. To restore the behavior before Spark 4.0, you can set `spark.eventLog.compress` to `false`.
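For readers applying the guide entry shown in the diff, opting out of event-log compression is a one-line configuration. A minimal PySpark sketch (the app name and log directory below are placeholders):

```python
# Sketch: restore the pre-4.0 behavior of uncompressed event logs,
# per the migration-guide entry above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("event-log-demo")                      # hypothetical app name
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "/tmp/spark-events")  # directory must exist
    .config("spark.eventLog.compress", "false")     # restore pre-4.0 behavior
    .getOrCreate()
)
```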
[spark] branch master updated: [SPARK-45240][SQL][CONNECT] Implement Error Enrichment for Python Client
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 913991046c6 [SPARK-45240][SQL][CONNECT] Implement Error Enrichment for Python Client
913991046c6 is described below

commit 913991046c6d2b707eab64bd8ca874f9b9bb6581
Author: Yihong He
AuthorDate: Mon Sep 25 09:35:06 2023 +0900

    [SPARK-45240][SQL][CONNECT] Implement Error Enrichment for Python Client

    ### What changes were proposed in this pull request?

    - Implemented the reconstruction of the exception with un-truncated error messages and full server-side stacktrace (includes cause exceptions) based on the responses of FetchErrorDetails RPC.

    Examples: `./bin/pyspark --remote local`

    ```python
    >>> spark.sql("""select from_json('{"d": "02-29"}', 'd date', map('dateFormat', 'MM-dd'))""").collect()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/yihonghe/Workspace/spark/python/pyspark/sql/connect/session.py", line 556, in sql
        data, properties = self.client.execute_command(cmd.command(self._client))
      File "/Users/yihonghe/Workspace/spark/python/pyspark/sql/connect/client/core.py", line 958, in execute_command
        data, _, _, _, properties = self._execute_and_fetch(req)
      File "/Users/yihonghe/Workspace/spark/python/pyspark/sql/connect/client/core.py", line 1259, in _execute_and_fetch
        for response in self._execute_and_fetch_as_iterator(req):
      File "/Users/yihonghe/Workspace/spark/python/pyspark/sql/connect/client/core.py", line 1240, in _execute_and_fetch_as_iterator
        self._handle_error(error)
      File "/Users/yihonghe/Workspace/spark/python/pyspark/sql/connect/client/core.py", line 1479, in _handle_error
        self._handle_rpc_error(error)
      File "/Users/yihonghe/Workspace/spark/python/pyspark/sql/connect/client/core.py", line 1533, in _handle_rpc_error
        raise convert_exception(
    pyspark.errors.exceptions.connect.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '02-29' in the new parser. You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid datetime string.

    JVM stacktrace:
    org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '02-29' in the new parser. You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid datetime string.
        at org.apache.spark.sql.errors.ExecutionErrors.failToParseDateTimeInNewParserError(ExecutionErrors.scala:54)
        at org.apache.spark.sql.errors.ExecutionErrors.failToParseDateTimeInNewParserError$(ExecutionErrors.scala:48)
        at org.apache.spark.sql.errors.ExecutionErrors$.failToParseDateTimeInNewParserError(ExecutionErrors.scala:218)
        at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:142)
        at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:135)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
        at org.apache.spark.sql.catalyst.util.Iso8601DateFormatter.parse(DateFormatter.scala:59)
        at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$$nestedInanonfun$makeConverter$11$1.applyOrElse(JacksonParser.scala:302)
        at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$$nestedInanonfun$makeConverter$11$1.applyOrElse(JacksonParser.scala:299)
        at org.apache.spark.sql.catalyst.json.JacksonParser.parseJsonToken(JacksonParser.scala:404)
        at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$makeConverter$11(JacksonParser.scala:299)
        at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$convertObject(JacksonParser.scala:457)
        at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$$nestedInanonfun$makeStructRootConverter$3$1.applyOrElse(JacksonParser.scala:123)
        at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$$nestedInanonfun$makeStructRootConverter$3$1.applyOrElse(JacksonParser.scala:122)
        at org.apache.spark.sql.catalyst.json.JacksonParser.parseJsonToken(JacksonParser.scala:404)
        at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$makeStructRootConverter$3(Jackson
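The archived message is truncated above. Independent of that, a minimal sketch of what this enrichment means for client code, assuming a Spark Connect session named `spark` (for example from `./bin/pyspark --remote local`) and `spark.sql.connect.enrichError.enabled` left enabled on the server:

```python
# Sketch: client-side handling of an enriched Spark Connect error.
from pyspark.errors.exceptions.connect import SparkUpgradeException

try:
    spark.sql(
        """select from_json('{"d": "02-29"}', 'd date', map('dateFormat', 'MM-dd'))"""
    ).collect()
except SparkUpgradeException as e:
    # With enrichment enabled, the message is the full, un-truncated
    # server-side message fetched via the FetchErrorDetails RPC.
    print(str(e).splitlines()[0])
```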
[spark] branch master updated: [SPARK-45207][SQL][CONNECT] Implement Error Enrichment for Scala Client
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4863be5632f [SPARK-45207][SQL][CONNECT] Implement Error Enrichment for Scala Client
4863be5632f is described below

commit 4863be5632f3165a5699a525235ea118c1e1f7eb
Author: Yihong He
AuthorDate: Mon Sep 25 09:35:33 2023 +0900

    [SPARK-45207][SQL][CONNECT] Implement Error Enrichment for Scala Client

    ### What changes were proposed in this pull request?

    - Implemented the reconstruction of the complete exception (un-truncated error messages, cause exceptions, server-side stacktrace) based on the responses of FetchErrorDetails RPC.

    ### Why are the changes needed?

    - Cause exceptions play an important role in the current control flow, such as in StreamingQueryException. They are also valuable for debugging.
    - Un-truncated error messages are useful for debugging.
    - Providing server-side stack traces aids in effectively diagnosing server-related issues.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    - `build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite"`
    - `build/sbt "connect-client-jvm/testOnly *ClientStreamingQuerySuite"`

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #42987 from heyihong/SPARK-45207.

    Authored-by: Yihong He
    Signed-off-by: Hyukjin Kwon
---
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |  59 ++-
 .../sql/streaming/ClientStreamingQuerySuite.scala  |  41 -
 .../client/CustomSparkConnectBlockingStub.scala    |  44 -
 .../connect/client/GrpcExceptionConverter.scala    | 192 +
 4 files changed, 292 insertions(+), 44 deletions(-)

diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
index 21892542eab..ec9b1698a4e 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
@@ -18,6 +18,7 @@ package org.apache.spark.sql
 
 import java.io.{ByteArrayOutputStream, PrintStream}
 import java.nio.file.Files
+import java.time.DateTimeException
 import java.util.Properties
 
 import scala.collection.JavaConverters._
@@ -29,7 +30,7 @@ import org.apache.commons.lang3.{JavaVersion, SystemUtils}
 import org.scalactic.TolerantNumerics
 import org.scalatest.PrivateMethodTester
 
-import org.apache.spark.{SparkArithmeticException, SparkException}
+import org.apache.spark.{SparkArithmeticException, SparkException, SparkUpgradeException}
 import org.apache.spark.SparkBuildInfo.{spark_version => SPARK_VERSION}
 import org.apache.spark.sql.catalyst.analysis.{NamespaceAlreadyExistsException, NoSuchDatabaseException, NoSuchTableException, TableAlreadyExistsException, TempTableAlreadyExistsException}
 import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.StringEncoder
@@ -44,6 +45,62 @@ import org.apache.spark.sql.types._
 
 class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateMethodTester {
 
+  for (enrichErrorEnabled <- Seq(false, true)) {
+    test(s"cause exception - ${enrichErrorEnabled}") {
+      withSQLConf("spark.sql.connect.enrichError.enabled" -> enrichErrorEnabled.toString) {
+        val ex = intercept[SparkUpgradeException] {
+          spark
+            .sql("""
+              |select from_json(
+              |  '{"d": "02-29"}',
+              |  'd date',
+              |  map('dateFormat', 'MM-dd'))
+              |""".stripMargin)
+            .collect()
+        }
+        if (enrichErrorEnabled) {
+          assert(ex.getCause.isInstanceOf[DateTimeException])
+        } else {
+          assert(ex.getCause == null)
+        }
+      }
+    }
+  }
+
+  test(s"throw SparkException with large cause exception") {
+    withSQLConf("spark.sql.connect.enrichError.enabled" -> "true") {
+      val session = spark
+      import session.implicits._
+
+      val throwException =
+        udf((_: String) => throw new SparkException("test" * 1))
+
+      val ex = intercept[SparkException] {
+        Seq("1").toDS.withColumn("udf_val", throwException($"value")).collect()
+      }
+
+      assert(ex.getCause.isInstanceOf[SparkException])
+      assert(ex.getCause.getMessage.contains("test" * 1))
+    }
+  }
+
+  for (isServerStackTraceEnabled <- Seq(false, true)) {
+    test(s"server-side stack trace is set in exceptions - ${isServerStackTraceEnabled}") {
+      withSQLConf(
+        "spark.sql.connect.serverStacktrace.enabled" -> isServerStackTraceEnabled.toString,
+        "spark.s
[spark] branch master updated: [SPARK-45279][PYTHON][CONNECT] Attach plan_id for all logical plans
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 609552e19cf [SPARK-45279][PYTHON][CONNECT] Attach plan_id for all logical plans
609552e19cf is described below

commit 609552e19cfe75109b1b4641baadd79360e75443
Author: Ruifeng Zheng
AuthorDate: Mon Sep 25 08:17:08 2023 +0800

    [SPARK-45279][PYTHON][CONNECT] Attach plan_id for all logical plans

    ### What changes were proposed in this pull request?

    Attach plan_id for all logical plans, except `CachedRelation`.

    ### Why are the changes needed?

    1. All logical plans should contain their plan id in protos.
    2. Catalog plans also contain the plan id in the Scala client, e.g. https://github.com/apache/spark/blob/05f5dccbd34218c7d399228529853bdb1595f3a2/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L63-L67 (the `newDataset` method sets the plan id).

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    CI.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43055 from zhengruifeng/connect_plan_id.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/connect/plan.py | 79 +++---
 1 file changed, 40 insertions(+), 39 deletions(-)

diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py
index 219545cf646..6758b3673f3 100644
--- a/python/pyspark/sql/connect/plan.py
+++ b/python/pyspark/sql/connect/plan.py
@@ -1190,9 +1190,7 @@ class CollectMetrics(LogicalPlan):
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
         assert self._child is not None
-
-        plan = proto.Relation()
-        plan.common.plan_id = self._child._plan_id
+        plan = self._create_proto_relation()
         plan.collect_metrics.input.CopyFrom(self._child.plan(session))
         plan.collect_metrics.name = self._name
         plan.collect_metrics.metrics.extend([self.col_to_expr(x, session) for x in self._exprs])
@@ -1689,7 +1687,9 @@ class CurrentDatabase(LogicalPlan):
         super().__init__(None)
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
-        return proto.Relation(catalog=proto.Catalog(current_database=proto.CurrentDatabase()))
+        plan = self._create_proto_relation()
+        plan.catalog.current_database.SetInParent()
+        return plan
 
 
 class SetCurrentDatabase(LogicalPlan):
@@ -1698,7 +1698,7 @@ class SetCurrentDatabase(LogicalPlan):
         self._db_name = db_name
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
-        plan = proto.Relation()
+        plan = self._create_proto_relation()
         plan.catalog.set_current_database.db_name = self._db_name
         return plan
 
@@ -1709,7 +1709,8 @@ class ListDatabases(LogicalPlan):
         self._pattern = pattern
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
-        plan = proto.Relation(catalog=proto.Catalog(list_databases=proto.ListDatabases()))
+        plan = self._create_proto_relation()
+        plan.catalog.list_databases.SetInParent()
         if self._pattern is not None:
             plan.catalog.list_databases.pattern = self._pattern
         return plan
@@ -1722,7 +1723,8 @@ class ListTables(LogicalPlan):
         self._pattern = pattern
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
-        plan = proto.Relation(catalog=proto.Catalog(list_tables=proto.ListTables()))
+        plan = self._create_proto_relation()
+        plan.catalog.list_tables.SetInParent()
         if self._db_name is not None:
             plan.catalog.list_tables.db_name = self._db_name
         if self._pattern is not None:
@@ -1737,7 +1739,8 @@ class ListFunctions(LogicalPlan):
         self._pattern = pattern
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
-        plan = proto.Relation(catalog=proto.Catalog(list_functions=proto.ListFunctions()))
+        plan = self._create_proto_relation()
+        plan.catalog.list_functions.SetInParent()
         if self._db_name is not None:
             plan.catalog.list_functions.db_name = self._db_name
         if self._pattern is not None:
@@ -1752,7 +1755,7 @@ class ListColumns(LogicalPlan):
         self._db_name = db_name
 
     def plan(self, session: "SparkConnectClient") -> proto.Relation:
-        plan = proto.Relation(catalog=proto.Catalog(list_columns=proto.ListColumns()))
+        plan = self._create_proto_relation()
        plan.catalog.list_columns.table_name = self._table_name
         if self._db_name is not None:
             plan.catalog.list_columns.db_name = self._db_name
@@ -1765,7 +1768,7 @@ class GetDatabase(Logica
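The archived diff is truncated above. The mechanical core of the refactor is that every subclass now builds its relation proto through one base-class helper, which stamps the plan id as a side effect, so no subclass can forget it. A simplified, self-contained sketch of that pattern follows; the classes are illustrative stand-ins, not the real `pyspark.sql.connect.plan` or proto types:

```python
# Sketch of the "one factory stamps the plan id" pattern applied here.
import itertools
from dataclasses import dataclass, field


@dataclass
class Relation:                  # stand-in for proto.Relation
    plan_id: int = -1
    fields: dict = field(default_factory=dict)


class LogicalPlan:
    _id_counter = itertools.count()

    def __init__(self) -> None:
        self._plan_id = next(LogicalPlan._id_counter)

    def _create_proto_relation(self) -> Relation:
        # Every plan builds its proto through this helper, mirroring
        # `plan = self._create_proto_relation()` in the commit.
        return Relation(plan_id=self._plan_id)


class ListDatabases(LogicalPlan):
    def __init__(self, pattern=None):
        super().__init__()
        self._pattern = pattern

    def plan(self) -> Relation:
        rel = self._create_proto_relation()  # plan_id attached here
        rel.fields["list_databases"] = {"pattern": self._pattern}
        return rel


print(ListDatabases("db*").plan())  # Relation(plan_id=0, fields={...})
```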
[spark] branch branch-3.3 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 9a28200f6e4 [SPARK-45286][DOCS] Add back Matomo analytics
9a28200f6e4 is described below

commit 9a28200f6e461c4929dd6e05b6dd55fe984c0924
Author: Sean Owen
AuthorDate: Sun Sep 24 14:17:55 2023 -0500

    [SPARK-45286][DOCS] Add back Matomo analytics

    ### What changes were proposed in this pull request?

    Add analytics to doc pages using the ASF's Matomo service.

    ### Why are the changes needed?

    We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310

    We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30

    This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43063 from srowen/SPARK-45286.

    Authored-by: Sean Owen
    Signed-off-by: Sean Owen
    (cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
    Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d4463922766..2d139f5e0fb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -33,6 +33,25 @@
 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" />
 
+{% production %}
+<!-- Matomo -->
+<script type="text/javascript">
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+</script>
+<!-- /Matomo -->
+{% endproduction %}
+
[spark] branch branch-3.4 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 20924aa581a [SPARK-45286][DOCS] Add back Matomo analytics
20924aa581a is described below

commit 20924aa581a2c5c49ec700689f1888dd7db79e6b
Author: Sean Owen
AuthorDate: Sun Sep 24 14:17:55 2023 -0500

    [SPARK-45286][DOCS] Add back Matomo analytics

    ### What changes were proposed in this pull request?

    Add analytics to doc pages using the ASF's Matomo service.

    ### Why are the changes needed?

    We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310

    We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30

    This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43063 from srowen/SPARK-45286.

    Authored-by: Sean Owen
    Signed-off-by: Sean Owen
    (cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
    Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d4463922766..2d139f5e0fb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -33,6 +33,25 @@
 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" />
 
+{% production %}
+<!-- Matomo -->
+<script type="text/javascript">
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+</script>
+<!-- /Matomo -->
+{% endproduction %}
+
[spark] branch branch-3.5 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 609306ff5da [SPARK-45286][DOCS] Add back Matomo analytics
609306ff5da is described below

commit 609306ff5daa8ff7c2212088d33c0911ad0f4989
Author: Sean Owen
AuthorDate: Sun Sep 24 14:17:55 2023 -0500

    [SPARK-45286][DOCS] Add back Matomo analytics

    ### What changes were proposed in this pull request?

    Add analytics to doc pages using the ASF's Matomo service.

    ### Why are the changes needed?

    We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310

    We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30

    This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43063 from srowen/SPARK-45286.

    Authored-by: Sean Owen
    Signed-off-by: Sean Owen
    (cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
    Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 9b7c4692461..8c4435fdf31 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -32,6 +32,25 @@
 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" />
 
+{% production %}
+<!-- Matomo -->
+<script type="text/javascript">
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+</script>
+<!-- /Matomo -->
+{% endproduction %}
+
[spark] branch master updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a881438114e [SPARK-45286][DOCS] Add back Matomo analytics
a881438114e is described below

commit a881438114ea3e8e918d981ef89ed1ab956d6fca
Author: Sean Owen
AuthorDate: Sun Sep 24 14:17:55 2023 -0500

    [SPARK-45286][DOCS] Add back Matomo analytics

    ### What changes were proposed in this pull request?

    Add analytics to doc pages using the ASF's Matomo service.

    ### Why are the changes needed?

    We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310

    We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30

    This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #43063 from srowen/SPARK-45286.

    Authored-by: Sean Owen
    Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index e857efad6f0..c2f05cfd6bb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -32,6 +32,25 @@
 <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" />
 
+{% production %}
+<!-- Matomo -->
+<script type="text/javascript">
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+</script>
+<!-- /Matomo -->
+{% endproduction %}
+