RE: [PATCH v2 00/11] Connect VFIO to IOMMUFD
On 2022/11/14 20:51, Yi Liu wrote:
> On 2022/11/10 00:57, Jason Gunthorpe wrote:
>> On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
>>> On 2022/11/8 17:19, Nicolin Chen wrote:
>>>> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
>>>>
>>>>> This is on github:
>>>>> https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>>> [...]
>>>>> v2:
>>>>>  - Rebase to v6.1-rc3, v4 iommufd series
>>>>>  - Fixup comments and commit messages from list remarks
>>>>>  - Fix leaking of the iommufd for mdevs
>>>>>  - New patch to fix vfio modaliases when vfio container is disabled
>>>>>  - Add a dmesg once when the iommufd-provided /dev/vfio/vfio is opened
>>>>>    to signal that iommufd is providing this
>>>>
>>>> I've redone my previous sanity tests. Except for those reported bugs,
>>>> things look fine. Once we fix those issues, GVT and other modules
>>>> can run some more stressful tests, I think.
>>>
>>> Our side is also starting to test (GVT, NIC passthrough) this version.
>>> We need to wait a while for the results.
>>
>> I've updated the branches with the two functional fixes discussed on
>> the list plus all the doc updates.
>
> I see. Due to the timezone, the kernel we grabbed is 37c9e6e44d77a; it has
> a slight diff in scripts/kernel-doc compared with the latest commit
> (6bb16a9c67769). I don't think it impacts the test.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
> (37c9e6e44d77a)
>
> On our side, Yu He and Lixiao Yang have done the below tests on an Intel
> platform with the above kernel. The results are:
>
> 1) GVT-g test suite passed; Intel iGFX passthrough passed.
>
> 2) NIC passthrough tests with different guest memory sizes (1G/4G) passed.
>
> 3) Booting two different QEMUs at the same time, where one QEMU opens the
> legacy /dev/vfio/vfio and the other opens /dev/iommu. Tests passed.
>
> 4) Tried the below Kconfig combinations; results are as expected.
>
> VFIO_CONTAINER=y, IOMMUFD=y -- test pass
> VFIO_CONTAINER=y, IOMMUFD=n -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y, IOMMUFD_VFIO_CONTAINER=y -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y, IOMMUFD_VFIO_CONTAINER=n -- no
> /dev/vfio/vfio, so the test fails, as expected
>
> 5) Tested devices from a multi-device group. Assigning such devices to
> the same VM passes; assigning them to different VMs fails; assigning them
> to a VM with Intel virtual VT-d fails. Results are as expected.
>
> Meanwhile, I also tested the development branch for nesting; the basic
> functionality looks good.
>
> Tested-by: Yi Liu
> Tested-by: Lixiao Yang

--
Regards,
Lixiao Yang
[spark-website] branch asf-site updated: Remove preview for 3.0 in Download page (#368)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 13be9bc  Remove preview for 3.0 in Download page (#368)
13be9bc is described below

commit 13be9bcd059cfb60f60320e520c3eb36adf00cc8
Author: wuyi
AuthorDate: Wed Nov 10 00:57:51 2021 +0800

    Remove preview for 3.0 in Download page (#368)
---
 downloads.md        | 8 --------
 site/downloads.html | 8 --------
 2 files changed, 16 deletions(-)

diff --git a/downloads.md b/downloads.md
index 518ae5b..993bd7a 100644
--- a/downloads.md
+++ b/downloads.md
@@ -30,14 +30,6 @@ window.onload = function () {
 Note that, Spark 2.x is pre-built with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12. Spark 3.0+ is pre-built with Scala 2.12.
 
-### Latest preview release
-Preview releases, as the name suggests, are releases for previewing upcoming features.
-Unlike nightly packages, preview releases have been audited by the project's management committee
-to satisfy the legal requirements of Apache Software Foundation's release policy.
-Preview releases are not meant to be functional, i.e. they can and highly likely will contain
-critical bugs or documentation errors.
-The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 2019.
-
 ### Link with Spark
 Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q=g:org.apache.spark). You can add a Maven dependency with the following coordinates:

diff --git a/site/downloads.html b/site/downloads.html
index 8869e19..2deb4e4 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -174,14 +174,6 @@ window.onload = function () {
 Note that, Spark 2.x is pre-built with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12. Spark 3.0+ is pre-built with Scala 2.12.
 
-Latest preview release
-Preview releases, as the name suggests, are releases for previewing upcoming features.
-Unlike nightly packages, preview releases have been audited by the project’s management committee
-to satisfy the legal requirements of Apache Software Foundation’s release policy.
-Preview releases are not meant to be functional, i.e. they can and highly likely will contain
-critical bugs or documentation errors.
-The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 2019.
-
 Link with Spark
 Spark artifacts are hosted in Maven Central (https://search.maven.org/search?q=g:org.apache.spark). You can add a Maven dependency with the following coordinates:
[spark-website] branch asf-site updated: Update Spark 3.3 release window (#366)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new ec02d91  Update Spark 3.3 release window (#366)
ec02d91 is described below

commit ec02d9186df432ada948b4c22e326814c0ec79b2
Author: Hyukjin Kwon
AuthorDate: Sat Oct 30 05:39:02 2021 +0900

    Update Spark 3.3 release window (#366)
---
 site/versioning-policy.html | 8 ++++----
 versioning-policy.md        | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index d4d024b..ed75de8 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -272,7 +272,7 @@ available APIs.
 generally be released about 6 months after 2.2.0. Maintenance releases happen as needed
 in between feature releases. Major releases do not happen according to a fixed schedule.
 
-Spark 3.2 release window
+Spark 3.3 release window
 
@@ -283,15 +283,15 @@ in between feature releases. Major releases do not happen according to a fixed s
 
-July 1st 2021
+March 15th 2022
 Code freeze. Release branch cut.
 
-Mid July 2021
+Late March 2022
 QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
 
-August 2021
+April 2022
 Release candidates (RC), voting, etc. until final release passes
 
diff --git a/versioning-policy.md b/versioning-policy.md
index 3d3f03f..55a0bd3 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -103,13 +103,13 @@ In general, feature ("minor") releases occur about every 6 months. Hence, Spark
 generally be released about 6 months after 2.2.0. Maintenance releases happen as needed
 in between feature releases. Major releases do not happen according to a fixed schedule.
 
-Spark 3.2 release window
+Spark 3.3 release window
 
 | Date | Event |
 | - | - |
-| July 1st 2021 | Code freeze. Release branch cut.|
-| Mid July 2021 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.|
-| August 2021 | Release candidates (RC), voting, etc. until final release passes|
+| March 15th 2022 | Code freeze. Release branch cut.|
+| Late March 2022 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.|
+| April 2022 | Release candidates (RC), voting, etc. until final release passes|
 
 Maintenance releases and EOL
[spark] branch master updated (ed9e6fc -> dfa3978)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ed9e6fc  [SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.8 in GitHub Actions
 add dfa3978  [SPARK-33551][SQL] Do not use custom shuffle reader for repartition

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |   2 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  31 +++---
 .../adaptive/CoalesceShufflePartitions.scala       |  11 +-
 ...costing.scala => CustomShuffleReaderRule.scala} |  15 +--
 .../adaptive/OptimizeLocalShuffleReader.scala      |   9 +-
 .../execution/adaptive/OptimizeSkewedJoin.scala    |  14 ++-
 .../adaptive/AdaptiveQueryExecSuite.scala          | 116 -
 7 files changed, 162 insertions(+), 36 deletions(-)
 copy sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/{costing.scala => CustomShuffleReaderRule.scala} (69%)
[spark] branch master updated (a09747b -> 14aeab3)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a09747b  [SPARK-33063][K8S] Improve error message for insufficient K8s volume confs
 add 14aeab3  [SPARK-33038][SQL] Combine AQE initial and current plan string when two plans are the same

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  50 ++---
 .../sql-tests/results/explain-aqe.sql.out          | 123 +++--
 .../adaptive/AdaptiveQueryExecSuite.scala          |   4 +-
 3 files changed, 47 insertions(+), 130 deletions(-)
[spark-website] branch asf-site updated: Update the artifactId in the Download Page #276
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 2c5679f  Update the artifactId in the Download Page #276
2c5679f is described below

commit 2c5679f415c3605726e68c0a2b8c204c91131d0c
Author: Xiao Li
AuthorDate: Tue Jun 23 17:38:28 2020 -0700

    Update the artifactId in the Download Page #276

    The existing artifactId is not correct. We need to update it from 2.11 to 2.12.
---
 downloads.md        | 2 +-
 site/downloads.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/downloads.md b/downloads.md
index 2ed9870..880024a 100644
--- a/downloads.md
+++ b/downloads.md
@@ -40,7 +40,7 @@ The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 2019.
 Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q=g:org.apache.spark). You can add a Maven dependency with the following coordinates:
 
 groupId: org.apache.spark
-artifactId: spark-core_2.11
+artifactId: spark-core_2.12
 version: 3.0.0
 
 ### Installing with PyPi

diff --git a/site/downloads.html b/site/downloads.html
index e3b060f..d820471 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -240,7 +240,7 @@ The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 2019.
 Spark artifacts are hosted in Maven Central (https://search.maven.org/search?q=g:org.apache.spark). You can add a Maven dependency with the following coordinates:
 
 groupId: org.apache.spark
-artifactId: spark-core_2.11
+artifactId: spark-core_2.12
 version: 3.0.0
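For sbt users, the corrected coordinates above translate to the dependency
line sketched below. This is a minimal illustration, not part of the commit;
it assumes a project built against Scala 2.12, where sbt's %% operator
appends the binary version so that spark-core resolves to spark-core_2.12:

    // build.sbt sketch (illustrative): the corrected coordinates from this commit.
    scalaVersion := "2.12.10"
    // %% appends the Scala binary version: "spark-core" becomes "spark-core_2.12".
    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.0"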
[spark] branch branch-3.0 updated: [SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 512cb2f  [SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener
512cb2f is described below

commit 512cb2f0246a0d020f0ba726b4596555b15797c6
Author: Ali Smesseim
AuthorDate: Tue May 12 09:14:34 2020 -0700

    [SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener

    ### What changes were proposed in this pull request?

    The update methods in HiveThriftServer2Listener now check whether the given
    operation/session ID actually exists in `sessionList` and `executionList`,
    respectively. This prevents NullPointerExceptions if the operation or session
    ID is unknown. Instead, a warning is written to the log.

    Also, in HiveSessionImpl.close(), we catch any exception thrown by
    `operationManager.closeOperation`. If for any reason this throws an exception,
    other operations are not prevented from being closed.

    ### Why are the changes needed?

    The listener's update methods would throw an exception if the operation or
    session ID is unknown. In Spark 2, where the listener is called directly, this
    interferes with the caller's control flow. In Spark 3, the exception is caught
    by the ListenerBus but results in an uninformative NullPointerException.

    In HiveSessionImpl.close(), if an exception is thrown when closing an
    operation, all following operations are not closed.

    ### Does this PR introduce any user-facing change?

    No

    ### How was this patch tested?

    Unit tests

    Closes #28155 from alismess-db/hive-thriftserver-listener-update-safer.

    Authored-by: Ali Smesseim
    Signed-off-by: gatorsmile
    (cherry picked from commit 6994c64efd5770a8fd33220cbcaddc1d96fed886)
    Signed-off-by: gatorsmile
---
 .../ui/HiveThriftServer2Listener.scala             | 120 -
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 +
 .../ui/HiveThriftServer2ListenerSuite.scala        |  16 +++
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 5 files changed, 170 insertions(+), 51 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
index 6d0a506..20a8f2c 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
@@ -25,6 +25,7 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.hive.service.server.HiveServer2
 
 import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.scheduler._
 import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.ExecutionState
@@ -38,7 +39,7 @@ private[thriftserver] class HiveThriftServer2Listener(
     kvstore: ElementTrackingStore,
     sparkConf: SparkConf,
     server: Option[HiveServer2],
-    live: Boolean = true) extends SparkListener {
+    live: Boolean = true) extends SparkListener with Logging {
 
   private val sessionList = new ConcurrentHashMap[String, LiveSessionData]()
   private val executionList = new ConcurrentHashMap[String, LiveExecutionData]()
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
     updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = {
-    val session = sessionList.get(e.sessionId)
-    session.finishTimestamp = e.finishTime
-    updateStoreWithTriggerEnabled(session)
-    sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit =
+    Option(sessionList.get(e.sessionId)) match {
+      case None => logWarning(s"onSessionClosed called with unknown session id: ${e.sessionId}")
+      case Some(sessionData) =>
+        val session = sessionData
+        session.finishTimestamp = e.finishTime
+        updateStoreWithTriggerEnabled(session)
+        sessionList.remove(e.sessionId)
+    }
 
-  private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit = {
-    val info = getOrCreateExecution(
-      e.id,
-      e.statement,
-      e.sessionId,
-      e.startTime,
-      e.userName)
-
-    info.state = ExecutionState.STARTED
-    executionList.put(e.id, info)
-    sessionList.get(e.sessionId).totalExecution += 1
-    executionList.get
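The heart of the patch is the defensive-lookup pattern visible in the diff:
ConcurrentHashMap.get returns null for unknown keys, so each update method now
wraps the result in Option and logs a warning instead of dereferencing null.
A minimal self-contained sketch of the pattern, with names simplified from the
patch and println standing in for the listener's logWarning:

    import java.util.concurrent.ConcurrentHashMap

    object ListenerSketch {
      final class LiveSessionData { var finishTimestamp: Long = 0L }
      private val sessionList = new ConcurrentHashMap[String, LiveSessionData]()

      // Unknown session IDs are reported and skipped rather than causing an NPE.
      def onSessionClosed(sessionId: String, finishTime: Long): Unit =
        Option(sessionList.get(sessionId)) match {
          case None =>
            println(s"onSessionClosed called with unknown session id: $sessionId")
          case Some(session) =>
            session.finishTimestamp = finishTime
            sessionList.remove(sessionId)
        }
    }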
[spark] branch master updated (e248bc7 -> 6994c64)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e248bc7  [SPARK-31610][SPARK-31668][ML] Address hashingTF saving&loading bug and expose hashFunc property in HashingTF
 add 6994c64  [SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener

No new revisions were added by this update.

Summary of changes:
 .../ui/HiveThriftServer2Listener.scala             | 120 -
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 +
 .../ui/HiveThriftServer2ListenerSuite.scala        |  16 +++
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 5 files changed, 170 insertions(+), 51 deletions(-)
 create mode 100644 sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveSessionImplSuite.scala
[spark] branch branch-3.0 updated: [SPARK-31658][SQL] Fix SQL UI not showing write commands of AQE plan
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ba43922  [SPARK-31658][SQL] Fix SQL UI not showing write commands of AQE plan
ba43922 is described below

commit ba4392217b461d20bfd10dbc00714dbb7268d71a
Author: manuzhang
AuthorDate: Fri May 8 10:24:13 2020 -0700

    [SPARK-31658][SQL] Fix SQL UI not showing write commands of AQE plan

    Show write commands on the SQL UI of an AQE plan. Currently the leaf node
    of an AQE plan is always an `AdaptiveSparkPlan`, which is not true when it
    is a child of a write command. Hence, the node of the write command, as
    well as its metrics, is not shown on the SQL UI.

    ![image](https://user-images.githubusercontent.com/1191767/81288918-1893f580-9098-11ea-9771-e3d0820ba806.png)

    ![image](https://user-images.githubusercontent.com/1191767/81289008-3a8d7800-9098-11ea-93ec-516bbaf25d2d.png)

    No user-facing change. Tested by adding a UT.

    Closes #28474 from manuzhang/aqe-ui.

    Lead-authored-by: manuzhang
    Co-authored-by: Xiao Li
    Signed-off-by: gatorsmile
    (cherry picked from commit 77c690a7252b22c9dd8f3cb7ac32f79fd6845cad)
    Signed-off-by: gatorsmile
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  4 +--
 .../adaptive/AdaptiveQueryExecSuite.scala          | 35 --
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index cd6936b..90d1db9 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -526,8 +526,8 @@ case class AdaptiveSparkPlanExec(
     } else {
       context.session.sparkContext.listenerBus.post(SparkListenerSQLAdaptiveExecutionUpdate(
         executionId,
-        SQLExecution.getQueryExecution(executionId).toString,
-        SparkPlanInfo.fromSparkPlan(this)))
+        context.qe.toString,
+        SparkPlanInfo.fromSparkPlan(context.qe.executedPlan)))
     }
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index f30d1e9..29b9755 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -805,9 +805,11 @@ class AdaptiveQueryExecSuite
   test("SPARK-30953: InsertAdaptiveSparkPlan should apply AQE on child plan of write commands") {
     withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
         SQLConf.ADAPTIVE_EXECUTION_FORCE_APPLY.key -> "true") {
-      val plan = sql("CREATE TABLE t1 AS SELECT 1 col").queryExecution.executedPlan
-      assert(plan.isInstanceOf[DataWritingCommandExec])
-      assert(plan.asInstanceOf[DataWritingCommandExec].child.isInstanceOf[AdaptiveSparkPlanExec])
+      withTable("t1") {
+        val plan = sql("CREATE TABLE t1 AS SELECT 1 col").queryExecution.executedPlan
+        assert(plan.isInstanceOf[DataWritingCommandExec])
+        assert(plan.asInstanceOf[DataWritingCommandExec].child.isInstanceOf[AdaptiveSparkPlanExec])
+      }
     }
   }
 
@@ -847,4 +849,31 @@ class AdaptiveQueryExecSuite
       }
     }
   }
+
+  test("SPARK-31658: SQL UI should show write commands") {
+    withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+        SQLConf.ADAPTIVE_EXECUTION_FORCE_APPLY.key -> "true") {
+      withTable("t1") {
+        var checkDone = false
+        val listener = new SparkListener {
+          override def onOtherEvent(event: SparkListenerEvent): Unit = {
+            event match {
+              case SparkListenerSQLAdaptiveExecutionUpdate(_, _, planInfo) =>
+                assert(planInfo.nodeName == "Execute CreateDataSourceTableAsSelectCommand")
+                checkDone = true
+              case _ => // ignore other events
+            }
+          }
+        }
+        spark.sparkContext.addSparkListener(listener)
+        try {
+          sql("CREATE TABLE t1 AS SELECT 1 col").collect()
+          spark.sparkContext.listenerBus.waitUntilEmpty()
+          assert(checkDone)
+        } finally {
+          spark.sparkContext.removeSparkListener(listener)
+        }
+      }
+    }
+  }
 }
[spark] branch master updated (0fb607e -> 77c690a)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0fb607e  [SPARK-30385][WEBUI] WebUI occasionally throw IOException on stop()
 add 77c690a  [SPARK-31658][SQL] Fix SQL UI not showing write commands of AQE plan

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  4 +--
 .../adaptive/AdaptiveQueryExecSuite.scala          | 35 --
 2 files changed, 34 insertions(+), 5 deletions(-)
[spark] branch master updated (348fd53 -> 75da050)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 348fd53  [SPARK-31307][ML][EXAMPLES] Add examples for ml.fvalue
 add 75da050  [MINOR][SQL][DOCS] Remove two leading spaces from sql tables

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                   |  40 +-
 docs/sql-ref-functions-udf-hive.md                |  82 ++--
 docs/sql-ref-null-semantics.md                    | 512 +++---
 docs/sql-ref-syntax-aux-analyze-table.md          |  88 ++--
 docs/sql-ref-syntax-aux-conf-mgmt-set.md          |  10 +-
 docs/sql-ref-syntax-aux-describe-database.md      |  44 +-
 docs/sql-ref-syntax-aux-describe-function.md      |  84 ++--
 docs/sql-ref-syntax-aux-describe-query.md         |  60 +--
 docs/sql-ref-syntax-aux-describe-table.md         | 164 +++
 docs/sql-ref-syntax-aux-show-columns.md           |  42 +-
 docs/sql-ref-syntax-aux-show-create-table.md      |  20 +-
 docs/sql-ref-syntax-aux-show-databases.md         |  40 +-
 docs/sql-ref-syntax-aux-show-functions.md         |  96 ++--
 docs/sql-ref-syntax-aux-show-partitions.md        |  60 +--
 docs/sql-ref-syntax-aux-show-table.md             | 178
 docs/sql-ref-syntax-aux-show-tables.md            |  64 +--
 docs/sql-ref-syntax-aux-show-tblproperties.md     |  48 +-
 docs/sql-ref-syntax-aux-show-views.md             |  68 +--
 docs/sql-ref-syntax-ddl-alter-database.md         |  16 +-
 docs/sql-ref-syntax-ddl-alter-table.md            | 252 +--
 docs/sql-ref-syntax-ddl-alter-view.md             | 112 ++---
 docs/sql-ref-syntax-ddl-create-database.md        |  16 +-
 docs/sql-ref-syntax-ddl-create-function.md        |  46 +-
 docs/sql-ref-syntax-ddl-drop-function.md          |  32 +-
 docs/sql-ref-syntax-ddl-repair-table.md           |  18 +-
 docs/sql-ref-syntax-ddl-truncate-table.md         |  32 +-
 docs/sql-ref-syntax-dml-insert-into.md            | 164 +++
 docs/sql-ref-syntax-dml-insert-overwrite-table.md | 124 +++---
 docs/sql-ref-syntax-dml-load.md                   |  44 +-
 docs/sql-ref-syntax-qry-aggregation.md            |  22 -
 docs/sql-ref-syntax-qry-explain.md                | 100 ++---
 docs/sql-ref-syntax-qry-sampling.md               |  82 ++--
 docs/sql-ref-syntax-qry-select-clusterby.md       |  40 +-
 docs/sql-ref-syntax-qry-select-cte.md             |  60 +--
 docs/sql-ref-syntax-qry-select-distinct.md        |  22 -
 docs/sql-ref-syntax-qry-select-distribute-by.md   |  40 +-
 docs/sql-ref-syntax-qry-select-groupby.md         | 216 -
 docs/sql-ref-syntax-qry-select-having.md          |  68 +--
 docs/sql-ref-syntax-qry-select-inline-table.md    |  36 +-
 docs/sql-ref-syntax-qry-select-join.md            | 175
 docs/sql-ref-syntax-qry-select-limit.md           |  50 +--
 docs/sql-ref-syntax-qry-select-orderby.md         |  90 ++--
 docs/sql-ref-syntax-qry-select-setops.md          | 190
 docs/sql-ref-syntax-qry-select-sortby.md          | 132 +++---
 docs/sql-ref-syntax-qry-select-tvf.md             |  68 +--
 docs/sql-ref-syntax-qry-select-where.md           |  82 ++--
 docs/sql-ref-syntax-qry-window.md                 | 168 +++
 47 files changed, 2076 insertions(+), 2121 deletions(-)
 delete mode 100644 docs/sql-ref-syntax-qry-aggregation.md
 delete mode 100644 docs/sql-ref-syntax-qry-select-distinct.md
[spark] branch branch-3.0 updated: [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f5e018e  [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file
f5e018e is described below

commit f5e018edc71fd5ddf5ce4f82d02ac777bc3d7280
Author: Xiao Li
AuthorDate: Thu Apr 30 09:34:56 2020 -0700

    [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file

    ### What changes were proposed in this pull request?

    This PR is to clean up the markdown file in the SHOW COLUMNS page.
    - remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
    - use ```sql for highlighting the SQL syntax.

    ### Why are the changes needed?

    Make the doc cleaner and easily editable by MD editors.

    ### Does this PR introduce _any_ user-facing change?

    NO

    ### How was this patch tested?

    **Before**
    ![Screen Shot 2020-04-29 at 5 20 11 PM](https://user-images.githubusercontent.com/11567269/80661963-fa4d4a80-8a44-11ea-9dea-c43cda6de010.png)

    **After**
    ![Screen Shot 2020-04-29 at 6 03 50 PM](https://user-images.githubusercontent.com/11567269/80661940-f15c7900-8a44-11ea-9943-a83e8d8618fb.png)

    Closes #28414 from gatorsmile/cleanupShowColumns.

    Lead-authored-by: Xiao Li
    Co-authored-by: gatorsmile
    Signed-off-by: gatorsmile
    (cherry picked from commit b5ecc41c73018bbc742186d2e752101a99cfe852)
    Signed-off-by: gatorsmile
---
 docs/sql-ref-syntax-aux-show-columns.md | 51 ++---
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-show-columns.md b/docs/sql-ref-syntax-aux-show-columns.md
index 8f73aac..c8c90a9 100644
--- a/docs/sql-ref-syntax-aux-show-columns.md
+++ b/docs/sql-ref-syntax-aux-show-columns.md
@@ -25,41 +25,34 @@ Return the list of columns in a table. If the table does not exist, an exception
 
 ### Syntax
 
-{% highlight sql %}
+```sql
 SHOW COLUMNS table_identifier [ database ]
-{% endhighlight %}
+```
 
 ### Parameters
 
-  - table_identifier
+* **table_identifier**
 
     Specifies the table name of an existing table. The table may be optionally qualified
-    with a database name.
-    Syntax:
-
-      { IN | FROM } [ database_name . ] table_name
-
-    Note:
-    Keywords IN and FROM are interchangeable.
-
-  - database
+    with a database name.
+
+    **Syntax:** `{ IN | FROM } [ database_name . ] table_name`
+
+    **Note:** Keywords `IN` and `FROM` are interchangeable.
+
+* **database**
 
     Specifies an optional database name. The table is resolved from this database when it
-    is specified. Please note that when this parameter is specified then table
-    name should not be qualified with a different database name.
-    Syntax:
-
-      { IN | FROM } database_name
-
-    Note:
-    Keywords IN and FROM are interchangeable.
-
+    is specified. When this parameter is specified then table
+    name should not be qualified with a different database name.
+
+    **Syntax:** `{ IN | FROM } database_name`
+
+    **Note:** Keywords `IN` and `FROM` are interchangeable.
 
 ### Examples
 
-{% highlight sql %}
+```sql
 -- Create `customer` table in `salesdb` database;
 USE salesdb;
 CREATE TABLE customer(
@@ -96,9 +89,9 @@ SHOW COLUMNS IN customer IN salesdb;
 |     name|
 |cust_addr|
 +---------+
-{% endhighlight %}
+```
 
 ### Related Statements
 
- * [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
- * [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
+* [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
+* [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
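Since the page's SQL examples survive the markdown cleanup unchanged, they can
still be exercised programmatically. A small sketch, assuming the
salesdb.customer table from the page's setup exists in the current catalog and
spark is an active SparkSession:

    // Both documented forms are equivalent; IN and FROM are interchangeable.
    spark.sql("SHOW COLUMNS IN salesdb.customer").show()
    spark.sql("SHOW COLUMNS IN customer IN salesdb").show()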
[spark] branch master updated (c09cfb9 -> b5ecc41)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from c09cfb9  [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
 add b5ecc41  [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-columns.md | 51 ++---
 1 file changed, 22 insertions(+), 29 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f5e018e  [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file
f5e018e is described below

commit f5e018edc71fd5ddf5ce4f82d02ac777bc3d7280
Author: Xiao Li
AuthorDate: Thu Apr 30 09:34:56 2020 -0700

[SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file

### What changes were proposed in this pull request?

This PR cleans up the markdown file of the SHOW COLUMNS page:
- remove the unneeded embedded inline HTML markup by using basic markdown syntax;
- use ```sql fences for highlighting the SQL syntax.

### Why are the changes needed?

Make the doc cleaner and easily editable by MD editors.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

**Before**
![Screen Shot 2020-04-29 at 5 20 11 PM](https://user-images.githubusercontent.com/11567269/80661963-fa4d4a80-8a44-11ea-9dea-c43cda6de010.png)

**After**
![Screen Shot 2020-04-29 at 6 03 50 PM](https://user-images.githubusercontent.com/11567269/80661940-f15c7900-8a44-11ea-9943-a83e8d8618fb.png)

Closes #28414 from gatorsmile/cleanupShowColumns.

Lead-authored-by: Xiao Li
Co-authored-by: gatorsmile
Signed-off-by: gatorsmile
(cherry picked from commit b5ecc41c73018bbc742186d2e752101a99cfe852)
Signed-off-by: gatorsmile
---
 docs/sql-ref-syntax-aux-show-columns.md | 51 ++---
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-show-columns.md b/docs/sql-ref-syntax-aux-show-columns.md
index 8f73aac..c8c90a9 100644
--- a/docs/sql-ref-syntax-aux-show-columns.md
+++ b/docs/sql-ref-syntax-aux-show-columns.md
@@ -25,41 +25,34 @@ Return the list of columns in a table. If the table does not exist, an exception

 ### Syntax

-{% highlight sql %}
+```sql
 SHOW COLUMNS table_identifier [ database ]
-{% endhighlight %}
+```

 ### Parameters

-  table_identifier
+* **table_identifier**
+
     Specifies the table name of an existing table. The table may be optionally qualified
-    with a database name.
-    Syntax:
-      { IN | FROM } [ database_name . ] table_name
-    Note:
-    Keywords IN and FROM are interchangeable.
-
-  database
+    with a database name.
+
+    **Syntax:** `{ IN | FROM } [ database_name . ] table_name`
+
+    **Note:** Keywords `IN` and `FROM` are interchangeable.
+
+* **database**
+
     Specifies an optional database name. The table is resolved from this database when it
-    is specified. Please note that when this parameter is specified then table
-    name should not be qualified with a different database name.
-    Syntax:
-      { IN | FROM } database_name
-    Note:
-    Keywords IN and FROM are interchangeable.
+    is specified. When this parameter is specified then table
+    name should not be qualified with a different database name.
+
+    **Syntax:** `{ IN | FROM } database_name`
+
+    **Note:** Keywords `IN` and `FROM` are interchangeable.

 ### Examples

-{% highlight sql %}
+```sql
 -- Create `customer` table in `salesdb` database;
 USE salesdb;
 CREATE TABLE customer(
@@ -96,9 +89,9 @@ SHOW COLUMNS IN customer IN salesdb;
 |     name|
 |cust_addr|
 +-+
-{% endhighlight %}
+```

 ### Related Statements

- * [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
- * [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
+* [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
+* [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
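As a companion to the page above, a minimal sketch of running SHOW COLUMNS end to end. It assumes a local SparkSession; the salesdb/customer names simply echo the doc's example and are otherwise illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ShowColumnsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("show-columns").getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS salesdb")
    spark.sql("USE salesdb")
    spark.sql(
      "CREATE TABLE IF NOT EXISTS customer(cust_cd INT, name STRING, cust_addr STRING) USING parquet")

    // IN and FROM are interchangeable, as the page notes.
    spark.sql("SHOW COLUMNS IN customer IN salesdb").show()
    spark.sql("SHOW COLUMNS FROM customer FROM salesdb").show()

    spark.stop()
  }
}
```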
[spark] branch master updated (c09cfb9 -> b5ecc41)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c09cfb9  [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
     add b5ecc41  [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-columns.md | 51 ++---
 1 file changed, 22 insertions(+), 29 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 1701f78  [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration
1701f78 is described below

commit 1701f7882aac9e3efaa36c628815edfad09b62fa
Author: gatorsmile
AuthorDate: Mon Apr 20 13:08:55 2020 -0700

[SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration

### What changes were proposed in this pull request?

This PR is the follow-up of https://github.com/apache/spark/pull/28003:
- add a migration guide
- add an end-to-end test case.

### Why are the changes needed?

The original PR made a major behavior change in the user-facing RESET command.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added a new end-to-end test.

Closes #28265 from gatorsmile/spark-31234followup.

Authored-by: gatorsmile
Signed-off-by: gatorsmile
(cherry picked from commit 6c792a79c10e7b01bd040ef14c848a2a2378e28c)
Signed-off-by: gatorsmile
---
 docs/core-migration-guide.md                            |  2 +-
 docs/sql-migration-guide.md                             |  4
 .../org/apache/spark/sql/internal/StaticSQLConf.scala   |  3 +++
 .../org/apache/spark/sql/internal/SharedState.scala     |  3 ---
 .../org/apache/spark/sql/SparkSessionBuilderSuite.scala | 16
 .../org/apache/spark/sql/internal/SQLConfSuite.scala    |  2 +-
 6 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index cde6e07..33406d0 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -25,7 +25,7 @@ license: |
 ## Upgrading from Core 2.4 to 3.0

 - The `org.apache.spark.ExecutorPlugin` interface and related configuration has been replaced with
-  `org.apache.spark.plugin.SparkPlugin`, which adds new functionality. Plugins using the old
+  `org.apache.spark.api.plugin.SparkPlugin`, which adds new functionality. Plugins using the old
   interface must be modified to extend the new interfaces. Check the
   [Monitoring](monitoring.html) guide for more details.

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index f5c81e9..8945c13 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -210,6 +210,10 @@ license: |
   * The decimal string representation can be different between Hive 1.2 and Hive 2.3 when using `TRANSFORM` operator in SQL for script transformation, which depends on hive's behavior. In Hive 1.2, the string representation omits trailing zeroes. But in Hive 2.3, it is always padded to 18 digits with trailing zeroes if necessary.

+## Upgrading from Spark SQL 2.4.5 to 2.4.6
+
+  - In Spark 2.4.6, the `RESET` command does not reset the static SQL configuration values to the default. It only clears the runtime SQL configuration values.
+
 ## Upgrading from Spark SQL 2.4.4 to 2.4.5

   - Since Spark 2.4.5, `TRUNCATE TABLE` command tries to set back original permission and ACLs during re-creating the table/partition paths. To restore the behaviour of earlier versions, set `spark.sql.truncateTable.ignorePermissionAcl.enabled` to `true`.

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
index d202528..9618ff6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
@@ -47,6 +47,9 @@ object StaticSQLConf {
     .internal()
     .version("2.1.0")
     .stringConf
+    // System preserved database should not exists in metastore. However it's hard to guarantee it
+    // for every session, because case-sensitivity differs. Here we always lowercase it to make our
+    // life easier.
     .transform(_.toLowerCase(Locale.ROOT))
     .createWithDefault("global_temp")

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
index 14b8ea6..47119ab 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
@@ -153,9 +153,6 @@ private[sql] class SharedState(
    * A manager for global temporary views.
    */
  lazy val globalTempViewManager: GlobalTempViewManager = {
-    // System preserved database should not exists in metastore. However it's hard to guarantee it
-    // for every session, because case-sensitivity differs. Here we always lowercase
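The behavior this follow-up documents can be sketched as follows. A minimal example, assuming a local SparkSession; it is not the patch's actual end-to-end test, and the static conf value is illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ResetDemo {
  def main(args: Array[String]): Unit = {
    // spark.sql.globalTempDatabase is a static conf: fixed once the session starts.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.globalTempDatabase", "my_global")
      .getOrCreate()

    spark.conf.set("spark.sql.shuffle.partitions", "10") // runtime conf
    spark.sql("RESET")

    // The runtime conf is back to its default (200, unless overridden in spark-defaults)...
    assert(spark.conf.get("spark.sql.shuffle.partitions") == "200")
    // ...but the static conf keeps the value the session was started with.
    assert(spark.conf.get("spark.sql.globalTempDatabase") == "my_global")

    spark.stop()
  }
}
```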
[spark] branch master updated (44d370d -> 6c792a7)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 44d370d  [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
     add 6c792a7  [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration

No new revisions were added by this update.

Summary of changes:
 docs/core-migration-guide.md                            |  2 +-
 docs/sql-migration-guide.md                             |  4
 .../org/apache/spark/sql/internal/StaticSQLConf.scala   |  3 +++
 .../org/apache/spark/sql/internal/SharedState.scala     |  3 ---
 .../org/apache/spark/sql/SparkSessionBuilderSuite.scala | 16
 .../org/apache/spark/sql/internal/SQLConfSuite.scala    |  2 +-
 6 files changed, 25 insertions(+), 5 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2e32160  [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
2e32160 is described below

commit 2e3216012e8ad85d4cd88671493dd6e4d0e6a668
Author: Maryann Xue
AuthorDate: Mon Apr 20 11:55:48 2020 -0700

[SPARK-31475][SQL] Broadcast stage in AQE did not timeout

### What changes were proposed in this pull request?

This PR adds a timeout for the Future of a BroadcastQueryStageExec to make sure it can have the same timeout behavior as a non-AQE broadcast exchange.

### Why are the changes needed?

This is to make the broadcast timeout behavior in AQE consistent with that in non-AQE.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #28250 from maryannxue/aqe-broadcast-timeout.

Authored-by: Maryann Xue
Signed-off-by: gatorsmile
(cherry picked from commit 44d370dd4501f0a4abb7194f7cff0d346aac0992)
Signed-off-by: gatorsmile
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/execution/adaptive/QueryStageExec.scala    | 35 ++
 .../execution/exchange/BroadcastExchangeExec.scala |  8 ++---
 .../sql/execution/joins/BroadcastJoinSuite.scala   | 23 --
 4 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 2b46724..0ec8b5f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -546,7 +546,7 @@ case class AdaptiveSparkPlanExec(
 }

 object AdaptiveSparkPlanExec {
-  private val executionContext = ExecutionContext.fromExecutorService(
+  private[adaptive] val executionContext = ExecutionContext.fromExecutorService(
     ThreadUtils.newDaemonCachedThreadPool("QueryStageCreator", 16))

   /**

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
index beaa972..f414f85 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
@@ -17,9 +17,11 @@

 package org.apache.spark.sql.execution.adaptive

-import scala.concurrent.Future
+import java.util.concurrent.TimeUnit

-import org.apache.spark.{FutureAction, MapOutputStatistics}
+import scala.concurrent.{Future, Promise}
+
+import org.apache.spark.{FutureAction, MapOutputStatistics, SparkException}
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
@@ -28,6 +30,8 @@
 import org.apache.spark.sql.catalyst.plans.logical.Statistics
 import org.apache.spark.sql.catalyst.plans.physical.Partitioning
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils

 /**
  * A query stage is an independent subgraph of the query plan. Query stage materializes its output
@@ -100,8 +104,8 @@ abstract class QueryStageExec extends LeafExecNode {
   override def executeTail(n: Int): Array[InternalRow] = plan.executeTail(n)
   override def executeToIterator(): Iterator[InternalRow] = plan.executeToIterator()

-  override def doPrepare(): Unit = plan.prepare()
-  override def doExecute(): RDD[InternalRow] = plan.execute()
+  protected override def doPrepare(): Unit = plan.prepare()
+  protected override def doExecute(): RDD[InternalRow] = plan.execute()
   override def doExecuteBroadcast[T](): Broadcast[T] = plan.executeBroadcast()
   override def doCanonicalize(): SparkPlan = plan.canonicalized
@@ -187,8 +191,24 @@ case class BroadcastQueryStageExec(
     throw new IllegalStateException("wrong plan for broadcast stage:\n " + plan.treeString)
   }

+  @transient private lazy val materializeWithTimeout = {
+    val broadcastFuture = broadcast.completionFuture
+    val timeout = SQLConf.get.broadcastTimeout
+    val promise = Promise[Any]()
+    val fail = BroadcastQueryStageExec.scheduledExecutor.schedule(new Runnable() {
+      override def run(): Unit = {
+        promise.tryFailure(new SparkException(s"Could not execute broadcast in $timeout secs. " +
+          s"You can increase the timeout for broadcasts via ${SQLConf.BROADCAST_TIMEOUT.key} or " +
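The heart of the patch is the Promise-plus-scheduled-failure pattern visible in the hunk above. Below is a self-contained sketch of that pattern outside Spark; all names and the one-second timeout are illustrative, not the patch's code.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object TimeoutDemo {
  def main(args: Array[String]): Unit = {
    val work: Future[String] = Future { Thread.sleep(5000); "broadcast done" }
    val promise = Promise[String]()
    val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Fail the promise if the work has not finished within the timeout;
    // whichever of completion or timeout happens first wins the promise.
    scheduler.schedule(new Runnable {
      override def run(): Unit =
        promise.tryFailure(new RuntimeException("could not finish in 1 sec"))
    }, 1, TimeUnit.SECONDS)
    work.onComplete(promise.tryComplete)

    try println(Await.result(promise.future, Duration.Inf))
    catch { case e: Exception => println(s"failed: ${e.getMessage}") }
    finally scheduler.shutdownNow()
  }
}
```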
[spark] branch master updated: [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 44d370d  [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
44d370d is described below

commit 44d370dd4501f0a4abb7194f7cff0d346aac0992
Author: Maryann Xue
AuthorDate: Mon Apr 20 11:55:48 2020 -0700

[SPARK-31475][SQL] Broadcast stage in AQE did not timeout

### What changes were proposed in this pull request?

This PR adds a timeout for the Future of a BroadcastQueryStageExec to make sure it can have the same timeout behavior as a non-AQE broadcast exchange.

### Why are the changes needed?

This is to make the broadcast timeout behavior in AQE consistent with that in non-AQE.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #28250 from maryannxue/aqe-broadcast-timeout.

Authored-by: Maryann Xue
Signed-off-by: gatorsmile
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/execution/adaptive/QueryStageExec.scala    | 35 ++
 .../execution/exchange/BroadcastExchangeExec.scala |  8 ++---
 .../sql/execution/joins/BroadcastJoinSuite.scala   | 23 --
 4 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 3ac4ea5..f819937 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -547,7 +547,7 @@ case class AdaptiveSparkPlanExec(
 }

 object AdaptiveSparkPlanExec {
-  private val executionContext = ExecutionContext.fromExecutorService(
+  private[adaptive] val executionContext = ExecutionContext.fromExecutorService(
     ThreadUtils.newDaemonCachedThreadPool("QueryStageCreator", 16))

   /**

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
index beaa972..f414f85 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
@@ -17,9 +17,11 @@

 package org.apache.spark.sql.execution.adaptive

-import scala.concurrent.Future
+import java.util.concurrent.TimeUnit

-import org.apache.spark.{FutureAction, MapOutputStatistics}
+import scala.concurrent.{Future, Promise}
+
+import org.apache.spark.{FutureAction, MapOutputStatistics, SparkException}
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
@@ -28,6 +30,8 @@
 import org.apache.spark.sql.catalyst.plans.logical.Statistics
 import org.apache.spark.sql.catalyst.plans.physical.Partitioning
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils

 /**
  * A query stage is an independent subgraph of the query plan. Query stage materializes its output
@@ -100,8 +104,8 @@ abstract class QueryStageExec extends LeafExecNode {
   override def executeTail(n: Int): Array[InternalRow] = plan.executeTail(n)
   override def executeToIterator(): Iterator[InternalRow] = plan.executeToIterator()

-  override def doPrepare(): Unit = plan.prepare()
-  override def doExecute(): RDD[InternalRow] = plan.execute()
+  protected override def doPrepare(): Unit = plan.prepare()
+  protected override def doExecute(): RDD[InternalRow] = plan.execute()
   override def doExecuteBroadcast[T](): Broadcast[T] = plan.executeBroadcast()
   override def doCanonicalize(): SparkPlan = plan.canonicalized
@@ -187,8 +191,24 @@ case class BroadcastQueryStageExec(
     throw new IllegalStateException("wrong plan for broadcast stage:\n " + plan.treeString)
   }

+  @transient private lazy val materializeWithTimeout = {
+    val broadcastFuture = broadcast.completionFuture
+    val timeout = SQLConf.get.broadcastTimeout
+    val promise = Promise[Any]()
+    val fail = BroadcastQueryStageExec.scheduledExecutor.schedule(new Runnable() {
+      override def run(): Unit = {
+        promise.tryFailure(new SparkException(s"Could not execute broadcast in $timeout secs. " +
+          s"You can increase the timeout for broadcasts via ${SQLConf.BROADCAST_TIMEOUT.key} or " +
+          s"disable broadcast join by setting ${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1
[spark] branch master updated (55dea9b -> 2c39502)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 55dea9b  [SPARK-29153][CORE] Add ability to merge resource profiles within a stage with Stage Level Scheduling
     add 2c39502  [SPARK-31253][SQL][FOLLOWUP] Add metrics to AQE shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../adaptive/CustomShuffleReaderExec.scala      | 27 ++
 .../execution/adaptive/OptimizeSkewedJoin.scala | 14 ++-
 2 files changed, 20 insertions(+), 21 deletions(-)
[spark] branch master updated (590b9a0 -> 34c7ec8)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 590b9a0  [SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in error message of untyped Scala UDF
     add 34c7ec8  [SPARK-31253][SQL] Add metrics to AQE shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ShuffledRowRDD.scala       |  16 ++-
 .../adaptive/CoalesceShufflePartitions.scala       |   6 +-
 .../adaptive/CustomShuffleReaderExec.scala         | 114 ++---
 .../adaptive/OptimizeLocalShuffleReader.scala      |  15 ++-
 .../execution/adaptive/OptimizeSkewedJoin.scala    |  82 ---
 .../sql/execution/adaptive/QueryStageExec.scala    |   5 +
 .../execution/CoalesceShufflePartitionsSuite.scala |  23 +++--
 .../adaptive/AdaptiveQueryExecSuite.scala          |  74 -
 8 files changed, 229 insertions(+), 106 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31087] [SQL] Add Back Multiple Removed APIs
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f375930  [SPARK-31087] [SQL] Add Back Multiple Removed APIs
f375930 is described below

commit f375930d81337f2facbe5da71bb126d4d935e49d
Author: gatorsmile
AuthorDate: Sat Mar 28 22:05:16 2020 -0700

[SPARK-31087] [SQL] Add Back Multiple Removed APIs

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance cost is relatively small:

- functions.toDegrees/toRadians
- functions.approxCountDistinct
- functions.monotonicallyIncreasingId
- Column.!==
- Dataset.explode
- Dataset.registerTempTable
- SQLContext.getOrCreate, setActive, clearActive, constructors

Below are the other APIs removed in the original PR ([SPARK-25908](https://issues.apache.org/jira/browse/SPARK-25908)) but not added back in this PR:

- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
- Remove unused Python StorageLevel constants
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session

### Why are the changes needed?

Avoid breaking APIs that are commonly used.

### Does this PR introduce any user-facing change?

Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released.

### How was this patch tested?

Added a new test suite for these APIs.

Author: gatorsmile
Author: yi.wu

Closes #27821 from gatorsmile/addAPIBackV2.

(cherry picked from commit 3884455780a214c620f309e00d5a083039746755)
Signed-off-by: gatorsmile
---
 project/MimaExcludes.scala                        |   8 --
 python/pyspark/sql/dataframe.py                   |  19
 python/pyspark/sql/functions.py                   |  11 ++
 .../main/scala/org/apache/spark/sql/Column.scala  |  18
 .../main/scala/org/apache/spark/sql/Dataset.scala |  98 ++
 .../scala/org/apache/spark/sql/SQLContext.scala   |  50 -
 .../scala/org/apache/spark/sql/functions.scala    |  79 ++
 .../org/apache/spark/sql/DataFrameSuite.scala     |  46 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala | 114 +
 .../org/apache/spark/sql/SQLContextSuite.scala    |  30 --
 10 files changed, 458 insertions(+), 15 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 9a5029e..d1ed48a 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -235,14 +235,6 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleWriteTime"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleRecordsWritten"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.AccumulableInfo.apply"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.approxCountDistinct"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toRadians"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toDegrees"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.monotonicallyIncreasingId"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.clearActive"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.getOrCreate"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.setActive"),
-    ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.SQLContext.this"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.recall"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.eva
[spark] branch master updated: [SPARK-31087] [SQL] Add Back Multiple Removed APIs
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3884455  [SPARK-31087] [SQL] Add Back Multiple Removed APIs
3884455 is described below

commit 3884455780a214c620f309e00d5a083039746755
Author: gatorsmile
AuthorDate: Sat Mar 28 22:05:16 2020 -0700

[SPARK-31087] [SQL] Add Back Multiple Removed APIs

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance cost is relatively small:

- functions.toDegrees/toRadians
- functions.approxCountDistinct
- functions.monotonicallyIncreasingId
- Column.!==
- Dataset.explode
- Dataset.registerTempTable
- SQLContext.getOrCreate, setActive, clearActive, constructors

Below are the other APIs removed in the original PR ([SPARK-25908](https://issues.apache.org/jira/browse/SPARK-25908)) but not added back in this PR:

- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
- Remove unused Python StorageLevel constants
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session

### Why are the changes needed?

Avoid breaking APIs that are commonly used.

### Does this PR introduce any user-facing change?

Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released.

### How was this patch tested?

Added a new test suite for these APIs.

Author: gatorsmile
Author: yi.wu

Closes #27821 from gatorsmile/addAPIBackV2.
---
 project/MimaExcludes.scala                        |   8 --
 python/pyspark/sql/dataframe.py                   |  19
 python/pyspark/sql/functions.py                   |  11 ++
 .../main/scala/org/apache/spark/sql/Column.scala  |  18
 .../main/scala/org/apache/spark/sql/Dataset.scala |  98 ++
 .../scala/org/apache/spark/sql/SQLContext.scala   |  50 -
 .../scala/org/apache/spark/sql/functions.scala    |  79 ++
 .../org/apache/spark/sql/DataFrameSuite.scala     |  46 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala | 114 +
 .../org/apache/spark/sql/SQLContextSuite.scala    |  30 --
 10 files changed, 458 insertions(+), 15 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 3f521e6..f28ae56 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -242,14 +242,6 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleWriteTime"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleRecordsWritten"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.AccumulableInfo.apply"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.approxCountDistinct"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toRadians"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toDegrees"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.monotonicallyIncreasingId"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.clearActive"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.getOrCreate"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.setActive"),
-    ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.SQLContext.this"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.recall"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.precision"),

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/
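For reference, a small sketch of calling a few of the restored APIs; it assumes a local SparkSession, and once this patch is in, the calls compile with deprecation warnings rather than errors.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{toDegrees, monotonicallyIncreasingId}

object RestoredApisDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("restored").getOrCreate()
    import spark.implicits._

    val df = Seq(0.0, math.Pi / 2).toDF("rad")
    // Deprecated spellings brought back by this patch; `degrees` and
    // `monotonically_increasing_id` remain the recommended replacements.
    df.select(toDegrees($"rad"), monotonicallyIncreasingId()).show()

    df.createOrReplaceTempView("t")  // current API
    df.registerTempTable("t_legacy") // restored deprecated alias

    spark.stop()
  }
}
```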
[spark] branch master updated (b7e4cc7 -> b9eafcb)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b7e4cc7  [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods
     add b9eafcb  [SPARK-31088][SQL] Add back HiveContext and createExternalTable

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                       |   4 -
 project/MimaExcludes.scala                        |   2 -
 python/pyspark/__init__.py                        |   2 +-
 python/pyspark/sql/__init__.py                    |   4 +-
 python/pyspark/sql/catalog.py                     |  20
 python/pyspark/sql/context.py                     |  67 +-
 .../scala/org/apache/spark/sql/SQLContext.scala   |  91 ++
 .../org/apache/spark/sql/catalog/Catalog.scala    | 102 +++-
 .../DeprecatedCreateExternalTableSuite.scala      |  85 +
 .../org/apache/spark/sql/hive/HiveContext.scala   |  63 +
 .../sql/hive/HiveContextCompatibilitySuite.scala  | 103 +
 11 files changed, 532 insertions(+), 11 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/internal/DeprecatedCreateExternalTableSuite.scala
 create mode 100644 sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
 create mode 100644 sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala
[spark] branch branch-3.0 updated: [SPARK-31088][SQL] Add back HiveContext and createExternalTable
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2a449df  [SPARK-31088][SQL] Add back HiveContext and createExternalTable
2a449df is described below

commit 2a449df305d5f8495959fd71d937e0f5f4fff87d
Author: gatorsmile
AuthorDate: Thu Mar 26 23:51:15 2020 -0700

[SPARK-31088][SQL] Add back HiveContext and createExternalTable

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance cost is relatively small:

- HiveContext
- createExternalTable APIs

### Why are the changes needed?

Avoid breaking APIs that are commonly used.

### Does this PR introduce any user-facing change?

Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released.

### How was this patch tested?

Added a new test suite for the createExternalTable APIs.

Closes #27815 from gatorsmile/addAPIsBack.

Lead-authored-by: gatorsmile
Co-authored-by: yi.wu
Signed-off-by: gatorsmile
(cherry picked from commit b9eafcb52658b7f5ec60bb4ebcc9da0fde94e105)
Signed-off-by: gatorsmile
---
 docs/sql-migration-guide.md                       |   4 -
 project/MimaExcludes.scala                        |   2 -
 python/pyspark/__init__.py                        |   2 +-
 python/pyspark/sql/__init__.py                    |   4 +-
 python/pyspark/sql/catalog.py                     |  20
 python/pyspark/sql/context.py                     |  67 +-
 .../scala/org/apache/spark/sql/SQLContext.scala   |  91 ++
 .../org/apache/spark/sql/catalog/Catalog.scala    | 102 +++-
 .../DeprecatedCreateExternalTableSuite.scala      |  85 +
 .../org/apache/spark/sql/hive/HiveContext.scala   |  63 +
 .../sql/hive/HiveContextCompatibilitySuite.scala  | 103 +
 11 files changed, 532 insertions(+), 11 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index d2773d8..ab35e1f 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -309,10 +309,6 @@ license: |
 ### Others

-  - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and `SparkSession.createExternalTable` have been removed in favor of its replacement, `createTable`.
-
-  - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use `SparkSession.builder.enableHiveSupport()` instead.
-
  - In Spark version 2.4, when a spark session is created via `cloneSession()`, the newly created spark session inherits its configuration from its parent `SparkContext` even though the same configuration may exist with a different value in its parent spark session. Since Spark 3.0, the configurations of a parent `SparkSession` have a higher precedence over the parent `SparkContext`. The old behavior can be restored by setting `spark.sql.legacy.sessionInitWithConfigDefaults` to `true`.

  - Since Spark 3.0, if `hive.default.fileformat` is not found in `Spark SQL configuration` then it will fallback to hive-site.xml present in the `Hadoop configuration` of `SparkContext`.

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index f8ad60b..9a5029e 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -48,8 +48,6 @@ object MimaExcludes {
     ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.ExecutorPlugin"),

     // [SPARK-28980][SQL][CORE][MLLIB] Remove more old deprecated items in Spark 3
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.createExternalTable"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.createExternalTable"),
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.KMeans.train"),
     ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.mllib.clustering.KMeans.train"),
     ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.mllib.classification.LogisticRegressionWithSGD$"),

diff --git a/python/pyspark/__init__.py b/python/pyspark/__init__.py
index 76a5bd0..70c0b27 100644
--- a/python/pyspark/__init__.py
+++ b/python/pyspark/__init__.py
@@ -113,7 +113,7 @@ def keyword_only(func):

 # for back compatibility
-from pyspark.sql import SQLContext, Row
+from pyspark.sql i
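A minimal usage sketch of the two restored entry points; the paths and table names are illustrative, and HiveContext needs a Hive-enabled build.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveContext

object RestoredHiveApisDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").enableHiveSupport().getOrCreate()
    import spark.implicits._

    // Write some parquet data, then register it via the restored deprecated
    // alias of createTable (path and names are illustrative).
    Seq((1, "a"), (2, "b")).toDF("id", "tag").write.mode("overwrite").parquet("/tmp/ext_logs")
    spark.catalog.createExternalTable("ext_logs", "/tmp/ext_logs", "parquet")

    // HiveContext is back as a thin deprecated wrapper over a Hive-enabled session.
    val hiveCtx = new HiveContext(spark.sparkContext)
    hiveCtx.sql("SELECT COUNT(*) FROM ext_logs").show()

    spark.stop()
  }
}
```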
[spark] branch master updated (cb0db21 -> b7e4cc7)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cb0db21  [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][TEST-HIVE1.2] Nested Column Predicate Pushdown for Parquet
     add b7e4cc7  [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SQLContext.scala   | 283 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala | 106
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala     |  14 +
 3 files changed, 403 insertions(+)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/DeprecatedAPISuite.scala
[spark] branch branch-3.0 updated: [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ebc358c  [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods
ebc358c is described below

commit ebc358c8d2b6d67c7319be006452c9c993b7a098
Author: gatorsmile
AuthorDate: Thu Mar 26 23:49:24 2020 -0700

[SPARK-31086][SQL] Add Back the Deprecated SQLContext methods

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance cost is relatively small:

- SQLContext.applySchema
- SQLContext.parquetFile
- SQLContext.jsonFile
- SQLContext.jsonRDD
- SQLContext.load
- SQLContext.jdbc

### Why are the changes needed?

Avoid breaking APIs that are commonly used.

### Does this PR introduce any user-facing change?

Adding back the APIs that were removed in the 3.0 branch does not introduce user-facing changes, because Spark 3.0 has not been released.

### How was this patch tested?

The existing tests.

Closes #27839 from gatorsmile/addAPIBackV3.

Lead-authored-by: gatorsmile
Co-authored-by: yi.wu
Signed-off-by: gatorsmile
(cherry picked from commit b7e4cc775b7eac68606d1f385911613f5139db1b)
Signed-off-by: gatorsmile
---
 .../scala/org/apache/spark/sql/SQLContext.scala   | 283 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala | 106
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala     |  14 +
 3 files changed, 403 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 2054874..592c64c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -611,6 +611,289 @@ class SQLContext private[sql](val sparkSession: SparkSession)
     sessionState.catalog.listTables(databaseName).map(_.table).toArray
   }

+  // Deprecated methods
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rowRDD: RDD[Row], schema: StructType): DataFrame = {
+    createDataFrame(rowRDD, schema)
+  }
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rowRDD: JavaRDD[Row], schema: StructType): DataFrame = {
+    createDataFrame(rowRDD, schema)
+  }
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rdd: RDD[_], beanClass: Class[_]): DataFrame = {
+    createDataFrame(rdd, beanClass)
+  }
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rdd: JavaRDD[_], beanClass: Class[_]): DataFrame = {
+    createDataFrame(rdd, beanClass)
+  }
+
+  /**
+   * Loads a Parquet file, returning the result as a `DataFrame`. This function returns an empty
+   * `DataFrame` if no paths are passed in.
+   *
+   * @group specificdata
+   * @deprecated As of 1.4.0, replaced by `read().parquet()`.
+   */
+  @deprecated("Use read.parquet() instead.", "1.4.0")
+  @scala.annotation.varargs
+  def parquetFile(paths: String*): DataFrame = {
+    if (paths.isEmpty) {
+      emptyDataFrame
+    } else {
+      read.parquet(paths : _*)
+    }
+  }
+
+  /**
+   * Loads a JSON file (one object per line), returning the result as a `DataFrame`.
+   * It goes through the entire dataset once to determine the schema.
+   *
+   * @group specificdata
+   * @deprecated As of 1.4.0, replaced by `read().json()`.
+   */
+  @deprecated("Use read.json() instead.", "1.4.0")
+  def jsonFile(path: String): DataFrame = {
+    read.json(path)
+  }
+
+  /**
+   * Loads a JSON file (one object per line) and applies the given schema,
+   * returning the result as a `DataFrame`.
+   *
+   * @group specificdata
+   * @deprecated As of 1.4.0, replaced by
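And a short sketch exercising the restored SQLContext helpers shown above; each deprecated method simply forwards to the DataFrameReader API. The path is illustrative.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object RestoredSqlContextDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "restored-sqlcontext")
    val sqlContext = new SQLContext(sc) // deprecated constructor, also added back

    // Write a file so the deprecated reader has something to load.
    sqlContext.range(5).write.mode("overwrite").parquet("/tmp/demo_parquet")

    // parquetFile(paths) delegates to read.parquet(paths).
    val viaDeprecated = sqlContext.parquetFile("/tmp/demo_parquet")
    val viaReader = sqlContext.read.parquet("/tmp/demo_parquet")
    assert(viaDeprecated.count() == viaReader.count())

    sc.stop()
  }
}
```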
[spark] branch master updated (1369a97 -> 30d9535)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1369a97  [SPARK-31164][SQL] Inconsistent rdd and output partitioning for bucket table when output doesn't contain all bucket columns
     add 30d9535  [SPARK-31134][SQL] optimize skew join after shuffle partitions are coalesced

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   9 +-
 .../adaptive/CoalesceShufflePartitions.scala       |   2 -
 .../execution/adaptive/OptimizeSkewedJoin.scala    | 272 ++---
 .../execution/adaptive/ShufflePartitionsUtil.scala |  18 +-
 .../sql/execution/ShufflePartitionsUtilSuite.scala |   2 -
 5 files changed, 146 insertions(+), 157 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31134][SQL] optimize skew join after shuffle partitions are coalesced
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 0512b3f  [SPARK-31134][SQL] optimize skew join after shuffle partitions are coalesced
0512b3f is described below

commit 0512b3f427274c8bda249fba02cd16f5694a4ea5
Author: Wenchen Fan
AuthorDate: Tue Mar 17 00:23:16 2020 -0700

[SPARK-31134][SQL] optimize skew join after shuffle partitions are coalesced

### What changes were proposed in this pull request?

Run the `OptimizeSkewedJoin` rule after the `CoalesceShufflePartitions` rule.

### Why are the changes needed?

Remove duplicated coalescing code in `OptimizeSkewedJoin`.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #27893 from cloud-fan/aqe.

Authored-by: Wenchen Fan
Signed-off-by: gatorsmile
(cherry picked from commit 30d95356f1881c32eb39e51525d2bcb331fcf867)
Signed-off-by: gatorsmile
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   9 +-
 .../adaptive/CoalesceShufflePartitions.scala       |   2 -
 .../execution/adaptive/OptimizeSkewedJoin.scala    | 272 ++---
 .../execution/adaptive/ShufflePartitionsUtil.scala |  18 +-
 .../sql/execution/ShufflePartitionsUtilSuite.scala |   2 -
 5 files changed, 146 insertions(+), 157 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 68da06d..b54a32f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -96,13 +96,10 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     ReuseAdaptiveSubquery(conf, context.subqueryCache),
-    // Here the 'OptimizeSkewedJoin' rule should be executed
-    // before 'CoalesceShufflePartitions', as the skewed partition handled
-    // in 'OptimizeSkewedJoin' rule, should be omitted in 'CoalesceShufflePartitions'.
-    OptimizeSkewedJoin(conf),
     CoalesceShufflePartitions(context.session),
-    // The rule of 'OptimizeLocalShuffleReader' need to make use of the 'partitionStartIndices'
-    // in 'CoalesceShufflePartitions' rule. So it must be after 'CoalesceShufflePartitions' rule.
+    // The following two rules need to make use of 'CustomShuffleReaderExec.partitionSpecs'
+    // added by `CoalesceShufflePartitions`. So they must be executed after it.
+    OptimizeSkewedJoin(conf),
     OptimizeLocalShuffleReader(conf),
     ApplyColumnarRulesAndInsertTransitions(conf, context.session.sessionState.columnarRules),
     CollapseCodegenStages(conf)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
index d2a7f6a..226d692 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
@@ -74,8 +74,6 @@ case class CoalesceShufflePartitions(session: SparkSession) extends Rule[SparkPl
         .getOrElse(session.sparkContext.defaultParallelism)
       val partitionSpecs = ShufflePartitionsUtil.coalescePartitions(
         validMetrics.toArray,
-        firstPartitionIndex = 0,
-        lastPartitionIndex = distinctNumPreShufflePartitions.head,
         advisoryTargetSize = conf.getConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES),
         minNumPartitions = minPartitionNum)
       // This transformation adds new nodes, so we must use `transformUp` here.

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index db65af6..e02b9af 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -21,7 +21,7 @@
 import scala.collection.mutable

 import org.apache.commons.io.FileUtils

-import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkContext, SparkEnv}
+import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv}
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark
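The reordering above is easiest to see as plain function composition: each AQE rule is a plan-to-plan transform, and a rule that consumes information can only run after the rule that produces it. A toy sketch of that idea, not Spark's actual classes:

```scala
object RuleOrderDemo {
  trait Rule { def apply(p: Set[String]): Set[String] }

  // Stands in for CoalesceShufflePartitions: introduces partitionSpecs.
  object Coalesce extends Rule {
    def apply(p: Set[String]): Set[String] = p + "partitionSpecs"
  }
  // Stands in for OptimizeSkewedJoin: only effective once partitionSpecs exist.
  object OptimizeSkew extends Rule {
    def apply(p: Set[String]): Set[String] =
      if (p.contains("partitionSpecs")) p + "skewHandled" else p
  }

  def run(rules: Seq[Rule]): Set[String] =
    rules.foldLeft(Set.empty[String])((p, r) => r(p))

  def main(args: Array[String]): Unit = {
    println(run(Seq(Coalesce, OptimizeSkew))) // Set(partitionSpecs, skewHandled)
    println(run(Seq(OptimizeSkew, Coalesce))) // Set(partitionSpecs) -- skew never handled
  }
}
```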
[spark-website] branch asf-site updated: Add "Amend Spark's Semantic Versioning Policy" #263
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 6f1e0de Add "Amend Spark's Semantic Versioning Policy" #263 6f1e0de is described below commit 6f1e0deb6632f75ad0492ffba372f1ebb828ddfb Author: Xiao Li AuthorDate: Sat Mar 14 17:40:30 2020 -0700 Add "Amend Spark's Semantic Versioning Policy" #263 The vote of "Amend Spark's Semantic Versioning Policy" passed in the dev mailing list http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Amend-Spark-s-Semantic-Versioning-Policy-td28988.html This PR is to add it to the versioning-policy page. ![image](https://user-images.githubusercontent.com/11567269/76592244-063e7680-64b0-11ea-9875-c0e8573d7321.png) --- site/versioning-policy.html | 77 + versioning-policy.md| 47 +++ 2 files changed, 124 insertions(+) diff --git a/site/versioning-policy.html b/site/versioning-policy.html index 34547e8..679e9b2 100644 --- a/site/versioning-policy.html +++ b/site/versioning-policy.html @@ -245,6 +245,83 @@ maximum compatibility. Code should not be merged into the project as “expe a plan to change the API later, because users expect the maximum compatibility from all available APIs. +Considerations When Breaking APIs + +The Spark project strives to avoid breaking APIs or silently changing behavior, even at major versions. While this is not always possible, the balance of the following factors should be considered before choosing to break an API. + +Cost of Breaking an API + +Breaking an API almost always has a non-trivial cost to the users of Spark. A broken API means that Spark programs need to be rewritten before they can be upgraded. However, there are a few considerations when thinking about what the cost will be: + + + Usage - an API that is actively used in many different places, is always very costly to break. While it is hard to know usage for sure, there are a bunch of ways that we can estimate: + + +How long has the API been in Spark? + + +Is the API common even for basic programs? + + +How often do we see recent questions in JIRA or mailing lists? + + +How often does it appear in StackOverflow or blogs? + + + + +Behavior after the break - How will a program that works today, work after the break? The following are listed roughly in order of increasing severity: + + + +Will there be a compiler or linker error? + + +Will there be a runtime exception? + + +Will that exception happen after significant processing has been done? + + +Will we silently return different answers? (very hard to debug, might not even notice!) + + + + + +Cost of Maintaining an API + +Of course, the above does not mean that we will never break any APIs. We must also consider the cost both to the project and to our users of keeping the API in question. + + + +Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project changes. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc). In some cases, while not completely technically infeasible, the cost of maintaining a particular API can become too high. + + +User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics. 
+ + + +Alternatives to Breaking an API + +In cases where there is a “Bad API”, but where the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs. + + + +Avoid Bad APIs - While this is a bit obvious, it is an important point. Anytime we are adding a new interface to Spark we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time. + + +Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated. + + +Updated Docs - Documentation should point to the “best” recommended way of performing a given task. In the cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the “right” way. + + +Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them, to reduce the cost of eventually removing deprecated APIs.
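To make the "Deprecation Warnings" guidance above concrete, here is a minimal Scala sketch; `OldEstimator` and `NewEstimator` are hypothetical names, not Spark APIs. The point is that the message names a specific replacement rather than merely flagging the API:

```scala
// Hypothetical example of a deprecation warning that points to a clear
// alternative, per the policy above. These classes are illustrative only.
class NewEstimator {
  def fit(): Unit = ()
}

@deprecated("Use NewEstimator.fit() instead; OldEstimator.train() will be " +
  "removed in a future release.", since = "3.0.0")
class OldEstimator {
  def train(): Unit = ()
}
```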
[spark] branch branch-3.0 updated: [SPARK-31070][SQL] make skew join split skewed partitions more evenly
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 3f23529 [SPARK-31070][SQL] make skew join split skewed partitions more evenly 3f23529 is described below commit 3f23529cac3a306afe0ed175b8034d4f24b08acb Author: Wenchen Fan AuthorDate: Tue Mar 10 21:50:44 2020 -0700 [SPARK-31070][SQL] make skew join split skewed partitions more evenly ### What changes were proposed in this pull request? There are two problems when splitting skewed partitions: 1. It's possible that we can't split the skewed partition; in that case we shouldn't create a skew join. 2. When splitting, it's possible that we create a partition for a very small amount of data. This PR fixes them: 1. don't create `PartialReducerPartitionSpec` if we can't split; 2. merge small partitions into the previous partition. ### Why are the changes needed? make skew join split skewed partitions more evenly ### Does this PR introduce any user-facing change? no ### How was this patch tested? updated test Closes #27833 from cloud-fan/aqe. Authored-by: Wenchen Fan Signed-off-by: gatorsmile (cherry picked from commit d5f5720efa7232f1339976462d462a7360978ab5) Signed-off-by: gatorsmile --- .../adaptive/CoalesceShufflePartitions.scala | 2 +- .../execution/adaptive/OptimizeSkewedJoin.scala| 44 +++ ...Coalescer.scala => ShufflePartitionsUtil.scala} | 50 +- ...uite.scala => ShufflePartitionsUtilSuite.scala} | 32 -- .../adaptive/AdaptiveQueryExecSuite.scala | 14 +++--- 5 files changed, 102 insertions(+), 40 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala index a8e2d8e..d779a20 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala @@ -66,7 +66,7 @@ case class CoalesceShufflePartitions(conf: SQLConf) extends Rule[SparkPlan] { val distinctNumPreShufflePartitions = validMetrics.map(stats => stats.bytesByPartitionId.length).distinct if (validMetrics.nonEmpty && distinctNumPreShufflePartitions.length == 1) { -val partitionSpecs = ShufflePartitionsCoalescer.coalescePartitions( +val partitionSpecs = ShufflePartitionsUtil.coalescePartitions( validMetrics.toArray, firstPartitionIndex = 0, lastPartitionIndex = distinctNumPreShufflePartitions.head, diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala index 4387409..7f52393 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala @@ -18,7 +18,6 @@ package org.apache.spark.sql.execution.adaptive import scala.collection.mutable -import scala.collection.mutable.ArrayBuffer import org.apache.commons.io.FileUtils @@ -111,22 +110,7 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] { targetSize: Long): Seq[Int] = { val shuffleId = stage.shuffle.shuffleDependency.shuffleHandle.shuffleId val mapPartitionSizes = getMapSizesForReduceId(shuffleId, partitionId) -val partitionStartIndices = ArrayBuffer[Int]() 
-partitionStartIndices += 0 -var i = 0 -var postMapPartitionSize = 0L -while (i < mapPartitionSizes.length) { - val nextMapPartitionSize = mapPartitionSizes(i) - if (i > 0 && postMapPartitionSize + nextMapPartitionSize > targetSize) { -partitionStartIndices += i -postMapPartitionSize = nextMapPartitionSize - } else { -postMapPartitionSize += nextMapPartitionSize - } - i += 1 -} - -partitionStartIndices +ShufflePartitionsUtil.splitSizeListByTargetSize(mapPartitionSizes, targetSize) } private def getStatistics(stage: ShuffleQueryStageExec): MapOutputStatistics = { @@ -211,21 +195,25 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] { } val leftParts = if (isLeftSkew) { -leftSkewDesc.addPartitionSize(leftSize) -createSkewPartitions( - partitionIndex, - getMapStartIndices(left, partitionIndex, leftTargetSize), - getNumMappers(left)) +
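The diff above replaces the hand-rolled start-index loop with `ShufflePartitionsUtil.splitSizeListByTargetSize`. Below is a rough, self-contained Scala sketch of the behavior described in the commit message, not the actual Spark implementation; the one-fifth smallness threshold is an assumption for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch only: returns the start indices of each split of `sizes`, keeping
// each slice near `targetSize` and merging slices that end up very small
// into their predecessor, so we never emit a tiny partition.
def splitSizeListByTargetSize(sizes: Seq[Long], targetSize: Long): Seq[Int] = {
  val smallSliceSize = targetSize / 5 // assumed smallness threshold
  val starts = ArrayBuffer(0)
  var currentSliceSize = 0L
  for (i <- sizes.indices) {
    if (i > 0 && currentSliceSize + sizes(i) > targetSize) {
      // Closing the current slice; if it is tiny, fold it into the previous one.
      if (currentSliceSize < smallSliceSize && starts.length > 1) {
        starts.remove(starts.length - 1)
      }
      starts += i
      currentSliceSize = sizes(i)
    } else {
      currentSliceSize += sizes(i)
    }
  }
  // The final slice can also be tiny; merge it back as well.
  if (currentSliceSize < smallSliceSize && starts.length > 1) {
    starts.remove(starts.length - 1)
  }
  starts.toSeq
}
```

A single-element result means the partition could not usefully be split, which is exactly the case where the commit now avoids creating a `PartialReducerPartitionSpec`.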
[spark] branch master updated (93def95 -> d5f5720)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 93def95 [SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final add d5f5720 [SPARK-31070][SQL] make skew join split skewed partitions more evenly No new revisions were added by this update. Summary of changes: .../adaptive/CoalesceShufflePartitions.scala | 2 +- .../execution/adaptive/OptimizeSkewedJoin.scala| 44 +++ ...Coalescer.scala => ShufflePartitionsUtil.scala} | 50 +- ...uite.scala => ShufflePartitionsUtilSuite.scala} | 32 -- .../adaptive/AdaptiveQueryExecSuite.scala | 14 +++--- 5 files changed, 102 insertions(+), 40 deletions(-) rename sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/{ShufflePartitionsCoalescer.scala => ShufflePartitionsUtil.scala} (73%) rename sql/core/src/test/scala/org/apache/spark/sql/execution/{ShufflePartitionsCoalescerSuite.scala => ShufflePartitionsUtilSuite.scala} (88%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2732980 [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize 2732980 is described below commit 27329806c36d0b403153fe1ad0077acb72d92606 Author: yi.wu AuthorDate: Tue Mar 3 13:40:51 2020 -0800 [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize ### What changes were proposed in this pull request? This PR proposes to not cancel a `QueryStageExec` which failed before calling `doMaterialize`. Besides, this PR also includes 2 minor improvements: * fail fast when a stage failed before calling `doMaterialize` * format Exception with Cause ### Why are the changes needed? For a stage which failed before materializing the lazy value (e.g. `inputRDD`), calling `cancel` on it could re-trigger the same failure again, e.g. executing the child node again (see `AdaptiveQueryExecSuite`.`SPARK-30291: AQE should catch the exceptions when doing materialize` for example). And finally, the same failure will be counted twice: once for the materialization error and once for the cancellation error. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Updated test. Closes #27752 from Ngone51/avoid_cancel_finished_stage. Authored-by: yi.wu Signed-off-by: gatorsmile (cherry picked from commit 380e8876316d6ef5a74358be2a04ab20e8b6e7ca) Signed-off-by: gatorsmile --- .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 +- .../adaptive/AdaptiveQueryExecSuite.scala | 3 ++- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala index 4036424..c018ca4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala @@ -165,7 +165,7 @@ case class AdaptiveSparkPlanExec( stagesToReplace = result.newStages ++ stagesToReplace executionId.foreach(onUpdatePlan) - // Start materialization of all new stages. + // Start materialization of all new stages and fail fast if any stages failed eagerly result.newStages.foreach { stage => try { stage.materialize().onComplete { res => @@ -176,7 +176,10 @@ case class AdaptiveSparkPlanExec( } }(AdaptiveSparkPlanExec.executionContext) } catch { - case e: Throwable => events.offer(StageFailure(stage, e)) + case e: Throwable => +val ex = new SparkException( + s"Early failed query stage found: ${stage.treeString}", e) +cleanUpAndThrowException(Seq(ex), Some(stage.id)) } } } @@ -192,13 +195,12 @@ case class AdaptiveSparkPlanExec( stage.resultOption = Some(res) case StageFailure(stage, ex) => errors.append( - new SparkException(s"Failed to materialize query stage: ${stage.treeString}." + -s" and the cause is ${ex.getMessage}", ex)) + new SparkException(s"Failed to materialize query stage: ${stage.treeString}.", ex)) } // In case of errors, we cancel all running stages and throw exception. if (errors.nonEmpty) { - cleanUpAndThrowException(errors) + cleanUpAndThrowException(errors, None) } // Try re-optimizing and re-planning. 
Adopt the new plan if its cost is equal to or less @@ -522,9 +524,13 @@ case class AdaptiveSparkPlanExec( * Cancel all running stages with best effort and throw an Exception containing all stage * materialization errors and stage cancellation errors. */ - private def cleanUpAndThrowException(errors: Seq[SparkException]): Unit = { + private def cleanUpAndThrowException( + errors: Seq[SparkException], + earlyFailedStage: Option[Int]): Unit = { val runningStages = currentPhysicalPlan.collect { - case s: QueryStageExec => s + // earlyFailedStage is the stage which failed before calling doMaterialize, + // so we should avoid calling cancel on it to re-trigger the failure again. + case s: QueryStageExec if !earlyFailedStage.contains(s.id) => s } val cancelErrors = new mutable.ArrayBuffer[SparkException]() try { @@ -539,8 +545,7 @@ ca
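A stripped-down sketch of the cancellation-skip described above; `QueryStage` here is a stand-in for Spark's internal `QueryStageExec`, so the types and names are illustrative, not the real API:

```scala
// Illustrative only: skip cancelling the stage that failed before
// materialization started, since cancel() could re-trigger the same failure.
final case class QueryStage(id: Int) {
  def cancel(): Unit = println(s"cancelling stage $id")
}

def cleanUpRunningStages(
    runningStages: Seq[QueryStage],
    earlyFailedStage: Option[Int]): Unit = {
  runningStages
    .filterNot(s => earlyFailedStage.contains(s.id)) // avoid re-running the failure
    .foreach(_.cancel())
}
```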
[spark] branch master updated (4a1d273 -> 380e887)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4a1d273 [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions add 380e887 [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize No new revisions were added by this update. Summary of changes: .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 +- .../adaptive/AdaptiveQueryExecSuite.scala | 3 ++- 2 files changed, 16 insertions(+), 10 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30991] Refactor AQE readers and RDDs
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 597 [SPARK-30991] Refactor AQE readers and RDDs 597 is described below commit 597b5507448980e4fadbad85ffb104808081 Author: maryannxue AuthorDate: Mon Mar 2 16:04:00 2020 -0800 [SPARK-30991] Refactor AQE readers and RDDs ### What changes were proposed in this pull request? This PR combines `CustomShuffledRowRDD` and `LocalShuffledRowRDD` into `ShuffledRowRDD`, and creates `CustomShuffleReaderExec` to unify and replace all existing AQE readers: `CoalescedShuffleReaderExec`, `LocalShuffleReaderExec` and `SkewJoinShuffleReaderExec`. ### Why are the changes needed? To reduce code redundancy. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Passed existing UTs. Closes #27742 from maryannxue/aqe-readers. Authored-by: maryannxue Signed-off-by: gatorsmile (cherry picked from commit 473a28c1d032993c7fa515b39f2cb1e3105d65d3) Signed-off-by: gatorsmile --- .../spark/sql/execution/ShuffledRowRDD.scala | 142 - .../apache/spark/sql/execution/SparkPlanInfo.scala | 2 +- .../adaptive/CustomShuffleReaderExec.scala | 81 .../execution/adaptive/CustomShuffledRowRDD.scala | 113 .../execution/adaptive/LocalShuffledRowRDD.scala | 112 .../adaptive/OptimizeLocalShuffleReader.scala | 88 +++-- .../execution/adaptive/OptimizeSkewedJoin.scala| 72 ++- .../adaptive/ReduceNumShufflePartitions.scala | 49 ++- .../adaptive/ShufflePartitionsCoalescer.scala | 23 ++-- .../execution/exchange/ShuffleExchangeExec.scala | 12 +- .../ReduceNumShufflePartitionsSuite.scala | 28 ++-- .../ShufflePartitionsCoalescerSuite.scala | 101 ++- .../adaptive/AdaptiveQueryExecSuite.scala | 23 ++-- 13 files changed, 317 insertions(+), 529 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala index 4c19f95..eb02259 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala @@ -26,17 +26,28 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.execution.metric.{SQLMetric, SQLShuffleReadMetricsReporter} import org.apache.spark.sql.internal.SQLConf +sealed trait ShufflePartitionSpec + +// A partition that reads data of one or more reducers, from `startReducerIndex` (inclusive) to +// `endReducerIndex` (exclusive). +case class CoalescedPartitionSpec( + startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec + +// A partition that reads partial data of one reducer, from `startMapIndex` (inclusive) to +// `endMapIndex` (exclusive). +case class PartialReducerPartitionSpec( + reducerIndex: Int, startMapIndex: Int, endMapIndex: Int) extends ShufflePartitionSpec + +// A partition that reads partial data of one mapper, from `startReducerIndex` (inclusive) to +// `endReducerIndex` (exclusive). +case class PartialMapperPartitionSpec( + mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec + /** - * The [[Partition]] used by [[ShuffledRowRDD]]. A post-shuffle partition - * (identified by `postShufflePartitionIndex`) contains a range of pre-shuffle partitions - * (`startPreShufflePartitionIndex` to `endPreShufflePartitionIndex - 1`, inclusive). 
+ * The [[Partition]] used by [[ShuffledRowRDD]]. */ -private final class ShuffledRowRDDPartition( -val postShufflePartitionIndex: Int, -val startPreShufflePartitionIndex: Int, -val endPreShufflePartitionIndex: Int) extends Partition { - override val index: Int = postShufflePartitionIndex -} +private final case class ShuffledRowRDDPartition( + index: Int, spec: ShufflePartitionSpec) extends Partition /** * A dummy partitioner for use with records whose partition ids have been pre-computed (i.e. for @@ -94,8 +105,7 @@ class CoalescedPartitioner(val parent: Partitioner, val partitionStartIndices: A * interfaces / internals. * * This RDD takes a [[ShuffleDependency]] (`dependency`), - * and an optional array of partition start indices as input arguments - * (`specifiedPartitionStartIndices`). + * and an array of [[ShufflePartitionSpec]] as input arguments. * * The `dependency` has the parent RDD of this RDD, which represents the dataset before shuffle * (i.e. map output). Elements of this RDD are (partitionId, Row) pairs. @@ -103,79 +113,97 @@ class CoalescedPartitioner(val parent: Partitioner
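For readability, here is a compact restatement of the three partition-spec shapes introduced in the diff above, plus an illustrative (non-Spark) helper showing what each spec asks a shuffle reader to do; the `describe` function is hypothetical:

```scala
sealed trait ShufflePartitionSpec

// Read all map outputs for reducers [start, end): post-shuffle coalescing.
case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec

// Read only maps [start, end) of a single reducer: skew-join splits.
case class PartialReducerPartitionSpec(reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
  extends ShufflePartitionSpec

// Read reducers [start, end) of a single map: local shuffle reads.
case class PartialMapperPartitionSpec(mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec

// Hypothetical helper, for illustration only.
def describe(spec: ShufflePartitionSpec): String = spec match {
  case CoalescedPartitionSpec(s, e)         => s"all maps, reducers [$s, $e)"
  case PartialReducerPartitionSpec(r, s, e) => s"maps [$s, $e) of reducer $r"
  case PartialMapperPartitionSpec(m, s, e)  => s"reducers [$s, $e) of map $m"
}
```

Unifying the three cases under one `ShuffledRowRDD` is what lets `CustomShuffleReaderExec` replace the three separate AQE reader nodes.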
[spark] branch master updated (f0010c8 -> 473a28c)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f0010c8 [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests add 473a28c [SPARK-30991] Refactor AQE readers and RDDs No new revisions were added by this update. Summary of changes: .../spark/sql/execution/ShuffledRowRDD.scala | 142 - .../apache/spark/sql/execution/SparkPlanInfo.scala | 2 +- .../adaptive/CustomShuffleReaderExec.scala | 81 .../execution/adaptive/CustomShuffledRowRDD.scala | 113 .../execution/adaptive/LocalShuffledRowRDD.scala | 112 .../adaptive/OptimizeLocalShuffleReader.scala | 88 +++-- .../execution/adaptive/OptimizeSkewedJoin.scala| 72 ++- .../adaptive/ReduceNumShufflePartitions.scala | 49 ++- .../adaptive/ShufflePartitionsCoalescer.scala | 23 ++-- .../execution/exchange/ShuffleExchangeExec.scala | 12 +- .../ReduceNumShufflePartitionsSuite.scala | 28 ++-- .../ShufflePartitionsCoalescerSuite.scala | 101 ++- .../adaptive/AdaptiveQueryExecSuite.scala | 23 ++-- 13 files changed, 317 insertions(+), 529 deletions(-) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffledRowRDD.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30918][SQL] improve the splitting of skewed partitions
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b968cd3 [SPARK-30918][SQL] improve the splitting of skewed partitions b968cd3 is described below commit b968cd37796a5730fe5c2318d23a38416f550957 Author: Wenchen Fan AuthorDate: Tue Feb 25 14:10:29 2020 -0800 [SPARK-30918][SQL] improve the splitting of skewed partitions ### What changes were proposed in this pull request? Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD ### Why are the changes needed? The goal of skew join optimization is to make the data distribution more even. So it makes more sense to use the average size of the non-skewed partitions as the target size. ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #27669 from cloud-fan/aqe. Authored-by: Wenchen Fan Signed-off-by: Xiao Li (cherry picked from commit 8f247e5d3682ad765bdbb9ea5a4315862c5a383c) Signed-off-by: Xiao Li --- .../org/apache/spark/sql/internal/SQLConf.scala| 10 +--- .../execution/adaptive/OptimizeSkewedJoin.scala| 62 ++ .../adaptive/AdaptiveQueryExecSuite.scala | 4 +- 3 files changed, 54 insertions(+), 22 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 674c6df..e6f7cfd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -432,19 +432,13 @@ object SQLConf { .booleanConf .createWithDefault(true) - val ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD = - buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionSizeThreshold") - .doc("Configures the minimum size in bytes for a partition that is considered as a skewed " + -"partition in adaptive skewed join.") - .bytesConf(ByteUnit.BYTE) - .createWithDefaultString("64MB") - val ADAPTIVE_EXECUTION_SKEWED_PARTITION_FACTOR = buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionFactor") .doc("A partition is considered as a skewed partition if its size is larger than" + " this factor multiple the median partition size and also larger than " + -s" ${ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD.key}") +s" ${SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key}") .intConf + .checkValue(_ > 0, "The skew factor must be positive.") .createWithDefault(10) val NON_EMPTY_PARTITION_RATIO_FOR_BROADCAST_JOIN = diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala index 578d2d7..d3cb864 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala @@ -34,6 +34,30 @@ import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExcha import org.apache.spark.sql.execution.joins.SortMergeJoinExec import org.apache.spark.sql.internal.SQLConf +/** + * A rule to optimize skewed joins to avoid straggler tasks whose share of data are significantly + * larger than those of the rest of the tasks. 
+ * + * The general idea is to divide each skew partition into smaller partitions and replicate its + * matching partition on the other side of the join so that they can run in parallel tasks. + * Note that when matching partitions from the left side and the right side both have skew, + * it will become a cartesian product of splits from left and right joining together. + * + * For example, assume the Sort-Merge join has 4 partitions: + * left: [L1, L2, L3, L4] + * right: [R1, R2, R3, R4] + * + * Let's say L2, L4 and R3, R4 are skewed, and each of them gets split into 2 sub-partitions. This + * is scheduled to run 4 tasks at the beginning: (L1, R1), (L2, R2), (L3, R3), (L4, R4). + * This rule expands it to 9 tasks to increase parallelism: + * (L1, R1), + * (L2-1, R2), (L2-2, R2), + * (L3, R3-1), (L3, R3-2), + * (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2) + * + * Note that, when this rule is enabled, it also coalesces non-skewed partitions like + * `ReduceNumShufflePartitions` does. + */ case class OptimizeSkewedJoin(conf: SQLConf) extends Rul
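As a rough sketch of the new target-size choice described above (not the actual Spark code), the target for splitting a skewed partition becomes the average size of the non-skewed partitions; the skew test mirrors the `skewedPartitionFactor` config in the diff, and the empty-input fallback is an assumption:

```scala
// Sketch only: pick the split target as the average of the non-skewed sizes,
// so the resulting pieces are comparable to the rest of the shuffle.
def splitTargetSize(sizes: Seq[Long], medianSize: Long, skewFactor: Int): Long = {
  val nonSkewed = sizes.filter(_ <= medianSize * skewFactor)
  if (nonSkewed.isEmpty) medianSize // assumed fallback when everything is skewed
  else nonSkewed.sum / nonSkewed.length
}
```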
[spark] branch master updated (e086a78 -> 8f247e5)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e086a78 [MINOR][ML] ML cleanup add 8f247e5 [SPARK-30918][SQL] improve the splitting of skewed partitions No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/internal/SQLConf.scala| 10 +--- .../execution/adaptive/OptimizeSkewedJoin.scala| 62 ++ .../adaptive/AdaptiveQueryExecSuite.scala | 4 +- 3 files changed, 54 insertions(+), 22 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30779][SS] Fix some API issues found when reviewing Structured Streaming API docs
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 45d834c [SPARK-30779][SS] Fix some API issues found when reviewing Structured Streaming API docs 45d834c is described below commit 45d834cb8cc2c30f902d0dec1cdf561b993521d0 Author: Shixiong Zhu AuthorDate: Mon Feb 10 14:26:14 2020 -0800 [SPARK-30779][SS] Fix some API issues found when reviewing Structured Streaming API docs ### What changes were proposed in this pull request? - Fix the scope of `Logging.initializeForcefully` so that it doesn't appear in subclasses' public methods. Right now, `sc.initializeForcefully(false, false)` is allowed to be called. - Don't show classes under `org.apache.spark.internal` package in API docs. - Add missing `since` annotation. - Fix the scope of `ArrowUtils` to remove it from the API docs. ### Why are the changes needed? Avoid leaking APIs unintentionally in Spark 3.0.0. ### Does this PR introduce any user-facing change? No. All these changes are to avoid leaking APIs unintentionally in Spark 3.0.0. ### How was this patch tested? Manually generated the API docs and verified the above issues have been fixed. Closes #27528 from zsxwing/audit-ss-apis. Authored-by: Shixiong Zhu Signed-off-by: Xiao Li --- core/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- project/SparkBuild.scala| 1 + .../spark/sql/connector/read/streaming/ContinuousPartitionReader.java | 2 ++ .../sql/connector/read/streaming/ContinuousPartitionReaderFactory.java | 2 ++ .../org/apache/spark/sql/connector/read/streaming/ContinuousStream.java | 2 ++ .../org/apache/spark/sql/connector/read/streaming/MicroBatchStream.java | 2 ++ .../main/java/org/apache/spark/sql/connector/read/streaming/Offset.java | 2 ++ .../org/apache/spark/sql/connector/read/streaming/PartitionOffset.java | 2 ++ .../java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java | 1 + .../org/apache/spark/sql/connector/read/streaming/SparkDataStream.java | 2 ++ .../spark/sql/connector/write/streaming/StreamingDataWriterFactory.java | 2 ++ .../org/apache/spark/sql/connector/write/streaming/StreamingWrite.java | 2 ++ sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala | 2 +- 13 files changed, 22 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/Logging.scala b/core/src/main/scala/org/apache/spark/internal/Logging.scala index 2e4846b..0c1d963 100644 --- a/core/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/core/src/main/scala/org/apache/spark/internal/Logging.scala @@ -117,7 +117,7 @@ trait Logging { } // For testing - def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = { + private[spark] def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = { initializeLogging(isInterpreter, silent) } diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 707c31d..9d0af3a 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -819,6 +819,7 @@ object Unidoc { .map(_.filterNot(_.getName.contains("$"))) .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/deploy"))) .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/examples"))) + .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/internal"))) .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/memory"))) 
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/network"))) .map(_.filterNot(f => diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java index 8bd5273..c2ad9ec 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java @@ -22,6 +22,8 @@ import org.apache.spark.sql.connector.read.PartitionReader; /** * A variation on {@link PartitionReader} for use with continuous streaming processing. + * + * @since 3.0.0 */ @Evolving public interface ContinuousPartitionReader extends PartitionReader { diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReaderFactory.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReaderFactory.java index 962864d..385c6f
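A minimal illustration of the scoping fix above, assuming standard Scala access modifiers: a `private[spark]` member stays callable from Spark-internal code (including tests) but disappears from the public surface that users and subclasses see.

```scala
package org.apache.spark.internal

trait Logging {
  // Callable anywhere under org.apache.spark, invisible to user code, so
  // sc.initializeForcefully(...) no longer compiles outside Spark itself.
  private[spark] def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = ()
}
```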
[spark] branch master updated (a6b91d2 -> e2ebca7)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a6b91d2 [SPARK-30556][SQL][FOLLOWUP] Reset the status changed in SQLExecution.withThreadLocalCaptured add e2ebca7 [SPARK-30779][SS] Fix some API issues found when reviewing Structured Streaming API docs No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +- project/SparkBuild.scala| 1 + .../spark/sql/connector/read/streaming/ContinuousPartitionReader.java | 2 ++ .../sql/connector/read/streaming/ContinuousPartitionReaderFactory.java | 2 ++ .../org/apache/spark/sql/connector/read/streaming/ContinuousStream.java | 2 ++ .../org/apache/spark/sql/connector/read/streaming/MicroBatchStream.java | 2 ++ .../main/java/org/apache/spark/sql/connector/read/streaming/Offset.java | 2 ++ .../org/apache/spark/sql/connector/read/streaming/PartitionOffset.java | 2 ++ .../java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java | 1 + .../org/apache/spark/sql/connector/read/streaming/SparkDataStream.java | 2 ++ .../spark/sql/connector/write/streaming/StreamingDataWriterFactory.java | 2 ++ .../org/apache/spark/sql/connector/write/streaming/StreamingWrite.java | 2 ++ sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala | 2 +- 13 files changed, 22 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (acfdb46 -> 4439b29)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from acfdb46 [SPARK-27946][SQL][FOLLOW-UP] Change doc and error message for SHOW CREATE TABLE add 4439b29 Revert "[SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static" No new revisions were added by this update. Summary of changes: .../catalyst/expressions/regexpExpressions.scala| 21 ++--- 1 file changed, 6 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b877aac -> 9f8172e)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b877aac [SPARK-30684 ][WEBUI][FollowUp] A new approach for SPARK-30684 add 9f8172e Revert "[SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project No new revisions were added by this update. Summary of changes: .../catalyst/optimizer/NestedColumnAliasing.scala | 47 -- .../spark/sql/catalyst/optimizer/Optimizer.scala | 43 +++- .../execution/datasources/SchemaPruningSuite.scala | 32 --- 3 files changed, 25 insertions(+), 97 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30719][SQL] do not log warning if AQE is intentionally skipped and add a config to force apply
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b29cb1a [SPARK-30719][SQL] do not log warning if AQE is intentionally skipped and add a config to force apply b29cb1a is described below commit b29cb1a82b1a1facf1dd040025db93d998dad4cd Author: Wenchen Fan AuthorDate: Thu Feb 6 09:16:14 2020 -0800 [SPARK-30719][SQL] do not log warning if AQE is intentionally skipped and add a config to force apply ### What changes were proposed in this pull request? Update `InsertAdaptiveSparkPlan` to not log a warning if AQE is skipped intentionally. This PR also adds a config to not skip AQE. ### Why are the changes needed? It's not a warning at all if we intentionally skip AQE. ### Does this PR introduce any user-facing change? no ### How was this patch tested? run `AdaptiveQueryExecSuite` locally and verify that there are no warning logs. Closes #27452 from cloud-fan/aqe. Authored-by: Wenchen Fan Signed-off-by: Xiao Li (cherry picked from commit 8ce58627ebe4f0372fba9a30d8cd4213611acd9b) Signed-off-by: Xiao Li --- .../org/apache/spark/sql/internal/SQLConf.scala| 9 +++ .../adaptive/InsertAdaptiveSparkPlan.scala | 83 -- .../adaptive/AdaptiveQueryExecSuite.scala | 9 +++ 3 files changed, 65 insertions(+), 36 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index acc0922..bed8410 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -358,6 +358,15 @@ object SQLConf { .booleanConf .createWithDefault(false) + val ADAPTIVE_EXECUTION_FORCE_APPLY = buildConf("spark.sql.adaptive.forceApply") +.internal() +.doc("Adaptive query execution is skipped when the query does not have exchanges or " + + "sub-queries. 
By setting this config to true (together with " + + s"'${ADAPTIVE_EXECUTION_ENABLED.key}' enabled), Spark will force apply adaptive query " + + "execution for all supported queries.") +.booleanConf +.createWithDefault(false) + val REDUCE_POST_SHUFFLE_PARTITIONS_ENABLED = buildConf("spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled") .doc(s"When true and '${ADAPTIVE_EXECUTION_ENABLED.key}' is enabled, this enables reducing " + diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala index 9252827..621c063 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala @@ -40,49 +40,60 @@ case class InsertAdaptiveSparkPlan( private val conf = adaptiveExecutionContext.session.sessionState.conf - def containShuffle(plan: SparkPlan): Boolean = { -plan.find { - case _: Exchange => true - case s: SparkPlan => !s.requiredChildDistribution.forall(_ == UnspecifiedDistribution) -}.isDefined - } - - def containSubQuery(plan: SparkPlan): Boolean = { -plan.find(_.expressions.exists(_.find { - case _: SubqueryExpression => true - case _ => false -}.isDefined)).isDefined - } - override def apply(plan: SparkPlan): SparkPlan = applyInternal(plan, false) private def applyInternal(plan: SparkPlan, isSubquery: Boolean): SparkPlan = plan match { +case _ if !conf.adaptiveExecutionEnabled => plan case _: ExecutedCommandExec => plan -case _ if conf.adaptiveExecutionEnabled && supportAdaptive(plan) - && (isSubquery || containShuffle(plan) || containSubQuery(plan)) => - try { -// Plan sub-queries recursively and pass in the shared stage cache for exchange reuse. Fall -// back to non-adaptive mode if adaptive execution is supported in any of the sub-queries. -val subqueryMap = buildSubqueryMap(plan) -val planSubqueriesRule = PlanAdaptiveSubqueries(subqueryMap) -val preprocessingRules = Seq( - planSubqueriesRule) -// Run pre-processing rules. -val newPlan = AdaptiveSparkPlanExec.applyPhysicalRules(plan, preprocessingRules) -logDebug(s"Adaptive execution enabled for plan: $plan") -AdaptiveSparkPlanExec(newPlan, adaptiveExecutionContext, preprocessingRules, isSubquery) - } catch {
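As a usage sketch of the new flag shown in the diff (the config keys come straight from the diff; the builder pattern is standard SparkSession API):

```scala
import org.apache.spark.sql.SparkSession

// Force AQE to apply even to queries with no exchanges or sub-queries;
// spark.sql.adaptive.forceApply is internal and mainly useful for testing.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.forceApply", "true")
  .getOrCreate()
```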
[spark] branch master updated (d861357 -> 8ce5862)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d861357 [SPARK-26700][CORE][FOLLOWUP] Add config `spark.network.maxRemoteBlockSizeFetchToMem` add 8ce5862 [SPARK-30719][SQL] do not log warning if AQE is intentionally skipped and add a config to force apply No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/internal/SQLConf.scala| 9 +++ .../adaptive/InsertAdaptiveSparkPlan.scala | 83 -- .../adaptive/AdaptiveQueryExecSuite.scala | 9 +++ 3 files changed, 65 insertions(+), 36 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 created (now da32d1e)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. at da32d1e [SPARK-30700][ML] NaiveBayesModel predict optimization No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d0c3e9f -> 8eecc20)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d0c3e9f [SPARK-30660][ML][PYSPARK] LinearRegression blockify input vectors add 8eecc20 [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table" No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 2 + .../apache/spark/sql/catalyst/parser/SqlBase.g4| 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../sql/catalyst/plans/logical/statements.scala| 4 +- .../catalyst/analysis/ResolveSessionCatalog.scala | 6 +- .../spark/sql/execution/command/tables.scala | 285 -- .../org/apache/spark/sql/internal/HiveSerDe.scala | 16 + .../sql-tests/inputs/show-create-table.sql | 11 +- .../sql-tests/results/show-create-table.sql.out| 34 ++- .../apache/spark/sql/ShowCreateTableSuite.scala| 16 +- .../spark/sql/hive/HiveShowCreateTableSuite.scala | 327 - 11 files changed, 581 insertions(+), 124 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
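Assuming this is the change that lets `SHOW CREATE TABLE` on a Hive table emit Spark DDL, with a separate `AS SERDE` form for Hive DDL (which the one-line SqlBase.g4 change suggests; the exact syntax should be checked against the migration guide updated in this commit), usage in spark-shell would look roughly like:

```scala
// Hedged sketch: Spark DDL by default, Hive DDL via AS SERDE.
spark.sql("SHOW CREATE TABLE my_hive_table").show(false)          // Spark DDL
spark.sql("SHOW CREATE TABLE my_hive_table AS SERDE").show(false) // Hive DDL
```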
[spark] branch master updated (2d4b5ea -> 82b4f75)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2d4b5ea [SPARK-30676][CORE][TESTS] Eliminate warnings from deprecated constructors of java.lang.Integer and java.lang.Double add 82b4f75 [SPARK-30508][SQL] Add SparkSession.executeCommand API for external datasource No new revisions were added by this update. Summary of changes: ...upportsRead.java => ExternalCommandRunner.java} | 30 +++-- .../scala/org/apache/spark/sql/SparkSession.scala | 31 +- .../spark/sql/execution/command/commands.scala | 30 ++--- .../sql/sources/ExternalCommandRunnerSuite.scala | 50 ++ 4 files changed, 120 insertions(+), 21 deletions(-) copy sql/catalyst/src/main/java/org/apache/spark/sql/connector/{catalog/SupportsRead.java => ExternalCommandRunner.java} (51%) create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/sources/ExternalCommandRunnerSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
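The summary above only lists files, but the new entry point is `SparkSession.executeCommand`. A hedged usage sketch follows; the runner class name, command string, and options are placeholders, and the data source must implement the new `ExternalCommandRunner` interface:

```scala
// Illustrative only: send a free-form command to an external data source and
// get any output back as a DataFrame.
val output = spark.executeCommand(
  "com.example.MyCommandRunnerSource",        // hypothetical ExternalCommandRunner
  "REINDEX TABLE events",                     // opaque command, interpreted by the source
  Map("url" -> "jdbc:postgresql://host/db")) // source-specific options
output.show()
```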
[spark] branch master updated (4847f73 -> 3f76bd4)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4847f73 [SPARK-30298][SQL] Respect aliases in output partitioning of projects and aggregates add 3f76bd4 [SPARK-27083][SQL][FOLLOW-UP] Rename spark.sql.subquery.reuse to spark.sql.execution.subquery.reuse.enabled No new revisions were added by this update. Summary of changes: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d2bca8f -> db528e4)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d2bca8f [SPARK-30609] Allow default merge command resolution to be bypassed by DSv2 tables add db528e4 [SPARK-30535][SQL] Revert "[] Migrate ALTER TABLE commands to the new framework No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 25 +-- .../sql/catalyst/analysis/CheckAnalysis.scala | 41 ++--- .../sql/catalyst/analysis/ResolveCatalogs.scala| 67 +++- .../spark/sql/catalyst/analysis/unresolved.scala | 23 +++ .../sql/catalyst/analysis/v2ResolutionPlans.scala | 14 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 50 +++--- .../sql/catalyst/plans/logical/statements.scala| 56 +++ .../sql/catalyst/plans/logical/v2Commands.scala| 138 +++- .../sql/connector/catalog/CatalogV2Util.scala | 14 +- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 90 +-- .../catalyst/analysis/ResolveSessionCatalog.scala | 178 +++-- .../spark/sql/execution/command/tables.scala | 8 + .../datasources/v2/DataSourceV2Strategy.scala | 14 +- .../sql-tests/results/change-column.sql.out| 4 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 2 +- .../apache/spark/sql/execution/SQLViewSuite.scala | 8 +- .../spark/sql/execution/command/DDLSuite.scala | 5 +- .../execution/command/PlanResolutionSuite.scala| 47 +++--- .../sql/hive/execution/HiveCommandSuite.scala | 2 +- 19 files changed, 462 insertions(+), 324 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8e280ce -> 6dfaa07)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8e280ce [SPARK-30592][SQL] Interval support for csv and json funtions add 6dfaa07 [SPARK-30549][SQL] Fix the subquery shown issue in UI When enable AQE No new revisions were added by this update. Summary of changes: .../execution/adaptive/AdaptiveSparkPlanExec.scala | 51 +- .../sql/execution/ui/SQLAppStatusListener.scala| 9 .../spark/sql/execution/ui/SQLListener.scala | 6 +++ 3 files changed, 55 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8a926e4 -> 883ae33)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8a926e4 [SPARK-26736][SQL] Partition pruning through nondeterministic expressions in Hive tables add 883ae33 [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 45 - .../sql/catalyst/analysis/CheckAnalysis.scala | 10 +-- .../sql/catalyst/analysis/ResolveCatalogs.scala| 8 --- .../spark/sql/catalyst/analysis/namespace.scala| 33 -- .../apache/spark/sql/catalyst/analysis/table.scala | 33 -- .../sql/catalyst/analysis/v2ResolutionPlans.scala | 76 ++ .../spark/sql/catalyst/parser/AstBuilder.scala | 8 +-- .../sql/catalyst/plans/logical/statements.scala| 8 --- .../sql/catalyst/plans/logical/v2Commands.scala| 12 ++-- .../sql/connector/catalog/CatalogV2Implicits.scala | 8 +++ .../spark/sql/catalyst/parser/DDLParserSuite.scala | 10 +-- .../catalyst/analysis/ResolveSessionCatalog.scala | 25 ++- .../datasources/v2/DataSourceV2Strategy.scala | 9 ++- .../datasources/v2/V2SessionCatalog.scala | 2 +- .../resources/sql-tests/results/describe.sql.out | 3 +- .../org/apache/spark/sql/SQLQueryTestSuite.scala | 2 +- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 5 ++ .../spark/sql/execution/SparkSqlParserSuite.scala | 4 +- .../execution/command/PlanResolutionSuite.scala| 62 -- .../sql/hive/execution/HiveComparisonTest.scala| 2 +- 21 files changed, 186 insertions(+), 181 deletions(-) delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/namespace.scala delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/table.scala create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c49abf8 -> af2d3d0)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from c49abf8  [SPARK-30417][CORE] Task speculation numTaskThreshold should be greater than 0 even EXECUTOR_CORES is not set under Standalone mode
   add af2d3d0  [SPARK-30315][SQL] Add adaptive execution context

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/QueryExecution.scala       |  4 +--
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 42 +++---
 .../adaptive/InsertAdaptiveSparkPlan.scala         | 20 ---
 3 files changed, 38 insertions(+), 28 deletions(-)
[spark] branch master updated (e645125 -> be4faaf)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from e645125  [SPARK-30267][SQL] Avro arrays can be of any List
   add be4faaf  Revert "[SPARK-23264][SQL] Make INTERVAL keyword optional when ANSI enabled"

No new revisions were added by this update.

Summary of changes:
 docs/sql-keywords.md                               |  14 +-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  52 +---
 .../catalyst/parser/ExpressionParserSuite.scala    |  36 +
 .../parser/TableIdentifierParserSuite.scala        |  28 +---
 .../resources/sql-tests/inputs/ansi/interval.sql   |  18 +--
 .../sql-tests/results/ansi/interval.sql.out        | 148 +
 .../resources/sql-tests/results/interval.sql.out   |   8 +-
 7 files changed, 16 insertions(+), 288 deletions(-)
[spark-website] branch asf-site updated: Update the code freeze date of SPARK 3.0 (#247)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 03347c3  Update the code freeze date of SPARK 3.0 (#247)
03347c3 is described below

commit 03347c31d283d86c7f3c7fe046678f7f0c0603da
Author: Xiao Li
AuthorDate: Thu Jan 2 16:03:55 2020 -0800

    Update the code freeze date of SPARK 3.0 (#247)

    This PR is to update the code freeze date of SPARK 3.0 based on the
    [discussion](http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-3-0-branch-cut-and-code-freeze-on-Jan-31-td28575.html)
    in the mailing list
---
 site/versioning-policy.html | 6 +++---
 versioning-policy.md        | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index 9336430..d9d98bd 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -266,15 +266,15 @@ in between feature releases. Major releases do not happen according to a fixed s
   Preview release
-  Early Dec 2019
+  01/31/2020
   Code freeze. Release branch cut.
-  Late Dec 2019
+  Early Feb 2020
   QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
-  Jan 2020
+  Mid Feb 2020
   Release candidates (RC), voting, etc. until final release passes

diff --git a/versioning-policy.md b/versioning-policy.md
index c8ae5ce..8037a59 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -61,9 +61,9 @@ in between feature releases. Major releases do not happen according to a fixed s
 | Date | Event |
 | - | - |
 | Late Oct 2019 | Preview release |
-| Early Dec 2019 | Code freeze. Release branch cut.|
-| Late Dec 2019 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.|
-| Jan 2020 | Release candidates (RC), voting, etc. until final release passes|
+| 01/31/2020 | Code freeze. Release branch cut.|
+| Early Feb 2020 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.|
+| Mid Feb 2020 | Release candidates (RC), voting, etc. until final release passes|

 Maintenance Releases and EOL
[spark] branch master updated (724dcf0 -> 919d551)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 724dcf0  [SPARK-30342][SQL][DOC] Update LIST FILE/JAR command Documentation
   add 919d551  Revert "[SPARK-29390][SQL] Add the justify_days(), justify_hours() and justify_interval() functions"

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   3 -
 .../catalyst/expressions/intervalExpressions.scala |  68 ---
 .../spark/sql/catalyst/util/IntervalUtils.scala    |  36 --
 .../sql/catalyst/util/IntervalUtilsSuite.scala     |  25 -
 .../test/resources/sql-tests/inputs/interval.sql   |  14 -
 .../sql-tests/inputs/postgreSQL/interval.sql       |   8 +-
 .../sql-tests/results/ansi/interval.sql.out        | 570 +
 .../resources/sql-tests/results/interval.sql.out   | 498 --
 .../sql-tests/results/postgreSQL/interval.sql.out  | 186 +++
 9 files changed, 524 insertions(+), 884 deletions(-)
[spark-website] branch asf-site updated: Update the release note of Spark 3.0 preview-2 (#246)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e0c5ca5  Update the release note of Spark 3.0 preview-2 (#246)
e0c5ca5 is described below

commit e0c5ca50df47227d890106d8a3ab33af005b0a87
Author: Xiao Li
AuthorDate: Tue Dec 24 15:21:27 2019 -0800

    Update the release note of Spark 3.0 preview-2 (#246)

    This PR is to address the comments in
    https://github.com/apache/spark-website/pull/245 and update the news of
    Spark 3.0 preview-2 release.
---
 news/_posts/2019-12-23-spark-3.0.0-preview2.md | 4 ++--
 site/news/spark-3.0.0-preview2.html            | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/news/_posts/2019-12-23-spark-3.0.0-preview2.md b/news/_posts/2019-12-23-spark-3.0.0-preview2.md
index d6ae930..63af801 100644
--- a/news/_posts/2019-12-23-spark-3.0.0-preview2.md
+++ b/news/_posts/2019-12-23-spark-3.0.0-preview2.md
@@ -11,6 +11,6 @@ meta:
   _edit_last: '4'
   _wpas_done_all: '1'
 ---
-To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a <a href="https://archive.apache.org/dist/spark/spark-3.0.0-preview2/">Spark 3.0.0 preview2 released</a>. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either the [...]
+To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a <a href="https://archive.apache.org/dist/spark/spark-3.0.0-preview2/">Spark 3.0.0 preview2 release</a>. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either t [...]
-The Spark issue tracker already contains a list of <a href="https://issues.apache.org/jira/browse/SPARK-26078?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315420%20AND%20fixVersion%20%3D%2012339177%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC">features in 3.0</a>.
\ No newline at end of file
+The Spark issue tracker already contains a list of <a href="https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=statusCategory+%3D+done+AND+project+%3D+12315420+AND+fixVersion+%3D+12339177+ORDER+BY+priority+DESC%2C+key+ASC&tempMax=1000">features in 3.0</a>.
\ No newline at end of file

diff --git a/site/news/spark-3.0.0-preview2.html b/site/news/spark-3.0.0-preview2.html
index 7cca75e..a6ebb52 100644
--- a/site/news/spark-3.0.0-preview2.html
+++ b/site/news/spark-3.0.0-preview2.html
@@ -203,9 +203,9 @@
 Preview release of Spark 3.0

-To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a <a href="https://archive.apache.org/dist/spark/spark-3.0.0-preview2/">Spark 3.0.0 preview2 released</a>. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either [...]
+To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a <a href="https://archive.apache.org/dist/spark/spark-3.0.0-preview2/">Spark 3.0.0 preview2 release</a>. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either t [...]
-The Spark issue tracker already contains a list of <a href="https://issues.apache.org/jira/browse/SPARK-26078?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315420%20AND%20fixVersion%20%3D%2012339177%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC">features in 3.0</a>.
+The Spark issue tracker already contains a list of <a href="https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=statusCategory+%3D+done+AND+project+%3D+12315420+AND+fixVersion+%3D+12339177+ORDER+BY+priority+DESC%2C+key+ASC&tempMax=1000">features in 3.0</a>.
[spark] branch master updated (18e8d1d -> a296d15)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 18e8d1d  [SPARK-30307][SQL] remove ReusedQueryStageExec
   add a296d15  [SPARK-30291] catch the exception when doing materialize in AQE

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 18 ++--
 .../adaptive/AdaptiveQueryExecSuite.scala          | 25 ++
 2 files changed, 36 insertions(+), 7 deletions(-)
[spark] branch master updated (726f6d3 -> 18e8d1d)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 726f6d3  [SPARK-30184][SQL] Implement a helper method for aliasing functions
   add 18e8d1d  [SPARK-30307][SQL] remove ReusedQueryStageExec

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  17 +--
 .../adaptive/DemoteBroadcastHashJoin.scala         |   4 +-
 .../adaptive/LogicalQueryStageStrategy.scala       |   4 +-
 .../adaptive/OptimizeLocalShuffleReader.scala      |  56 ++
 .../sql/execution/adaptive/QueryStageExec.scala    | 116 -
 .../adaptive/ReduceNumShufflePartitions.scala      |  22 ++--
 .../spark/sql/execution/exchange/Exchange.scala    |   2 +-
 .../ReduceNumShufflePartitionsSuite.scala          |   9 +-
 .../adaptive/AdaptiveQueryExecSuite.scala          |   9 +-
 9 files changed, 108 insertions(+), 131 deletions(-)
[spark] branch master updated (9cd174a -> 9459833)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 9cd174a  Revert "[SPARK-28461][SQL] Pad Decimal numbers with trailing zeros to the scale of the column"
   add 9459833  [SPARK-29989][INFRA] Add `hadoop-2.7/hive-2.3` pre-built distribution

No new revisions were added by this update.

Summary of changes:
 dev/create-release/release-build.sh | 1 +
 1 file changed, 1 insertion(+)
[spark] branch master updated (2dd6807 -> 6e581cf)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 2dd6807  [SPARK-28023][SQL] Add trim logic in UTF8String's toInt/toLong to make it consistent with other string-numeric casting
   add 6e581cf  [SPARK-29893][SQL][FOLLOWUP] code cleanup for local shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  4 +-
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 32 ++-
 .../adaptive/OptimizeLocalShuffleReader.scala      | 98 +++---
 .../execution/exchange/ShuffleExchangeExec.scala   |  4 +-
 .../adaptive/AdaptiveQueryExecSuite.scala          | 13 ++-
 5 files changed, 87 insertions(+), 64 deletions(-)
[spark] branch master updated (6fb8b86 -> 3d2a6f4)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 6fb8b86  [SPARK-29913][SQL] Improve Exception in postgreCastToBoolean
   add 3d2a6f4  [SPARK-29906][SQL] AQE should not introduce extra shuffle for outermost limit

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 ++
 .../adaptive/AdaptiveQueryExecSuite.scala          | 21
 2 files changed, 36 insertions(+), 8 deletions(-)
[spark] branch master updated (7cfd589 -> 1e2d76e)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 7cfd589  [SPARK-28893][SQL] Support MERGE INTO in the parser and add the corresponding logical plan
   add 1e2d76e  [HOT-FIX] Fix the SQLBase.g4

No new revisions were added by this update.

Summary of changes:
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 4
 1 file changed, 4 deletions(-)
[spark] branch master updated (782992c -> 1f3863c)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 782992c  [SPARK-29642][SS] Change the element type of underlying array to UnsafeRow for ContinuousRecordEndpoint
   add 1f3863c  [SPARK-29759][SQL] LocalShuffleReaderExec.outputPartitioning should use the corrected attributes

No new revisions were added by this update.

Summary of changes:
 .../adaptive/OptimizeLocalShuffleReader.scala      | 36 +-
 .../sql/execution/adaptive/QueryStageExec.scala    |  8 +++--
 .../adaptive/ReduceNumShufflePartitions.scala      | 11 +--
 3 files changed, 28 insertions(+), 27 deletions(-)
[spark] branch master updated (4615769 -> 4110153)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 4615769  [SPARK-29603][YARN] Support application priority for YARN priority scheduling
   add 4110153  [SPARK-29752][SQL][TEST] make AdaptiveQueryExecSuite more robust

No new revisions were added by this update.

Summary of changes:
 .../adaptive/AdaptiveQueryExecSuite.scala | 69 +++---
 1 file changed, 34 insertions(+), 35 deletions(-)
[spark] branch master updated (948a6e8 -> ef1e849)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 948a6e8  [SPARK-28892][SQL][FOLLOWUP] add resolved logical plan for UPDATE TABLE
   add ef1e849  [SPARK-29366][SQL] Subqueries created for DPP are not printed in EXPLAIN FORMATTED

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/ExplainUtils.scala  |  4 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  | 49 ++
 2 files changed, 51 insertions(+), 2 deletions(-)
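For reference, a minimal sketch of the kind of query this fix concerns: EXPLAIN FORMATTED over a plan that contains a subquery. The table and column names below are illustrative, not from the commit itself:

    -- EXPLAIN FORMATTED prints the formatted physical plan followed by a
    -- separate section listing subqueries; SPARK-29366 makes the subqueries
    -- created for dynamic partition pruning appear in that section too.
    EXPLAIN FORMATTED
    SELECT * FROM sales
    WHERE region IN (SELECT region FROM top_regions);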
[spark] branch master updated (ffddfc8 -> 948a6e8)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from ffddfc8  [SPARK-29269][PYTHON][ML] Pyspark ALSModel support getters/setters
   add 948a6e8  [SPARK-28892][SQL][FOLLOWUP] add resolved logical plan for UPDATE TABLE

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  4 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  8 +--
 .../sql/catalyst/analysis/ResolveCatalogs.scala    |  8 ++-
 .../spark/sql/catalyst/expressions/literals.scala  | 10 +++
 .../plans/logical/basicLogicalOperators.scala      | 19 -
 .../plans/logical/sql/UpdateTableStatement.scala   |  2 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  2 +-
 .../spark/sql/execution/SparkStrategies.scala      |  2 +
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 76 +++-
 .../execution/command/PlanResolutionSuite.scala    | 82 +++---
 10 files changed, 154 insertions(+), 59 deletions(-)
[spark] branch master updated (aedf090a -> 8fabbab)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from aedf090a  [SPARK-25468][WEBUI][FOLLOWUP] Current page index keep style with dataTable in the spark UI
   add 8fabbab   [SPARK-29350] Fix BroadcastExchange reuse in Dynamic Partition Pruning

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/exchange/Exchange.scala    | 18 +---
 .../spark/sql/DynamicPartitionPruningSuite.scala   | 25 +-
 2 files changed, 35 insertions(+), 8 deletions(-)
[spark] branch master updated (67d5b9b -> 3170011)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 67d5b9b  [SPARK-29172][SQL] Fix some exception issue of explain commands
   add 3170011  [SPARK-28476][SQL] Support ALTER DATABASE SET LOCATION

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  2 ++
 .../spark/sql/execution/SparkSqlParser.scala       | 16 ++
 .../apache/spark/sql/execution/command/ddl.scala   | 21 ++
 .../sql/execution/command/DDLParserSuite.scala     |  9 ++++
 .../spark/sql/execution/command/DDLSuite.scala     | 25 +-
 .../spark/sql/hive/client/HiveClientImpl.scala     |  7 ++
 .../spark/sql/hive/client/VersionsSuite.scala      | 17 +++
 7 files changed, 96 insertions(+), 1 deletion(-)
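For reference, the new statement has the shape below — a minimal sketch inferred from the commit title, with an illustrative database name and path:

    -- Point an existing database at a new default location
    -- (illustrative identifiers)
    ALTER DATABASE inventory_db SET LOCATION 'hdfs://namenode/user/hive/new_warehouse';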
[spark] branch master updated (eee2e02 -> d3eb4c9)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from eee2e02  [SPARK-29165][SQL][TEST] Set log level of log generated code as ERROR in case of compile error on generated code in UT
   add d3eb4c9  [SPARK-28822][DOC][SQL] Document USE DATABASE in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                |  2 ++
 docs/sql-ref-syntax-qry-select-usedb.md | 60 +
 2 files changed, 62 insertions(+)
 create mode 100644 docs/sql-ref-syntax-qry-select-usedb.md
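A minimal sketch of the statement the new reference page documents (the database and table names are illustrative):

    -- Make customer_db the current database; subsequent unqualified
    -- table references resolve against it
    USE customer_db;
    SELECT * FROM orders;   -- resolves to customer_db.orders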
[spark] branch master updated (a6a663c -> b917a65)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from a6a663c  [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks
   add b917a65  [SPARK-28989][SQL] Add a SQLConf `spark.sql.ansi.enabled`

No new revisions were added by this update.

Summary of changes:
 docs/sql-keywords.md                               |  8 ++---
 .../sql/catalyst/CatalystTypeConverters.scala      |  2 +-
 .../spark/sql/catalyst/SerializerBuildHelper.scala |  2 +-
 .../sql/catalyst/analysis/DecimalPrecision.scala   |  2 +-
 .../spark/sql/catalyst/encoders/RowEncoder.scala   |  2 +-
 .../spark/sql/catalyst/expressions/Cast.scala      |  8 ++---
 .../sql/catalyst/expressions/aggregate/Sum.scala   |  2 +-
 .../sql/catalyst/expressions/arithmetic.scala      |  4 +--
 .../catalyst/expressions/decimalExpressions.scala  |  2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  2 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala    |  4 +--
 .../org/apache/spark/sql/internal/SQLConf.scala    | 41 ++
 .../catalyst/encoders/ExpressionEncoderSuite.scala |  8 ++---
 .../sql/catalyst/encoders/RowEncoderSuite.scala    |  4 +--
 .../expressions/ArithmeticExpressionSuite.scala    | 24 ++---
 .../spark/sql/catalyst/expressions/CastSuite.scala | 12 +++
 .../expressions/DecimalExpressionSuite.scala       |  4 +--
 .../sql/catalyst/expressions/ScalaUDFSuite.scala   |  4 +--
 .../catalyst/parser/ExpressionParserSuite.scala    | 10 +++---
 .../parser/TableIdentifierParserSuite.scala        |  2 +-
 .../resources/sql-tests/inputs/ansi/interval.sql   |  4 +--
 .../inputs/decimalArithmeticOperations.sql         |  2 +-
 .../test/resources/sql-tests/inputs/pgSQL/text.sql |  6 ++--
 .../sql-tests/results/ansi/interval.sql.out        |  8 ++---
 .../results/decimalArithmeticOperations.sql.out    |  4 +--
 .../resources/sql-tests/results/pgSQL/text.sql.out |  8 ++---
 .../org/apache/spark/sql/DataFrameSuite.scala      |  6 ++--
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  8 ++---
 .../thriftserver/ThriftServerQueryTestSuite.scala  |  2 +-
 29 files changed, 86 insertions(+), 109 deletions(-)
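A minimal sketch of toggling the new flag from a SQL session; the conf key comes from the commit title, and the boolean values are the obvious settings:

    -- Opt in to ANSI-compliant SQL behavior for the current session
    SET spark.sql.ansi.enabled=true;
    -- Restore the default
    SET spark.sql.ansi.enabled=false;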
[spark] branch master updated: [SPARK-28792][SQL][DOC] Document CREATE DATABASE statement in SQL Reference
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dd32476  [SPARK-28792][SQL][DOC] Document CREATE DATABASE statement in SQL Reference
dd32476 is described below

commit dd32476a8250e82df554683e195c355459d10a5d
Author: sharangk
AuthorDate: Tue Sep 17 14:40:08 2019 -0700

    [SPARK-28792][SQL][DOC] Document CREATE DATABASE statement in SQL Reference

    ### What changes were proposed in this pull request?
    Document the CREATE DATABASE statement in the SQL Reference Guide.

    ### Why are the changes needed?
    Currently Spark lacks documentation on the supported SQL constructs, causing confusion among users who sometimes have to look at the code to understand the usage. This change is aimed at addressing that issue.

    ### Does this PR introduce any user-facing change?
    Yes.

    Before: there was no documentation for this.

    After:
    ![image](https://user-images.githubusercontent.com/29914590/65037831-290e2900-d96c-11e9-8563-92e5379c3ad1.png)
    ![image](https://user-images.githubusercontent.com/29914590/64858915-55f9cd80-d646-11e9-91a9-16c52b1daa56.png)

    ### How was this patch tested?
    Manual review, and tested using `jekyll build --serve`.

    Closes #25595 from sharangk/createDbDoc.

    Lead-authored-by: sharangk
    Co-authored-by: Xiao Li
    Signed-off-by: Xiao Li
---
 docs/sql-ref-syntax-ddl-create-database.md | 59 +-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md
index bbcd34a..ed0bbf6 100644
--- a/docs/sql-ref-syntax-ddl-create-database.md
+++ b/docs/sql-ref-syntax-ddl-create-database.md
@@ -19,4 +19,61 @@ license: |
   limitations under the License.
 ---
-**This page is under construction**
+### Description
+Creates a database with the specified name. If a database with the same name already exists, an exception is thrown.
+
+### Syntax
+{% highlight sql %}
+CREATE {DATABASE | SCHEMA} [ IF NOT EXISTS ] database_name
+  [ COMMENT database_comment ]
+  [ LOCATION database_directory ]
+  [ WITH DBPROPERTIES (property_name=property_value [ , ...]) ]
+{% endhighlight %}
+
+### Parameters
+
+database_name
+  Specifies the name of the database to be created.
+
+IF NOT EXISTS
+  Creates a database with the given name only if it doesn't already exist. If a database with the same name already exists, nothing happens.
+
+database_directory
+  Path of the file system in which the specified database is to be created. If the specified path does not exist in the underlying file system, this command creates a directory with the path. If the location is not specified, the database is created in the default warehouse directory, whose path is configured by the static configuration spark.sql.warehouse.dir.
+
+database_comment
+  Specifies the description for the database.
+
+WITH DBPROPERTIES (property_name=property_value [ , ...])
+  Specifies the properties for the database in key-value pairs.
+
+### Examples
+{% highlight sql %}
+-- Create database `customer_db`. This throws an exception if a database named customer_db
+-- already exists.
+CREATE DATABASE customer_db;
+
+-- Create database `customer_db` only if a database with the same name doesn't exist.
+CREATE DATABASE IF NOT EXISTS customer_db;
+
+-- Create database `customer_db` only if a database with the same name doesn't exist, with
+-- `Comments`, `Specific Location` and `Database properties`.
+CREATE DATABASE IF NOT EXISTS customer_db COMMENT 'This is customer database' LOCATION '/user'
+  WITH DBPROPERTIES (ID=001, Name='John');
+
+-- Verify that properties are set.
+DESCRIBE DATABASE EXTENDED customer_db;
+  +---------------------------+-----------------------------+
+  | database_description_item | database_description_value  |
+  +---------------------------+-----------------------------+
+  | Database Name             | customer_db                 |
+  | Description               | This is customer database   |
+  | Location                  | hdfs://hacluster/user       |
+  | Properties                | ((ID,001), (Name,John))     |
+  +---------------------------+-----------------------------+
+{% endhighlight %}
+
+### Related Statements
+- [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-database.html)
+- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html)
[spark] branch master updated (3fc52b5 -> c6ca661)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 3fc52b5  [SPARK-28950][SQL] Refine the code of DELETE
   add c6ca661  [SPARK-28814][SQL][DOC] Document SET/RESET in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-conf-mgmt-reset.md | 18 ++-
 docs/sql-ref-syntax-aux-conf-mgmt-set.md   | 49 +-
 2 files changed, 65 insertions(+), 2 deletions(-)
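A minimal sketch of the two commands the updated pages document (the property key and value are illustrative):

    -- Set a session-scoped configuration property
    SET spark.sql.shuffle.partitions=50;
    -- Query the current value of a property
    SET spark.sql.shuffle.partitions;
    -- Reset all runtime properties to their defaults
    RESET;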
[spark] branch master updated (13b77e5 -> d334fee)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 13b77e5  Revert "[SPARK-29046][SQL] Fix NPE in SQLConf.get when active SparkContext is stopping"
   add d334fee  [SPARK-28373][DOCS][WEBUI] JDBC/ODBC Server Tab

No new revisions were added by this update.

Summary of changes:
 docs/img/JDBCServer1.png | Bin 0 -> 14763 bytes
 docs/img/JDBCServer2.png | Bin 0 -> 45084 bytes
 docs/img/JDBCServer3.png | Bin 0 -> 108360 bytes
 docs/web-ui.md           | 41 +
 4 files changed, 41 insertions(+)
 create mode 100644 docs/img/JDBCServer1.png
 create mode 100644 docs/img/JDBCServer2.png
 create mode 100644 docs/img/JDBCServer3.png