[spark] branch master updated (ebe7bb6 -> aae7310)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from ebe7bb6  [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI
   add aae7310  [SPARK-36611] Remove unused listener in HiveThriftServer2AppStatusStore

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala    | 2 +-
 .../sql/hive/thriftserver/ui/HiveThriftServer2AppStatusStore.scala    | 4 +---
 .../sql/hive/thriftserver/ui/HiveThriftServer2ListenerSuite.scala     | 2 +-
 .../apache/spark/sql/hive/thriftserver/ui/ThriftServerPageSuite.scala | 2 +-
 4 files changed, 4 insertions(+), 6 deletions(-)
[spark] branch 3.0.1 created (now 3fdfce3)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch 3.0.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    at 3fdfce3  Preparing Spark release v3.0.0-rc3

No new revisions were added by this update.
[spark] branch branch-3.1 updated: [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 7b64a75  [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI
7b64a75 is described below

commit 7b64a751db38f76efe698326c2fad288ca39918c
Author: yi.wu
AuthorDate: Mon Aug 30 09:09:22 2021 -0700

    [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI

    ### What changes were proposed in this pull request?

    Post the correct executor loss reason to the UI.

    ### Why are the changes needed?

    To show the accurate loss reason.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Users can see the difference in the UI.

    Before:
    https://user-images.githubusercontent.com/16397174/131341692-6f412607-87b8-405e-822d-0d28f07928da.png
    https://user-images.githubusercontent.com/16397174/131341699-f2c9de09-635f-49df-8e27-2495f34276c0.png

    After:
    https://user-images.githubusercontent.com/16397174/131341754-e3c93b5d-5252-4006-a4cc-94d76f41303b.png
    https://user-images.githubusercontent.com/16397174/131341761-e1e0644f-1e76-49c0-915a-26aad77ec272.png

    ### How was this patch tested?

    Manually tested.

    Closes #33868 from Ngone51/fix-executor-remove-reason.

    Authored-by: yi.wu
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit ebe7bb62176ac3c29b0c238e411a0dc989371c33)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index ccb5eb1..548cab9 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -409,8 +409,8 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           totalCoreCount.addAndGet(-executorInfo.totalCores)
           totalRegisteredExecutors.addAndGet(-1)
           scheduler.executorLost(executorId, lossReason)
-          listenerBus.post(
-            SparkListenerExecutorRemoved(System.currentTimeMillis(), executorId, reason.toString))
+          listenerBus.post(SparkListenerExecutorRemoved(
+            System.currentTimeMillis(), executorId, lossReason.toString))
         case None =>
           // SPARK-15262: If an executor is still alive even after the scheduler has removed
           // its metadata, we may receive a heartbeat from that executor and tell its block
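The reason string carried by SparkListenerExecutorRemoved is what the Spark UI (and any registered listener) ultimately renders, which is why the patch posts lossReason instead of the raw reason. Below is a minimal sketch, not taken from the commit, of a listener that observes that same field; the class name ExecutorRemovalLogger and the registration call on an existing SparkContext `sc` are illustrative assumptions:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}

    // Logs the removal reason for each executor as it arrives on the listener bus.
    // With this fix, an executor lost through decommissioning should report the
    // decommission-related reason here (and in the UI) rather than a generic one.
    class ExecutorRemovalLogger extends SparkListener {
      override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
        println(s"Executor ${event.executorId} removed at ${event.time}: ${event.reason}")
      }
    }

    // Registration, assuming an existing SparkContext named `sc`:
    // sc.addSparkListener(new ExecutorRemovalLogger)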
[spark] branch branch-3.2 updated: [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 8be53c3  [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI
8be53c3 is described below

commit 8be53c3298c41c749663bc1dbbd5b2d65246c854
Author: yi.wu
AuthorDate: Mon Aug 30 09:09:22 2021 -0700

    [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI

    ### What changes were proposed in this pull request?

    Post the correct executor loss reason to the UI.

    ### Why are the changes needed?

    To show the accurate loss reason.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Users can see the difference in the UI.

    Before:
    https://user-images.githubusercontent.com/16397174/131341692-6f412607-87b8-405e-822d-0d28f07928da.png
    https://user-images.githubusercontent.com/16397174/131341699-f2c9de09-635f-49df-8e27-2495f34276c0.png

    After:
    https://user-images.githubusercontent.com/16397174/131341754-e3c93b5d-5252-4006-a4cc-94d76f41303b.png
    https://user-images.githubusercontent.com/16397174/131341761-e1e0644f-1e76-49c0-915a-26aad77ec272.png

    ### How was this patch tested?

    Manually tested.

    Closes #33868 from Ngone51/fix-executor-remove-reason.

    Authored-by: yi.wu
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit ebe7bb62176ac3c29b0c238e411a0dc989371c33)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index 3c121c5..fd68dfa 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -428,8 +428,8 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           totalCoreCount.addAndGet(-executorInfo.totalCores)
           totalRegisteredExecutors.addAndGet(-1)
           scheduler.executorLost(executorId, lossReason)
-          listenerBus.post(
-            SparkListenerExecutorRemoved(System.currentTimeMillis(), executorId, reason.toString))
+          listenerBus.post(SparkListenerExecutorRemoved(
+            System.currentTimeMillis(), executorId, lossReason.toString))
         case None =>
           // SPARK-15262: If an executor is still alive even after the scheduler has removed
           // its metadata, we may receive a heartbeat from that executor and tell its block
[spark] branch master updated (fcc91cf -> ebe7bb6)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from fcc91cf  [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source
   add ebe7bb6  [SPARK-36614][CORE][UI] Correct executor loss reason caused by decommission in UI

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new d42536a  [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source
d42536a is described below

commit d42536a6eedceba5d6572968a6dd0946cfaaffca
Author: gengjiaan
AuthorDate: Mon Aug 30 19:09:28 2021 +0800

    [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source

    ### What changes were proposed in this pull request?

    Spark SQL includes a data source that can read data from other databases using JDBC. Spark also supports the
    case-insensitive option `pushDownPredicate`. According to
    http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html, if `pushDownPredicate` is set to false, no filter
    will be pushed down to the JDBC data source and thus all filters will be handled by Spark. But I found that filters
    were still being pushed down to the JDBC data source.

    ### Why are the changes needed?

    Fix the bug where `pushDownPredicate=false` failed to prevent filters from being pushed down to the JDBC data source.

    ### Does this PR introduce _any_ user-facing change?

    'No'. The output of queries will not change.

    ### How was this patch tested?

    Jenkins tests.

    Closes #33822 from beliefer/SPARK-36574.

    Authored-by: gengjiaan
    Signed-off-by: Wenchen Fan
    (cherry picked from commit fcc91cfec4d939eeebfa8cd88f2791aca48645c6)
    Signed-off-by: Wenchen Fan
---
 .../sql/execution/datasources/jdbc/JDBCRDD.scala  | 14 -
 .../execution/datasources/jdbc/JDBCRelation.scala |  8 -
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala     | 34 ++
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
index f26897d..e024e4b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
@@ -172,12 +172,12 @@ object JDBCRDD extends Logging {
    *
    * @param sc - Your SparkContext.
    * @param schema - The Catalyst schema of the underlying database table.
-   * @param requiredColumns - The names of the columns to SELECT.
+   * @param requiredColumns - The names of the columns or aggregate columns to SELECT.
    * @param filters - The filters to include in all WHERE clauses.
    * @param parts - An array of JDBCPartitions specifying partition ids and
    *                per-partition WHERE clauses.
    * @param options - JDBC options that contains url, table and other information.
-   * @param outputSchema - The schema of the columns to SELECT.
+   * @param outputSchema - The schema of the columns or aggregate columns to SELECT.
    * @param groupByColumns - The pushed down group by columns.
    *
    * @return An RDD representing "SELECT requiredColumns FROM fqTable".
@@ -213,8 +213,8 @@ object JDBCRDD extends Logging {
 }
 
 /**
- * An RDD representing a table in a database accessed via JDBC. Both the
- * driver code and the workers must be able to access the database; the driver
+ * An RDD representing a query is related to a table in a database accessed via JDBC.
+ * Both the driver code and the workers must be able to access the database; the driver
  * needs to fetch the schema while the workers need to fetch the data.
  */
 private[jdbc] class JDBCRDD(
@@ -237,11 +237,7 @@ private[jdbc] class JDBCRDD(
   /**
    * `columns`, but as a String suitable for injection into a SQL query.
    */
-  private val columnList: String = {
-    val sb = new StringBuilder()
-    columns.foreach(x => sb.append(",").append(x))
-    if (sb.isEmpty) "1" else sb.substring(1)
-  }
+  private val columnList: String = if (columns.isEmpty) "1" else columns.mkString(",")
 
   /**
    * `filters`, but as a WHERE clause suitable for injection into a SQL query.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
index 60d88b6..8098fa0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
@@ -278,12 +278,18 @@ private[sql] case class JDBCRelation(
   }
 
   override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
+    // When pushDownPredicate is false, all Filters that need to be pushed down should be ignored
+    val pushedFilters = if (jdbcOptions.pushDownPredicate) {
+      filters
+    } else {
+      Array.empty
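The option this commit fixes is set on the JDBC read path. A minimal sketch, not taken from the commit, of a read that disables predicate pushdown; it assumes a reachable database with a suitable JDBC driver on the classpath, and the connection URL and table name are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-pushdown-check").getOrCreate()
    import spark.implicits._

    // With pushDownPredicate=false, filters on this DataFrame should be handled by
    // Spark rather than compiled into the WHERE clause of the generated SELECT.
    val people = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db.example.com:5432/testdb") // placeholder URL
      .option("dbtable", "people")                                    // placeholder table
      .option("pushDownPredicate", "false")
      .load()

    val adults = people.filter($"age" > 21)
    // After this fix, the scan in the physical plan should show no pushed-down filters.
    adults.explain()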
[spark] branch master updated: [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fcc91cf  [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source
fcc91cf is described below

commit fcc91cfec4d939eeebfa8cd88f2791aca48645c6
Author: gengjiaan
AuthorDate: Mon Aug 30 19:09:28 2021 +0800

    [SPARK-36574][SQL] pushDownPredicate=false should prevent push down filters to JDBC data source

    ### What changes were proposed in this pull request?

    Spark SQL includes a data source that can read data from other databases using JDBC. Spark also supports the
    case-insensitive option `pushDownPredicate`. According to
    http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html, if `pushDownPredicate` is set to false, no filter
    will be pushed down to the JDBC data source and thus all filters will be handled by Spark. But I found that filters
    were still being pushed down to the JDBC data source.

    ### Why are the changes needed?

    Fix the bug where `pushDownPredicate=false` failed to prevent filters from being pushed down to the JDBC data source.

    ### Does this PR introduce _any_ user-facing change?

    'No'. The output of queries will not change.

    ### How was this patch tested?

    Jenkins tests.

    Closes #33822 from beliefer/SPARK-36574.

    Authored-by: gengjiaan
    Signed-off-by: Wenchen Fan
---
 .../sql/execution/datasources/jdbc/JDBCRDD.scala  | 14 -
 .../execution/datasources/jdbc/JDBCRelation.scala |  8 -
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala     | 34 ++
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
index f26897d..e024e4b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
@@ -172,12 +172,12 @@ object JDBCRDD extends Logging {
    *
    * @param sc - Your SparkContext.
    * @param schema - The Catalyst schema of the underlying database table.
-   * @param requiredColumns - The names of the columns to SELECT.
+   * @param requiredColumns - The names of the columns or aggregate columns to SELECT.
    * @param filters - The filters to include in all WHERE clauses.
    * @param parts - An array of JDBCPartitions specifying partition ids and
    *                per-partition WHERE clauses.
    * @param options - JDBC options that contains url, table and other information.
-   * @param outputSchema - The schema of the columns to SELECT.
+   * @param outputSchema - The schema of the columns or aggregate columns to SELECT.
    * @param groupByColumns - The pushed down group by columns.
    *
    * @return An RDD representing "SELECT requiredColumns FROM fqTable".
@@ -213,8 +213,8 @@ object JDBCRDD extends Logging {
 }
 
 /**
- * An RDD representing a table in a database accessed via JDBC. Both the
- * driver code and the workers must be able to access the database; the driver
+ * An RDD representing a query is related to a table in a database accessed via JDBC.
+ * Both the driver code and the workers must be able to access the database; the driver
  * needs to fetch the schema while the workers need to fetch the data.
  */
 private[jdbc] class JDBCRDD(
@@ -237,11 +237,7 @@ private[jdbc] class JDBCRDD(
   /**
    * `columns`, but as a String suitable for injection into a SQL query.
   */
-  private val columnList: String = {
-    val sb = new StringBuilder()
-    columns.foreach(x => sb.append(",").append(x))
-    if (sb.isEmpty) "1" else sb.substring(1)
-  }
+  private val columnList: String = if (columns.isEmpty) "1" else columns.mkString(",")
 
   /**
    * `filters`, but as a WHERE clause suitable for injection into a SQL query.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
index 60d88b6..8098fa0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
@@ -278,12 +278,18 @@ private[sql] case class JDBCRelation(
   }
 
   override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
+    // When pushDownPredicate is false, all Filters that need to be pushed down should be ignored
+    val pushedFilters = if (jdbcOptions.pushDownPredicate) {
+      filters
+    } else {
+      Array.empty[Filter]
+    }
     // Rely on a type erasure hack to pass RDD[InternalRow] back as RDD[Row]
     JDBCRDD.scanTa