[GitHub] spark issue #22907: [SPARK-25896][CORE][WIP] Accumulator should only be upda...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/22907 What if there is a FetchFailure and Spark reruns some tasks in the previous succeeded shuffle map stage? That will be a new ShuffleMapStage and we will still double counting the accumulators, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/20303 @cloud-fan @gatorsmile , are you ready to start reviewing this? I can bring this update to date. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/20303 @aaron-aa , the committers agreed to start reviewing the code after 2.4 release. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/21754 This LGTM as a fix. However, ideally we should also support reusing an exchange used in different joins. There is no need to shuffle write the same table twice, we just need read it differently. For example in one stage, a reducer may read partition 0 to 2, while in another stage a reducer may read partition 0 to 3. We just need a different partitionStartIndices to form a different ShuffledRowRDD, then we can reuse the Exchange. I should have addressed this in my new implementation of adaptive execution, @cloud-fan, let's pay attention to it when we reviewing that pr. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/20303 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/20303 cc @cloud-fan , @gatorsmile , @yhuai --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20303: [SPARK-23128][SQL] A new approach to do adaptive ...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/20303 [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL ## What changes were proposed in this pull request? This is the co-work with @yucai , @gczsjdy , @chenghao-intel , @xuanyuanking We'd like to introduce a new approach to do adaptive execution in Spark SQL. The idea is described at https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing ## How was this patch tested? Updated ExchangeCoordinatorSuite. We also tested this with all queries in TPC-DS. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark AE_1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20303.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20303 commit 7e9e57859ae35f70b57ece2fd907464392d88c0f Author: Carson Wang <carson.wang@...> Date: 2017-11-22T08:49:01Z Introduce QueryStage for adapative execution commit 2aa185ddec4616b60e7596153c3dd62188e4d145 Author: Carson Wang <carson.wang@...> Date: 2017-12-12T02:19:53Z Add QueryPlan for ExecutedCommandExec commit 7c3d182f6b6c21f437fbef1ea5ebb2843934dbc6 Author: Carson Wang <carson.wang@...> Date: 2017-12-13T05:31:52Z Add config spark.sql.adaptive.maxNumPostShufflePartitions commit e9b3075c7799b8de6038cc6492cd9e54c12f108e Author: Carson Wang <carson.wang@...> Date: 2017-12-13T07:40:08Z update commetns commit 9666c5fa5b42ebabc4e2b099a1f545945e959e6f Author: Carson Wang <carson.wang@...> Date: 2017-12-18T08:38:32Z Fix style commit 9b29a3c5eb20b76c1ef7f897c07ccdd3a98a27b4 Author: Carson Wang <carson.wang@...> Date: 2017-12-26T02:22:45Z fix bug commit e0b98fbed96c1c07ebf2a1a6846576f705cf2c24 Author: Carson Wang <carson.wang@...> Date: 2018-01-17T05:59:51Z update doc and style --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19681#discussion_r158226032 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala --- @@ -0,0 +1,366 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.ui + +import java.util.Date +import java.util.concurrent.ConcurrentHashMap +import java.util.function.Function + +import scala.collection.JavaConverters._ + +import org.apache.spark.{JobExecutionStatus, SparkConf} +import org.apache.spark.internal.Logging +import org.apache.spark.scheduler._ +import org.apache.spark.sql.execution.SQLExecution +import org.apache.spark.sql.execution.metric._ +import org.apache.spark.status.LiveEntity +import org.apache.spark.status.config._ +import org.apache.spark.ui.SparkUI +import org.apache.spark.util.kvstore.KVStore + +private[sql] class SQLAppStatusListener( +conf: SparkConf, +kvstore: KVStore, +live: Boolean, +ui: Option[SparkUI] = None) + extends SparkListener with Logging { + + // How often to flush intermediate state of a live execution to the store. When replaying logs, + // never flush (only do the very last write). + private val liveUpdatePeriodNs = if (live) conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L + + // Live tracked data is needed by the SQL status store to calculate metrics for in-flight + // executions; that means arbitrary threads may be querying these maps, so they need to be + // thread-safe. + private val liveExecutions = new ConcurrentHashMap[Long, LiveExecutionData]() + private val stageMetrics = new ConcurrentHashMap[Int, LiveStageMetrics]() + + private var uiInitialized = false + + override def onJobStart(event: SparkListenerJobStart): Unit = { +val executionIdString = event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY) +if (executionIdString == null) { + // This is not a job created by SQL + return +} + +val executionId = executionIdString.toLong +val jobId = event.jobId +val exec = getOrCreateExecution(executionId) + +// Record the accumulator IDs for the stages of this job, so that the code that keeps +// track of the metrics knows which accumulators to look at. +val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList +event.stageIds.foreach { id => + stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, new ConcurrentHashMap())) +} + +exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING) +exec.stages = event.stageIds.toSet --- End diff -- @vanzin , shall we add the stageIds to the existing stageIds? Otherwise we will lose the stageIds in previous jobs? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19877: [SPARK-22681]Accumulator should only be updated o...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19877#discussion_r154833757 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1832,6 +1832,27 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi assertDataStructuresEmpty() } + test("accumulator not calculated for resubmitted task in result stage") { +// just for register --- End diff -- Just removed that. Thanks @vanzin for the review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19877: [SPARK-22681]Accumulator should only be updated once for...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/19877 cc @vanzin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19877: [SPARK-22681]Accumulator should only updated once...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/19877 [SPARK-22681]Accumulator should only updated once for each task in result stage ## What changes were proposed in this pull request? As the doc says "For accumulator updates performed inside actions only, Spark guarantees that each taskâs update to the accumulator will only be applied once, i.e. restarted tasks will not update the value." But currently the code doesn't guarantee this. ## How was this patch tested? New added tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark fixAccum Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19877.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19877 commit 882126c2671e1733d572350af9749e9f8bdca1c2 Author: Carson Wang <carson.w...@intel.com> Date: 2017-12-04T12:23:14Z Do not update accumulator for resubmitted task in result stage --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19755: [SPARK-22524][SQL] Subquery shows reused on UI SQL tab e...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/19755 Can you please show the UI before and after the change? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18169: [SPARK-20941] [SQL] Fix SubqueryExec Reuse
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/18169 @gatorsmile , why was this reverted? Are you going to open another PR to fix it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI wh...
Github user carsonwang closed the pull request at: https://github.com/apache/spark/pull/17535 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI when exec...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/17535 This fix will be included in #17540 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r114238664 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -73,21 +99,35 @@ object SQLExecution { } r } else { - // Don't support nested `withNewExecutionId`. This is an example of the nested - // `withNewExecutionId`: + // Nesting `withNewExecutionId` may be incorrect; log a warning. + // + // This is an example of the nested `withNewExecutionId`: // // class DataFrame { + // // Note: `collect` will call withNewExecutionId // def foo: T = withNewExecutionId { something.createNewDataFrame().collect() } // } // - // Note: `collect` will call withNewExecutionId // In this case, only the "executedPlan" for "collect" will be executed. The "executedPlan" - // for the outer DataFrame won't be executed. So it's meaningless to create a new Execution - // for the outer DataFrame. Even if we track it, since its "executedPlan" doesn't run, + // for the outer Dataset won't be executed. So it's meaningless to create a new Execution + // for the outer Dataset. Even if we track it, since its "executedPlan" doesn't run, // all accumulator metrics will be 0. It will confuse people if we show them in Web UI. // - // A real case is the `DataFrame.count` method. - throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already set") + // Some operations will start nested executions. For example, CacheTableCommand will uses + // Dataset#count to materialize cached records when caching is not lazy. Because there are + // legitimate reasons to nest executions in withNewExecutionId, this logs a warning but does + // not throw an exception to avoid failing at runtime. Exceptions will be thrown for tests + // to ensure that nesting is avoided. + // + // To avoid this warning, use nested { ... } + if (!Option(sc.getLocalProperty(ALLOW_NESTED_EXECUTION)).exists(_.toBoolean)) { --- End diff -- just ignore this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/17540 Yes, that's reasonable. I was asking because I noticed `withNewExecutionId` was added in `hiveResultString` method so it should have been fixed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r114235457 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -73,21 +99,35 @@ object SQLExecution { } r } else { - // Don't support nested `withNewExecutionId`. This is an example of the nested - // `withNewExecutionId`: + // Nesting `withNewExecutionId` may be incorrect; log a warning. + // + // This is an example of the nested `withNewExecutionId`: // // class DataFrame { + // // Note: `collect` will call withNewExecutionId // def foo: T = withNewExecutionId { something.createNewDataFrame().collect() } // } // - // Note: `collect` will call withNewExecutionId // In this case, only the "executedPlan" for "collect" will be executed. The "executedPlan" - // for the outer DataFrame won't be executed. So it's meaningless to create a new Execution - // for the outer DataFrame. Even if we track it, since its "executedPlan" doesn't run, + // for the outer Dataset won't be executed. So it's meaningless to create a new Execution + // for the outer Dataset. Even if we track it, since its "executedPlan" doesn't run, // all accumulator metrics will be 0. It will confuse people if we show them in Web UI. // - // A real case is the `DataFrame.count` method. - throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already set") + // Some operations will start nested executions. For example, CacheTableCommand will uses + // Dataset#count to materialize cached records when caching is not lazy. Because there are + // legitimate reasons to nest executions in withNewExecutionId, this logs a warning but does + // not throw an exception to avoid failing at runtime. Exceptions will be thrown for tests + // to ensure that nesting is avoided. + // + // To avoid this warning, use nested { ... } + if (!Option(sc.getLocalProperty(ALLOW_NESTED_EXECUTION)).exists(_.toBoolean)) { --- End diff -- We need check if the value is true instead of only if it exists? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r114234728 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -73,21 +99,35 @@ object SQLExecution { } r } else { - // Don't support nested `withNewExecutionId`. This is an example of the nested - // `withNewExecutionId`: + // Nesting `withNewExecutionId` may be incorrect; log a warning. + // + // This is an example of the nested `withNewExecutionId`: // // class DataFrame { + // // Note: `collect` will call withNewExecutionId // def foo: T = withNewExecutionId { something.createNewDataFrame().collect() } // } // - // Note: `collect` will call withNewExecutionId // In this case, only the "executedPlan" for "collect" will be executed. The "executedPlan" - // for the outer DataFrame won't be executed. So it's meaningless to create a new Execution - // for the outer DataFrame. Even if we track it, since its "executedPlan" doesn't run, + // for the outer Dataset won't be executed. So it's meaningless to create a new Execution + // for the outer Dataset. Even if we track it, since its "executedPlan" doesn't run, // all accumulator metrics will be 0. It will confuse people if we show them in Web UI. // - // A real case is the `DataFrame.count` method. - throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already set") + // Some operations will start nested executions. For example, CacheTableCommand will uses + // Dataset#count to materialize cached records when caching is not lazy. Because there are + // legitimate reasons to nest executions in withNewExecutionId, this logs a warning but does + // not throw an exception to avoid failing at runtime. Exceptions will be thrown for tests + // to ensure that nesting is avoided. + // + // To avoid this warning, use nested { ... } + if (!Option(sc.getLocalProperty(ALLOW_NESTED_EXECUTION)).exists(_.toBoolean)) { +if (testing) { + logWarning(s"$EXECUTION_ID_KEY is already set") --- End diff -- According to the comment, we should throw the exception here and log warning at runtime? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/17540 Hi @rdblue , just wanted to confirm that this also fixed #17535 so we should have the UI when executing queries in Spark SQL CLI? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI when exec...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/17535 Yes, it is closely related but two scenarios of adding `SQLExecution.withNewExecutionId`. Now some tests fail because `withNewExecutionId` is called twice. ``` java.lang.IllegalArgumentException: spark.sql.execution.id is already set [info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:90) [info] at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:164) [info] at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:311) [info] at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) [info] at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) [info] at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) [info] at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:295) [info] at org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:137) [info] at org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:136) [info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) [info] at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:136) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI wh...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/17535 [SPARK-20222][SQL] Bring back the Spark SQL UI when executing queries in Spark SQL CLI ## What changes were proposed in this pull request? There is no Spark SQL UI when executing queries in Spark SQL CLI. It worked before in 2.1.0. The method `hiveResultString` is used by `SparkSQLDriver`. It seems the line `SQLExecution.withNewExecutionId` was removed accidently. This PR is to bring the SQL UI back when using Spark SQL CLI. ## How was this patch tested? Manually ran bin/spark-sql and executed a query. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark sqlUIFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17535.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17535 commit 1d843afb9e5a49f429f5723973d0a0457cf36b3b Author: Carson Wang <carson.w...@intel.com> Date: 2017-04-05T02:41:42Z Show Spark SQL UI when using spark sql cli --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/16952 @gatorsmile @cloud-fan @yhuai , can you help review and merge this minor one line fix? The code change itself is straightforward. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/17009 Thanks @cloud-fan . `driver accumulators don't belong to this execution` is more appropriate. I'll update the words. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17009: [SPARK-19674][SQL]Ignore non-existing driver accu...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/17009#discussion_r102656311 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala --- @@ -147,6 +147,10 @@ class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTest checkAnswer(listener.getExecutionMetrics(0), accumulatorUpdates.mapValues(_ * 2)) +// Non-existing driver accumulator updates should be filtered and no exception will be thrown. --- End diff -- I was doing some own experiments that adds physical operators at runtime. The metrics of the added operators are not registered to the execution so I got a NoSuchElementException. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/17009 cc @cloud-fan @zsxwing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/16952 cc @yhuai --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/16952 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16952: [SPARK-19620][SQL]Fix incorrect exchange coordina...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/16952 [SPARK-19620][SQL]Fix incorrect exchange coordinator id in the physical plan ## What changes were proposed in this pull request? When adaptive execution is enabled, an exchange coordinator is used in the Exchange operators. For Join, the same exchange coordinator is used for its two Exchanges. But the physical plan shows two different coordinator Ids which is confusing. This PR is to fix the incorrect exchange coordinator id in the physical plan. The coordinator object instead of the `Option[ExchangeCoordinator]` should be used to generate the identity hash code of the same coordinator. ## How was this patch tested? manual tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark FixCoordinatorId Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16952.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16952 commit b2eb68afee9673274122f241f5d9eb64142a509f Author: Carson Wang <carson.w...@intel.com> Date: 2017-02-16T06:22:45Z Fix incorrect exchange coordinator id --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16419: [MINOR][DOC]Fix doc of ForeachWriter to use write...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/16419 [MINOR][DOC]Fix doc of ForeachWriter to use writeStream ## What changes were proposed in this pull request? Fix the document of `ForeachWriter` to use `writeStream` instead of `write` for a streaming dataset. ## How was this patch tested? Docs only. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark FixDoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16419.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16419 commit 175d941370ba5585f3ba55b5f0998059e9e1b6d7 Author: Carson Wang <carson.w...@intel.com> Date: 2016-12-28T06:50:01Z Fix doc of ForeachWriter --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/12029#issuecomment-202772769 Just noticed another minor issue in the picture. It seems the container Id is too long to fit in the black rectangle. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/12029#issuecomment-202767219 After applying the patch, the event timeline can be showed without problems. Picture attached. ![afterpatch](https://cloud.githubusercontent.com/assets/9278199/14101400/cb9b9c62-f5c6-11e5-9022-5d6f580f59e4.jpg) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/12029#issuecomment-202746825 cc @sarutak --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/12029 [SPARK-14232][WebUI]Fix event timeline display issue when an executor is removed with a multiple line reason. ## What changes were proposed in this pull request? The event timeline doesn't show on job page if an executor is removed with a multiple line reason. This PR replaces all new line characters with spaces. ![timelineerror](https://cloud.githubusercontent.com/assets/9278199/14100211/5fd4cd30-f5be-11e5-9cea-f32651a4cd62.jpg) ## How was this patch tested? Verified on the Web UI. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark eventTimeline Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12029.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12029 commit be4aaa667c4e252fdd30490428a29efb59292320 Author: Carson Wang <carson.w...@intel.com> Date: 2016-03-29T06:36:56Z Fix event timeline display issue --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/11813#discussion_r56784342 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -66,14 +66,16 @@ private[spark] class ApplicationMaster( // allocation is enabled), with a minimum of 3. private val maxNumExecutorFailures = { -val defaultKey = +val effectiveNumExecutors = --- End diff -- Ok, let's just leave this alone. For users encountering the issue in Spark1.6, they can always manually set `spark.yarn.max.executor.failures` to a larger number as a workaround. Thanks @srowen . I'll close this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
Github user carsonwang closed the pull request at: https://github.com/apache/spark/pull/11813 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/11813 [SPARK-13889][YARN][Branch-1.6]Fix the calculation of the max number of executor failure ## What changes were proposed in this pull request? Backport #11713 to 1.6. The max number of executor failure before failing the application is default to twice the maximum number of executors if dynamic allocation is enabled. The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue. The calculated value of the default max number of executor failure should be Int.MaxValue instead of only 3. ## How was this patch tested? It tests if the value is greater that Int.MaxValue / 2 to avoid the overflow when it multiplies 2. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark branch-1.6-ExecutorFailNum Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11813 commit 17d8bc1f13c3b29e22ecbec6a9f08491e5970368 Author: Carson Wang <carson.w...@intel.com> Date: 2016-03-18T05:15:40Z Fix the calculation of the max number of executor failure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11713#issuecomment-197923149 Thanks @srowen . There is no integer overflow in 1.6 but the max number of executor failure is also 3 if dynamic allocation is enabled. It should use Int.MaxValue as the default as the doc and other code use. Do you want me to submit a fix to 1.6? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11813#issuecomment-198215160 cc @srowen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11713#issuecomment-197105407 Without this patch, the application with dynamic allocation enabled will fail when only 3 executors are lost. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/11713#discussion_r56270809 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -73,7 +73,8 @@ private[spark] class ApplicationMaster( } else { sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0) } -val defaultMaxNumExecutorFailures = math.max(3, 2 * effectiveNumExecutors) +val defaultMaxNumExecutorFailures = math.max(3, --- End diff -- Added a comment for it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/11713 [SPARK-13889][YARN] Fix integer overflow when calculating the max number of executor failure ## What changes were proposed in this pull request? The max number of executor failure before failing the application is default to twice the maximum number of executors if dynamic allocation is enabled. The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue. So this causes an integer overflow and a wrong result. The calculated value of the default max number of executor failure is 3. This PR adds a check to avoid the overflow. ## How was this patch tested? It tests if the value is greater that Int.MaxValue / 2 to avoid the overflow when it multiplies 2. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark IntOverflow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11713.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11713 commit 8469148f32be6ed7a1ccb5178c5ab2d421d71dc4 Author: Carson Wang <carson.w...@intel.com> Date: 2016-03-15T04:31:14Z Fix integer overflow when calculating max executor failure number --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11090#issuecomment-183801352 @srowen The 20 seconds improvements is the difference of the stage time. i.e. before the patch, the stage runs 1.6 min. With this path it runs 1.2 min. It takes about 1 second to create 1 million `Calendar` records. So it saves about 1 microsecond per record. Right, `clear()` doesn't un-set the time zone. #11071 also wants to optimize the `stringToTimestamp` method. In my case, the code doesn't call `stringToTimestamp`. I just noticed it also creates `Calendar` and `TimeZone` objects which may have the similar issue. Anyway, we can fix the issue in `stringToDate` first as this is more obvious? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11090#issuecomment-180652254 Reusing a `Calendar` object when the method will be called frequently is something recommended by "Effective Java" mentioned [here](http://www.informit.com/articles/article.aspx?p=1216151=5). The more the method is invoked, the more performance improvement we get. `stringToTimestamp` has the same problem but it uses different `TimeZone`s instead of only a GMT `TimeZone`. We need cache the `TimeZone`s in a hash map as I did in #11071. It is a little more involved and there is a case that the `Calendar` cannot be reused. So I'll keep it as is and make this PR as small as possible for now. If it is worth doing, I'll be happy to submit a new PR to address it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10634#issuecomment-180659073 @JoshRosen , do you have any further comments? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10634#issuecomment-180659170 /cc @cloud-fan @andrewor14 , did you guys see spill size > 0 when the UI was introduced? Can you take a look at this fix? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user carsonwang closed the pull request at: https://github.com/apache/spark/pull/11071 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/11090 [SPARK-13185][SQL] Reuse Calendar object in DateTimeUtils.StringToDate method to improve performance The java `Calendar` object is expensive to create. I have a sub query like this `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, '2015-01-01')<=0))` The table stores `visitDate` as String type and has 3 billion records. A `Calendar` object is created every time `DateTimeUtils.stringToDate` is called. By reusing the `Calendar` object, I saw about 20 seconds performance improvement for this stage. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark SPARK-13185 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11090.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11090 commit 5bbbf917b326062c30a017a76203e97cdb0d0e74 Author: Carson Wang <carson.w...@intel.com> Date: 2016-02-05T07:02:24Z Reuse Calendar object in StringToDate method --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11090#issuecomment-180233554 /cc @srowen @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11071#issuecomment-180172808 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51972422 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- Added `transient`. The total available timezone IDs should be limited. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/11071#issuecomment-180207538 I have a sub query like this `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, '2015-01-01')<=0)) ` When profiling this stage with Spark 1.6, I noticed a lot time was consumed by `DateTimeUtils.stringToDate`. Especially, `TimeZone.getTimeZone` and `Calendar.getInstance` are extremely slow. The table stores `visitDate` as `String` type and has 3 billion records. This means it creates 3 billion `Calendar` and `TimeZone` objects. `TimeZone.getTimeZone` is a synchronized method and will block other threads calling this same method. #10994 fixed one for `DateTimeUtils.stringToDate`. But `DateTimeUtils.stringToTimestamp` has the same issue so I tried to cache the `TimeZone` objects in a map. The total available number of `TimeZone` should be limited. By reusing `Calendar` object instead of creating it each time in the method, I can see more performance improvement. Creating 20 millions `Calendar` objects will take more that 20 seconds on my machine. So we will benefit from reusing it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/11071 [SPARK-13185][SQL] Improve the performance of DateTimeUtils by reusing TimeZone and Calendar objects It is expensive to create java TimeZone and Calendar objects in each method of DateTimeUtils. We can reuse the objects to improve the performance. In one of my Sql queries which calls StringToDate many times, the duration of the stage improved from 1.6 minutes to 1.2 minutes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark DateTimeUtilsFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11071.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11071 commit cb9b157525c05c1a3d33b8c820595cf020a21b43 Author: Carson Wang <carson.w...@intel.com> Date: 2016-02-04T08:12:59Z Reuse TimeZone and Calendar objects in DateTimeUtils --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10994#issuecomment-179567757 I think we should also reusing a Calendar object in each thread. It is expensive to create java Calendar and Timezone objects each time the method is called. I noticed the same problem and I saw performance boost if reusing these objects. The duration of one stage was improved from 1.6 minutes to 1.2 minutes. @wangyang1992 @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10994#issuecomment-179587937 OK. I will do that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/10634#discussion_r50648300 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -125,7 +125,8 @@ public UnsafeKVExternalSorter( // reset the map, so we can re-use it to insert new records. the inMemSorter will not used // anymore, so the underline array could be used by map again. - map.reset(); + final long spillSize = map.reset(); + taskContext.taskMetrics().incMemoryBytesSpilled(spillSize); --- End diff -- Sorry for the delay, @JoshRosen . I will update this soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10634#issuecomment-174398084 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10634#issuecomment-174421284 @JoshRosen , I now also update `diskBytesSpilled`. Previously it is not updated for aggregation. Please help review this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/10634#discussion_r50659285 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java --- @@ -202,6 +201,7 @@ public long spill(long size, MemoryConsumer trigger) throws IOException { // pages will currently be counted as memory spilled even though that space isn't actually // written to disk. This also counts the space needed to store the sorter's pointer array. taskContext.taskMetrics().incMemoryBytesSpilled(spillSize); + taskContext.taskMetrics().incDiskBytesSpilled(writeMetrics.shuffleBytesWritten()); --- End diff -- The `spillSize` here is 0 because the data are stored in a map instead of this sorter. So `incMemoryBytesSpilled(spillSize)` actually increase 0. We need update the `MemoryBytesSpilled` after freeing the memory in the map. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10634#issuecomment-174422121 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166184298 Thanks for catching this. I think the null check here is necessary, and it seems the code that really pass a null taskMetrcis is from the `TaskSetManager` line 796 when a task is resubmitted because of executor lost. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/10352#discussion_r48114578 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -115,7 +117,17 @@ class HistoryServer( } def getSparkUI(appKey: String): Option[SparkUI] = { -Option(appCache.get(appKey)) --- End diff -- `appCache.getIfPresent` returns null if there is no cached value for the appKey. But `appCache.get` will try to obtain that value from a `CacheLoader`, cache it and return it. So I think we still need use `appCache.get` here and handle the exception. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10352#issuecomment-165705920 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10352#issuecomment-165635982 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/10352 [SPARK-12399] Display correct error message when accessing REST API with an unknown app Id I got an exception when accessing the below REST API with an unknown application Id. `/api/v1/applications/xxx/jobs` Instead of an exception, I expect an error message "no such app: xxx" which is a similar error message when I access `/api/v1/applications/xxx` ``` org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263) at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116) at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226) at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46) at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66) ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark unknownAppFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10352.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10352 commit a7721afdaead1977c78c8afb60ebd21e076a1e01 Author: Carson Wang <carson.w...@intel.com> Date: 2015-12-17T07:37:23Z Display correct error message when accessing REST API with an unknown app Id --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10061#issuecomment-160889037 Hi @JoshRosen, the execution IDs are from the static `SQLExecution` object. So I think they are always unique. Yes, previously each `SQLContext` has its own `SQLListener`. Since the SQL events are now sent to the same event bus, all `SQLListeners` will receives SQL events executed from other `SQLContexts`. If we still want to keep one UI Tab for each `SQLContext`, I think we need keep something like a SQLContext ID in the SQL event so that the `SQLListener` knows if it need handle that event. Is there any strong requirement to keep one UI Tab for each `SQLContext`? I remember some users don't want so many UI tabs because they create many `SQLContext`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160858339 > It's a bit different but not in the way @carsonwang explained; whether you use the hook or handle SparkListenerApplicationEnd, the listener will be cleared when the context is shut down. Sorry I misunderstood it. Ok I will use SparkListenerApplicationEnd. The issue @zsxwing mentioned can probably be addressed in another PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang closed the pull request at: https://github.com/apache/spark/pull/9991 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160876864 Close this and resubmit #10061 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/10061#discussion_r46245965 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1263,6 +1264,8 @@ object SQLContext { */ @transient private val instantiatedContext = new AtomicReference[SQLContext]() + @transient private val sqlListener = new AtomicReference[SQLListener]() --- End diff -- Many unit tests use `sqlContext.listener`. Can you please suggest how to update the unit tests if we changed to use an AtomicBoolean? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9991#discussion_r46245206 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala --- @@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite { .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run this test quickly val sc = new SparkContext(conf) try { + // Clear the sql listener created by a previous test suite. + SQLContext.clearSqlListener() --- End diff -- Many unit tests use `sqlContext.listener`. Can you please suggest how to update the unit tests if we changed to use an `AtomicBoolean`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/10061 [SPARK-11206] Support SQL UI on the history server (resubmit) On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution. To support SQL UI on the history server: 1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus. 2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module. 3. The new SQL events are written to event log using Jackson. 4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark SqlHistoryUI Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10061.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10061 commit fdf9d28362fe991a1df0d5392c9021db78fa7541 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-16T07:00:11Z Update SparkListener to handle other events commit ff4075d89b480d99d0320390d00a5b10aadc9a93 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-20T06:06:07Z Write sql events to event log commit b9870e6579a59628c82470e55f2cb6c4ec8fa2a7 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-21T06:29:37Z Move sql UI classes to core commit 3833055b7c94247b700e6fd0565629e71611d307 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-21T06:32:28Z rename SqlMetricInfo class name commit c0abfc6b7432750812a47790fe51b35dacba7429 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-22T02:22:51Z Update sql metric param commit a5b1cf42a4847b2c9c9674ce0d6aa4d332498ca2 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-26T02:09:32Z handle accumulator updates in sql history UI commit 7b30bc736f09600b25772e30636f8a5c19c6db5e Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-26T06:51:25Z Use a single SQL Tab for all SparkContext commit d52288bb2e18a5f0e898110893a091919e13ea84 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-27T08:49:30Z update style commit 7a2acedfc524e5c5887bd783e6fbbe289313306a Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-28T05:08:23Z Throws exception for unknown events commit caab0bab0299d4eb985b2e5e68cc5813faac6dfb Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-28T05:39:07Z Fix build error commit 0af5afeafe894614e7c1cb83f343db0a0869ad77 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-28T06:07:28Z Fix style commit 8d565f24ab365482333d4ff2331c72b01d02358a Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-06T02:46:35Z Avoid moving the sql classes commit 1954d71d674dc35659cba6f9e3b56d42a756778e Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-06T06:15:59Z code clean commit 927bae84244f900724c3865aac969e4639594760 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-06T07:17:45Z code clean commit b03d98bcb6854a01b65fd5e43368580dec192482 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-09T05:24:04Z minor fix commit 51f913bed91dd95029b404092e8cd7c25b02cad6 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-10T02:04:09Z Fix unit tests commit bca3f5ffca9a9e6f5c56633812432892af66b4e7 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-12T06:07:36Z Address comments commit 60033f8da8bd564c01811c25dc9dbfdf897896b7 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-12T06:28:31Z Fix RAT test commit fe5c16529d821098658cb61ca0f0a21d2a568270 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-12T06:36:20Z Fix style commit 8d94707d4e44741d5e3c24a146d2bf526cf884ce Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-19T03:24:50Z Address vanzin's comments commit 56f24bafb3f5b9306bf393220d018e0620f1296d Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-19T07:37:23Z Address hao's comments commit 690277e37c76b98ba57bc4bd5b0ae2c87669578d Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-20T02:39:40Z Remove one empty line commit 5270209504a337b50cdbafe8cc3e8f83c6ea9745 Author: Carson Wang <carson.w...@intel.com> Date: 2015-12-01T06:04:40Z c
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160494777 @vanzin , I wrapped the calls to the hooks with `Utils.tryLogNonFatalError`. I didn't clean up the `SQLListener` after a application end event because another `SQLContext` created later still wants to use the same `SQLListener`. I think the history server does not have the problem because there is no reference to the `SQLListener` from a companion object in that case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160495888 @zsxwing , do you have any further comments regarding how the `SQLListener` is cleaned up? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160242651 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9991#discussion_r46014967 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala --- @@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite { .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run this test quickly val sc = new SparkContext(conf) try { + // Clear the sql listener created by a previous test suite. + SQLContext.clearSqlListener() --- End diff -- I think we can add a SparkContext stop hook. When SparkContext is being stopped, clear the reference. The user doesn't have to call a method to clear the sqlListener reference. The sqlListener is added to SparkContext and will only be garbage collected when SparkContext is stopped. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9991#issuecomment-160055041 The original purpose of this PR is to fix the `SQLListenerMemoryLeakSuite` test failure. This can be resolved by clearing `SQLContext.sqlListener` before the test. To prevent memory leak similar to SPARK-11700, I added a `SparkContext` stop hook to clear the `sqlListener` reference. I didn't make `SQLContext.clearSqlListener` a public API because it seems a little confusing for users to call it. And the `sqlListener` is added to SparkContext, it will not be GCed at once even if a user calls `SQLContext.clearSqlListener`. Now we clear the reference when the `SparkContext` is being stopped to allow the `sqlListener` to be GCed later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9991#discussion_r45952482 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala --- @@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite { .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run this test quickly val sc = new SparkContext(conf) try { + // Clear the sql listener created by a previous test suite. + SQLContext.clearSqlListener() --- End diff -- I see. Is it enough to make `SQLContext.clearSqlListner` public here? So we provide a way to clear the reference for users who want the object to be GCed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/9991 [SPARK-11206] (Followup) Fix SQLListenerMemoryLeakSuite test error A followup to #9297, fix the SQLListenerMemoryLeakSuite test error. The [failure](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=2.0.0-mr1-cdh4.1.2,label=spark-test/4896/) occurs because a `sqlListener` created by a previous test suite is not cleared in the SQLListenerMemoryLeakSuite. In the failure case, the previous test suite DateFunctionsSuite has 91 completed executions. So the error message is `91 was not less than or equal to 50.` For test suites extends `SharedSQLContext`, the sqlListener is cleared in the method `beforeAll`. Since `SQLListenerMemoryLeakSuite` doesn't extends `SharedSQLContext`, the `sqlListener` need to be cleared manually before creating the `SQLContext`. /cc @vanzin @JoshRosen @chenghao-intel You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark SQLListenerTestFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9991.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9991 commit 8ca3031948b56f5f54dced1b58de35ec27f1e370 Author: Carson Wang <carson.w...@intel.com> Date: 2015-11-26T06:41:10Z Fix SQLListenerMemoryLeakSuite test error --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-159825154 I just submitted #9991 to fix the test failure. Details are described in the new PR. Thanks all! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9991#discussion_r45948964 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala --- @@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite { .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run this test quickly val sc = new SparkContext(conf) try { + // Clear the sql listener created by a previous test suite. + SQLContext.clearSqlListener() --- End diff -- Previously each `SQLContext` has its own `sqlListener`. Because now the SQL events are posted to the event bus. All `SQLContext` now share a single `sqlListener`. I don't think a user need clear `SQLContext.sqlListener`. This is only used by the unit tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-158264556 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r45288401 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -131,6 +135,17 @@ case class SparkListenerApplicationEnd(time: Long) extends SparkListenerEvent private[spark] case class SparkListenerLogStart(sparkVersion: String) extends SparkListenerEvent /** + * Interface for registering listeners defined in other modules like SQL, which are used to + * rebuild the history UI. + */ +trait SparkListenerRegister { --- End diff -- `SparkHistoryListenerFactory` sounds good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r45288842 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -127,6 +130,11 @@ private[spark] object SparkUI { val DEFAULT_RETAINED_STAGES = 1000 val DEFAULT_RETAINED_JOBS = 1000 + val listenerRegisters: Iterable[SparkListenerRegister] = { --- End diff -- Oh yes, this can be a local variable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r45309477 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -193,38 +214,39 @@ private[sql] class SQLListener(conf: SparkConf) extends SparkListener with Loggi } } - def onExecutionStart( - executionId: Long, - description: String, - details: String, - physicalPlanDescription: String, - physicalPlanGraph: SparkPlanGraph, - time: Long): Unit = { -val sqlPlanMetrics = physicalPlanGraph.nodes.flatMap { node => - node.metrics.map(metric => metric.accumulatorId -> metric) -} - -val executionUIData = new SQLExecutionUIData(executionId, description, details, - physicalPlanDescription, physicalPlanGraph, sqlPlanMetrics.toMap, time) -synchronized { - activeExecutions(executionId) = executionUIData - _executionIdToData(executionId) = executionUIData -} - } - - def onExecutionEnd(executionId: Long, time: Long): Unit = synchronized { -_executionIdToData.get(executionId).foreach { executionUIData => - executionUIData.completionTime = Some(time) - if (!executionUIData.hasRunningJobs) { -// onExecutionEnd happens after all "onJobEnd"s -// So we should update the execution lists. -markExecutionFinished(executionId) - } else { -// There are some running jobs, onExecutionEnd happens before some "onJobEnd"s. -// Then we don't if the execution is successful, so let the last onJobEnd updates the -// execution lists. + override def onOtherEvent(event: SparkListenerEvent): Unit = event match { +case executionStart: SparkListenerSQLExecutionStart => + val physicalPlanGraph = SparkPlanGraph(executionStart.sparkPlanInfo) + val sqlPlanMetrics = physicalPlanGraph.nodes.flatMap { node => +node.metrics.map(metric => metric.accumulatorId -> metric) + } + val executionUIData = new SQLExecutionUIData( +executionStart.executionId, +executionStart.description, +executionStart.details, +executionStart.physicalPlanDescription, +physicalPlanGraph, +sqlPlanMetrics.toMap, +executionStart.time) + synchronized { +activeExecutions(executionStart.executionId) = executionUIData +_executionIdToData(executionStart.executionId) = executionUIData + } +case executionEnd: SparkListenerSQLExecutionEnd => synchronized { --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r45309470 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -193,38 +214,39 @@ private[sql] class SQLListener(conf: SparkConf) extends SparkListener with Loggi } } - def onExecutionStart( - executionId: Long, - description: String, - details: String, - physicalPlanDescription: String, - physicalPlanGraph: SparkPlanGraph, - time: Long): Unit = { -val sqlPlanMetrics = physicalPlanGraph.nodes.flatMap { node => - node.metrics.map(metric => metric.accumulatorId -> metric) -} - -val executionUIData = new SQLExecutionUIData(executionId, description, details, - physicalPlanDescription, physicalPlanGraph, sqlPlanMetrics.toMap, time) -synchronized { - activeExecutions(executionId) = executionUIData - _executionIdToData(executionId) = executionUIData -} - } - - def onExecutionEnd(executionId: Long, time: Long): Unit = synchronized { -_executionIdToData.get(executionId).foreach { executionUIData => - executionUIData.completionTime = Some(time) - if (!executionUIData.hasRunningJobs) { -// onExecutionEnd happens after all "onJobEnd"s -// So we should update the execution lists. -markExecutionFinished(executionId) - } else { -// There are some running jobs, onExecutionEnd happens before some "onJobEnd"s. -// Then we don't if the execution is successful, so let the last onJobEnd updates the -// execution lists. + override def onOtherEvent(event: SparkListenerEvent): Unit = event match { +case executionStart: SparkListenerSQLExecutionStart => --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r45309655 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -104,21 +104,39 @@ private class LongSQLMetricParam(val stringValue: Seq[Long] => String, initialVa override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue) } +private[sql] object LongSQLMetricParam extends LongSQLMetricParam(_.sum.toString, 0L) --- End diff -- It seems `LongSQLMetricParam` is the simplest one. Don't have another good name in mind at the moment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-157977639 Thanks @vanzin and @chenghao-intel for reviewing. Just pushed updates to address the comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r45309519 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -91,7 +91,7 @@ private[sql] class LongSQLMetric private[metric](name: String, param: LongSQLMet } } -private class LongSQLMetricParam(val stringValue: Seq[Long] => String, initialValue: Long) +private[sql] class LongSQLMetricParam(val stringValue: Seq[Long] => String, initialValue: Long) --- End diff -- Removed that after removing another two `[sql]`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-156287223 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-156339821 Hi @vanzin, I updated the code to address your comments. Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-156296483 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-156309871 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r44615239 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +114,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(write(event)) --- End diff -- Hi @steveloughran, I think currently the generic handling is only for new events added in the sub modules. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r44622811 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -150,7 +151,14 @@ private[spark] object SparkUI { appName: String, basePath: String, startTime: Long): SparkUI = { -create(None, conf, listenerBus, securityManager, appName, basePath, startTime = startTime) +val sparkUI = create( + None, conf, listenerBus, securityManager, appName, basePath, startTime = startTime) +JsonProtocol.eventRegisters.foreach { eventRegister => + val listener = eventRegister.getListener() + listenerBus.addListener(listener) + eventRegister.attachUITab(listener, sparkUI) --- End diff -- Yes. Ok, I changed to only attach the SQL tab when a `SparkListenerSQLExecutionStart` event is received for the first time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r44614968 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +114,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(write(event)) --- End diff -- I tried using the `JsonTypeInfo` annotation and it worked. The difference is that it need write the full class name now. The event will look like ``` "Event":"org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart" ``` When deserializing the data, we use reflection to find the class so that the class list is not needed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-155348070 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/9297#issuecomment-152536961 @vanzin Thanks a lot for the comment. This sounds great and is very helpful. I agree it is not a good idea to move more stuff to the core. The underlying code change you mentioned sounds a lot of works to do, especially when we consider backwards compatibility. Can we apply part of them so that we can support SQL UI on the history server in this PR and also avoid moving the SQL classes to the core? I was thinking adding a `toJson` method to the `SparkListenerEvent`. As you mentioned when it is not a sealed trait, we can define sql events in the sql module. For existing events in the core, we still use the `JsonProtocol` in the `toJson` method to convert the event to json. For sql events, we can define `toJson` method in the sql module. This way we will be able to write the sql events to the log. Then we can use something you mentioned to feed events from the history server to the sql listener. I know this might still not be very scalable. But we don't make things worse once we are able to avoid moving sql classes to the core and also support SQL UI on the history server. Do you think this is doable for this ticket? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/9297 [SPARK-11206] Support SQL UI on the history server On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution. To support SQL UI on the history server, I add an `onOtherEvent` method to the `SparkListener` trait and post all SQL related events to the same event bus. Two SQL events `SparkListenerSQLExecutionStart` and `SparkListenerSQLExecutionEnd` are added. The history server, the standalone Master that rebuilds the web UI, and the event log listener which writes events to the storage are all in the core module. These components have to reference some necessary SQL classes like the SQL events, the SQLTab, etc. It will be best if we can make these components downstream of sql in the future. Currently I have to move these classes to core. This change also make a single SQLTab for all `SQLContext` because they can now share a single `SQLListener`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark SqlHistoryUI Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9297.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9297 commit fdf9d28362fe991a1df0d5392c9021db78fa7541 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-16T07:00:11Z Update SparkListener to handle other events commit ff4075d89b480d99d0320390d00a5b10aadc9a93 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-20T06:06:07Z Write sql events to event log commit b9870e6579a59628c82470e55f2cb6c4ec8fa2a7 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-21T06:29:37Z Move sql UI classes to core commit 3833055b7c94247b700e6fd0565629e71611d307 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-21T06:32:28Z rename SqlMetricInfo class name commit c0abfc6b7432750812a47790fe51b35dacba7429 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-22T02:22:51Z Update sql metric param commit a5b1cf42a4847b2c9c9674ce0d6aa4d332498ca2 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-26T02:09:32Z handle accumulator updates in sql history UI commit 7b30bc736f09600b25772e30636f8a5c19c6db5e Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-26T06:51:25Z Use a single SQL Tab for all SparkContext commit d52288bb2e18a5f0e898110893a091919e13ea84 Author: Carson Wang <carson.w...@intel.com> Date: 2015-10-27T08:49:30Z update style --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/9297#discussion_r43218812 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -504,9 +542,28 @@ private[spark] object JsonProtocol { case `executorRemoved` => executorRemovedFromJson(json) case `logStart` => logStartFromJson(json) case `metricsUpdate` => executorMetricsUpdateFromJson(json) + case `sqlExecutionStart` => sqlExecutionStartFromJson(json) + case `sqlExecutionEnd` => sqlExecutionEndFromJson(json) --- End diff -- Added that. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org