[GitHub] spark issue #22907: [SPARK-25896][CORE][WIP] Accumulator should only be upda...

2018-11-21 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/22907
  
What if there is a FetchFailure and Spark reruns some tasks in the previous 
succeeded shuffle map stage? That will be a new ShuffleMapStage and we will 
still double counting the accumulators, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...

2018-11-15 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/20303
  
@cloud-fan @gatorsmile , are you ready to start reviewing this? I can bring 
this update to date.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...

2018-10-11 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/20303
  
@aaron-aa , the committers agreed to start reviewing the code after 2.4 
release.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/21754
  
This LGTM as a fix. However, ideally we should also support reusing an 
exchange used in different joins. There is no need to shuffle write the same 
table twice, we just need read it differently. For example in one stage, a 
reducer may read partition 0 to 2, while in another stage a reducer may read 
partition 0 to 3. We just need a different partitionStartIndices to form a 
different ShuffledRowRDD, then we can reuse the Exchange. I should have 
addressed this in my new implementation of adaptive execution, @cloud-fan, 
let's pay attention to it when we reviewing that pr. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...

2018-01-29 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/20303
  
Jenkins, retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20303: [SPARK-23128][SQL] A new approach to do adaptive executi...

2018-01-17 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/20303
  
cc @cloud-fan , @gatorsmile , @yhuai 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20303: [SPARK-23128][SQL] A new approach to do adaptive ...

2018-01-17 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/20303

[SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL

## What changes were proposed in this pull request?
This is the co-work with @yucai , @gczsjdy , @chenghao-intel , 
@xuanyuanking 

We'd like to introduce a new approach to do adaptive execution in Spark 
SQL. The idea is described at 
https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing

## How was this patch tested?
Updated ExchangeCoordinatorSuite.
We also tested this with all queries in TPC-DS. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark AE_1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20303


commit 7e9e57859ae35f70b57ece2fd907464392d88c0f
Author: Carson Wang <carson.wang@...>
Date:   2017-11-22T08:49:01Z

Introduce QueryStage for adapative execution

commit 2aa185ddec4616b60e7596153c3dd62188e4d145
Author: Carson Wang <carson.wang@...>
Date:   2017-12-12T02:19:53Z

Add QueryPlan for ExecutedCommandExec

commit 7c3d182f6b6c21f437fbef1ea5ebb2843934dbc6
Author: Carson Wang <carson.wang@...>
Date:   2017-12-13T05:31:52Z

Add config spark.sql.adaptive.maxNumPostShufflePartitions

commit e9b3075c7799b8de6038cc6492cd9e54c12f108e
Author: Carson Wang <carson.wang@...>
Date:   2017-12-13T07:40:08Z

update commetns

commit 9666c5fa5b42ebabc4e2b099a1f545945e959e6f
Author: Carson Wang <carson.wang@...>
Date:   2017-12-18T08:38:32Z

Fix style

commit 9b29a3c5eb20b76c1ef7f897c07ccdd3a98a27b4
Author: Carson Wang <carson.wang@...>
Date:   2017-12-26T02:22:45Z

fix bug

commit e0b98fbed96c1c07ebf2a1a6846576f705cf2c24
Author: Carson Wang <carson.wang@...>
Date:   2018-01-17T05:59:51Z

update doc and style




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19681: [SPARK-20652][sql] Store SQL UI data in the new a...

2017-12-21 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19681#discussion_r158226032
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
 ---
@@ -0,0 +1,366 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.ui
+
+import java.util.Date
+import java.util.concurrent.ConcurrentHashMap
+import java.util.function.Function
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.{JobExecutionStatus, SparkConf}
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler._
+import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.metric._
+import org.apache.spark.status.LiveEntity
+import org.apache.spark.status.config._
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.util.kvstore.KVStore
+
+private[sql] class SQLAppStatusListener(
+conf: SparkConf,
+kvstore: KVStore,
+live: Boolean,
+ui: Option[SparkUI] = None)
+  extends SparkListener with Logging {
+
+  // How often to flush intermediate state of a live execution to the 
store. When replaying logs,
+  // never flush (only do the very last write).
+  private val liveUpdatePeriodNs = if (live) 
conf.get(LIVE_ENTITY_UPDATE_PERIOD) else -1L
+
+  // Live tracked data is needed by the SQL status store to calculate 
metrics for in-flight
+  // executions; that means arbitrary threads may be querying these maps, 
so they need to be
+  // thread-safe.
+  private val liveExecutions = new ConcurrentHashMap[Long, 
LiveExecutionData]()
+  private val stageMetrics = new ConcurrentHashMap[Int, LiveStageMetrics]()
+
+  private var uiInitialized = false
+
+  override def onJobStart(event: SparkListenerJobStart): Unit = {
+val executionIdString = 
event.properties.getProperty(SQLExecution.EXECUTION_ID_KEY)
+if (executionIdString == null) {
+  // This is not a job created by SQL
+  return
+}
+
+val executionId = executionIdString.toLong
+val jobId = event.jobId
+val exec = getOrCreateExecution(executionId)
+
+// Record the accumulator IDs for the stages of this job, so that the 
code that keeps
+// track of the metrics knows which accumulators to look at.
+val accumIds = exec.metrics.map(_.accumulatorId).sorted.toList
+event.stageIds.foreach { id =>
+  stageMetrics.put(id, new LiveStageMetrics(id, 0, accumIds.toArray, 
new ConcurrentHashMap()))
+}
+
+exec.jobs = exec.jobs + (jobId -> JobExecutionStatus.RUNNING)
+exec.stages = event.stageIds.toSet
--- End diff --

@vanzin , shall we add the stageIds to the existing stageIds? Otherwise we 
will lose the stageIds in previous jobs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19877: [SPARK-22681]Accumulator should only be updated o...

2017-12-04 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19877#discussion_r154833757
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -1832,6 +1832,27 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 assertDataStructuresEmpty()
   }
 
+  test("accumulator not calculated for resubmitted task in result stage") {
+// just for register
--- End diff --

Just removed that. Thanks @vanzin  for the review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19877: [SPARK-22681]Accumulator should only be updated once for...

2017-12-04 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/19877
  
cc @vanzin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19877: [SPARK-22681]Accumulator should only updated once...

2017-12-04 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/19877

[SPARK-22681]Accumulator should only updated once for each task in result 
stage

## What changes were proposed in this pull request?
As the doc says "For accumulator updates performed inside actions only, 
Spark guarantees that each task’s update to the accumulator will only be 
applied once, i.e. restarted tasks will not update the value."
But currently the code doesn't guarantee this.

## How was this patch tested?
New added tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark fixAccum

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19877.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19877


commit 882126c2671e1733d572350af9749e9f8bdca1c2
Author: Carson Wang <carson.w...@intel.com>
Date:   2017-12-04T12:23:14Z

Do not update accumulator for resubmitted task in result stage




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19755: [SPARK-22524][SQL] Subquery shows reused on UI SQL tab e...

2017-11-15 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/19755
  
Can you please show the UI before and after the change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18169: [SPARK-20941] [SQL] Fix SubqueryExec Reuse

2017-08-08 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/18169
  
@gatorsmile , why was this reverted? Are you going to open another PR to 
fix it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI wh...

2017-05-08 Thread carsonwang
Github user carsonwang closed the pull request at:

https://github.com/apache/spark/pull/17535


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI when exec...

2017-05-08 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17535
  
This fix will be included in #17540 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-05-01 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17540#discussion_r114238664
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -73,21 +99,35 @@ object SQLExecution {
   }
   r
 } else {
-  // Don't support nested `withNewExecutionId`. This is an example of 
the nested
-  // `withNewExecutionId`:
+  // Nesting `withNewExecutionId` may be incorrect; log a warning.
+  //
+  // This is an example of the nested `withNewExecutionId`:
   //
   // class DataFrame {
+  //   // Note: `collect` will call withNewExecutionId
   //   def foo: T = withNewExecutionId { 
something.createNewDataFrame().collect() }
   // }
   //
-  // Note: `collect` will call withNewExecutionId
   // In this case, only the "executedPlan" for "collect" will be 
executed. The "executedPlan"
-  // for the outer DataFrame won't be executed. So it's meaningless to 
create a new Execution
-  // for the outer DataFrame. Even if we track it, since its 
"executedPlan" doesn't run,
+  // for the outer Dataset won't be executed. So it's meaningless to 
create a new Execution
+  // for the outer Dataset. Even if we track it, since its 
"executedPlan" doesn't run,
   // all accumulator metrics will be 0. It will confuse people if we 
show them in Web UI.
   //
-  // A real case is the `DataFrame.count` method.
-  throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already 
set")
+  // Some operations will start nested executions. For example, 
CacheTableCommand will uses
+  // Dataset#count to materialize cached records when caching is not 
lazy. Because there are
+  // legitimate reasons to nest executions in withNewExecutionId, this 
logs a warning but does
+  // not throw an exception to avoid failing at runtime. Exceptions 
will be thrown for tests
+  // to ensure that nesting is avoided.
+  //
+  // To avoid this warning, use nested { ... }
+  if 
(!Option(sc.getLocalProperty(ALLOW_NESTED_EXECUTION)).exists(_.toBoolean)) {
--- End diff --

just ignore this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-01 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17540
  
Yes, that's reasonable. I was asking because I noticed `withNewExecutionId` 
was added in `hiveResultString` method so it should have been fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-05-01 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17540#discussion_r114235457
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -73,21 +99,35 @@ object SQLExecution {
   }
   r
 } else {
-  // Don't support nested `withNewExecutionId`. This is an example of 
the nested
-  // `withNewExecutionId`:
+  // Nesting `withNewExecutionId` may be incorrect; log a warning.
+  //
+  // This is an example of the nested `withNewExecutionId`:
   //
   // class DataFrame {
+  //   // Note: `collect` will call withNewExecutionId
   //   def foo: T = withNewExecutionId { 
something.createNewDataFrame().collect() }
   // }
   //
-  // Note: `collect` will call withNewExecutionId
   // In this case, only the "executedPlan" for "collect" will be 
executed. The "executedPlan"
-  // for the outer DataFrame won't be executed. So it's meaningless to 
create a new Execution
-  // for the outer DataFrame. Even if we track it, since its 
"executedPlan" doesn't run,
+  // for the outer Dataset won't be executed. So it's meaningless to 
create a new Execution
+  // for the outer Dataset. Even if we track it, since its 
"executedPlan" doesn't run,
   // all accumulator metrics will be 0. It will confuse people if we 
show them in Web UI.
   //
-  // A real case is the `DataFrame.count` method.
-  throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already 
set")
+  // Some operations will start nested executions. For example, 
CacheTableCommand will uses
+  // Dataset#count to materialize cached records when caching is not 
lazy. Because there are
+  // legitimate reasons to nest executions in withNewExecutionId, this 
logs a warning but does
+  // not throw an exception to avoid failing at runtime. Exceptions 
will be thrown for tests
+  // to ensure that nesting is avoided.
+  //
+  // To avoid this warning, use nested { ... }
+  if 
(!Option(sc.getLocalProperty(ALLOW_NESTED_EXECUTION)).exists(_.toBoolean)) {
--- End diff --

We need check if the value is true instead of only if it exists?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-05-01 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17540#discussion_r114234728
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -73,21 +99,35 @@ object SQLExecution {
   }
   r
 } else {
-  // Don't support nested `withNewExecutionId`. This is an example of 
the nested
-  // `withNewExecutionId`:
+  // Nesting `withNewExecutionId` may be incorrect; log a warning.
+  //
+  // This is an example of the nested `withNewExecutionId`:
   //
   // class DataFrame {
+  //   // Note: `collect` will call withNewExecutionId
   //   def foo: T = withNewExecutionId { 
something.createNewDataFrame().collect() }
   // }
   //
-  // Note: `collect` will call withNewExecutionId
   // In this case, only the "executedPlan" for "collect" will be 
executed. The "executedPlan"
-  // for the outer DataFrame won't be executed. So it's meaningless to 
create a new Execution
-  // for the outer DataFrame. Even if we track it, since its 
"executedPlan" doesn't run,
+  // for the outer Dataset won't be executed. So it's meaningless to 
create a new Execution
+  // for the outer Dataset. Even if we track it, since its 
"executedPlan" doesn't run,
   // all accumulator metrics will be 0. It will confuse people if we 
show them in Web UI.
   //
-  // A real case is the `DataFrame.count` method.
-  throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already 
set")
+  // Some operations will start nested executions. For example, 
CacheTableCommand will uses
+  // Dataset#count to materialize cached records when caching is not 
lazy. Because there are
+  // legitimate reasons to nest executions in withNewExecutionId, this 
logs a warning but does
+  // not throw an exception to avoid failing at runtime. Exceptions 
will be thrown for tests
+  // to ensure that nesting is avoided.
+  //
+  // To avoid this warning, use nested { ... }
+  if 
(!Option(sc.getLocalProperty(ALLOW_NESTED_EXECUTION)).exists(_.toBoolean)) {
+if (testing) {
+  logWarning(s"$EXECUTION_ID_KEY is already set")
--- End diff --

According to the comment, we should throw the exception here and log 
warning at runtime?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-05-01 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17540
  
Hi @rdblue , just wanted to confirm that  this also fixed #17535 so we 
should have the UI when executing queries in Spark SQL CLI?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI when exec...

2017-04-05 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17535
  
Yes, it is closely related but two scenarios of adding 
`SQLExecution.withNewExecutionId`.
Now some tests fail because `withNewExecutionId` is called twice.
```
java.lang.IllegalArgumentException: spark.sql.execution.id is already set
[info]  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:90)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:164)
[info]  at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:311)
[info]  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
[info]  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
[info]  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
[info]  at 
org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:295)
[info]  at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:137)
[info]  at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:136)
[info]  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
[info]  at 
org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:136)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI wh...

2017-04-04 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/17535

[SPARK-20222][SQL] Bring back the Spark SQL UI when executing queries in 
Spark SQL CLI

## What changes were proposed in this pull request?
There is no Spark SQL UI when executing queries in Spark SQL CLI. It worked 
before in 2.1.0.
The method `hiveResultString` is used by `SparkSQLDriver`. It seems the 
line `SQLExecution.withNewExecutionId` was removed accidently. This PR is to 
bring the SQL UI back when using Spark SQL CLI.

## How was this patch tested?
Manually ran bin/spark-sql and executed a query. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark sqlUIFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17535


commit 1d843afb9e5a49f429f5723973d0a0457cf36b3b
Author: Carson Wang <carson.w...@intel.com>
Date:   2017-04-05T02:41:42Z

Show Spark SQL UI when using spark sql cli




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...

2017-03-09 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/16952
  
@gatorsmile  @cloud-fan @yhuai , can you help review and merge this minor 
one line fix? The code change itself is straightforward.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-22 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17009
  
Thanks @cloud-fan . `driver accumulators don't belong to this execution` is 
more appropriate. I'll update the words.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17009: [SPARK-19674][SQL]Ignore non-existing driver accu...

2017-02-22 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17009#discussion_r102656311
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala
 ---
@@ -147,6 +147,10 @@ class SQLListenerSuite extends SparkFunSuite with 
SharedSQLContext with JsonTest
 
 checkAnswer(listener.getExecutionMetrics(0), 
accumulatorUpdates.mapValues(_ * 2))
 
+// Non-existing driver accumulator updates should be filtered and no 
exception will be thrown.
--- End diff --

I was doing some own experiments that adds physical operators at runtime. 
The metrics of the added operators are not registered to the execution so I got 
a NoSuchElementException. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-20 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17009
  
cc @cloud-fan @zsxwing 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...

2017-02-16 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/16952
  
cc @yhuai 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...

2017-02-16 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/16952
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16952: [SPARK-19620][SQL]Fix incorrect exchange coordina...

2017-02-15 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/16952

[SPARK-19620][SQL]Fix incorrect exchange coordinator id in the physical plan

## What changes were proposed in this pull request?
When adaptive execution is enabled, an exchange coordinator is used in the 
Exchange operators. For Join, the same exchange coordinator is used for its two 
Exchanges. But the physical plan shows two different coordinator Ids which is 
confusing.

This PR is to fix the incorrect exchange coordinator id in the physical 
plan. The coordinator object instead of the `Option[ExchangeCoordinator]` 
should be used to generate the identity hash code of the same coordinator. 

## How was this patch tested?
manual tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark FixCoordinatorId

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16952.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16952


commit b2eb68afee9673274122f241f5d9eb64142a509f
Author: Carson Wang <carson.w...@intel.com>
Date:   2017-02-16T06:22:45Z

Fix incorrect exchange coordinator id




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16419: [MINOR][DOC]Fix doc of ForeachWriter to use write...

2016-12-27 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/16419

[MINOR][DOC]Fix doc of ForeachWriter to use writeStream

## What changes were proposed in this pull request?

Fix the document of `ForeachWriter` to use `writeStream` instead of `write` 
for a streaming dataset.

## How was this patch tested?
Docs only.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark FixDoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16419


commit 175d941370ba5585f3ba55b5f0998059e9e1b6d7
Author: Carson Wang <carson.w...@intel.com>
Date:   2016-12-28T06:50:01Z

Fix doc of ForeachWriter




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...

2016-03-29 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/12029#issuecomment-202772769
  
Just noticed another minor issue in the picture. It seems the container Id 
is too long to fit in the black rectangle. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...

2016-03-29 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/12029#issuecomment-202767219
  
After applying the patch, the event timeline can be showed without 
problems. Picture attached.

![afterpatch](https://cloud.githubusercontent.com/assets/9278199/14101400/cb9b9c62-f5c6-11e5-9022-5d6f580f59e4.jpg)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...

2016-03-29 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/12029#issuecomment-202746825
  
cc @sarutak 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14232][WebUI]Fix event timeline display...

2016-03-29 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/12029

[SPARK-14232][WebUI]Fix event timeline display issue when an executor is 
removed with a multiple line reason.

## What changes were proposed in this pull request?
The event timeline doesn't show on job page if an executor is removed with 
a multiple line reason. This PR replaces all new line characters with spaces.


![timelineerror](https://cloud.githubusercontent.com/assets/9278199/14100211/5fd4cd30-f5be-11e5-9cea-f32651a4cd62.jpg)

## How was this patch tested?
Verified on the Web UI.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark eventTimeline

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12029


commit be4aaa667c4e252fdd30490428a29efb59292320
Author: Carson Wang <carson.w...@intel.com>
Date:   2016-03-29T06:36:56Z

Fix event timeline display issue




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...

2016-03-20 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11813#discussion_r56784342
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -66,14 +66,16 @@ private[spark] class ApplicationMaster(
   // allocation is enabled), with a minimum of 3.
 
   private val maxNumExecutorFailures = {
-val defaultKey =
+val effectiveNumExecutors =
--- End diff --

Ok, let's just leave this alone. For users encountering the issue in 
Spark1.6, they can always manually set `spark.yarn.max.executor.failures` to a 
larger number as a workaround. Thanks @srowen . I'll close this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...

2016-03-20 Thread carsonwang
Github user carsonwang closed the pull request at:

https://github.com/apache/spark/pull/11813


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...

2016-03-19 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/11813

[SPARK-13889][YARN][Branch-1.6]Fix the calculation of the max number of 
executor failure

## What changes were proposed in this pull request?

Backport #11713 to 1.6.
The max number of executor failure before failing the application is 
default to twice the maximum number of executors if dynamic allocation is 
enabled. The default value for "spark.dynamicAllocation.maxExecutors" is 
Int.MaxValue. The calculated value of the default max number of executor 
failure should be Int.MaxValue instead of only 3.

## How was this patch tested?
It tests if the value is greater that Int.MaxValue / 2 to avoid the 
overflow when it multiplies 2.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark branch-1.6-ExecutorFailNum

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11813


commit 17d8bc1f13c3b29e22ecbec6a9f08491e5970368
Author: Carson Wang <carson.w...@intel.com>
Date:   2016-03-18T05:15:40Z

Fix the calculation of the max number of executor failure




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-19 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11713#issuecomment-197923149
  
Thanks @srowen . There is no integer overflow in 1.6 but the max number of 
executor failure is also 3 if dynamic allocation is enabled. It should use 
Int.MaxValue as the default as the doc and other code use. Do you want me to 
submit a fix to 1.6?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...

2016-03-19 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11813#issuecomment-198215160
  
cc @srowen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11713#issuecomment-197105407
  
Without this patch, the application with dynamic allocation enabled will 
fail when only 3 executors are lost.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11713#discussion_r56270809
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -73,7 +73,8 @@ private[spark] class ApplicationMaster(
   } else {
 sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
   }
-val defaultMaxNumExecutorFailures = math.max(3, 2 * 
effectiveNumExecutors)
+val defaultMaxNumExecutorFailures = math.max(3,
--- End diff --

Added a comment for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-14 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/11713

[SPARK-13889][YARN] Fix integer overflow when calculating the max number of 
executor failure

## What changes were proposed in this pull request?
The max number of executor failure before failing the application is 
default to twice the maximum number of executors if dynamic allocation is 
enabled. The default value for "spark.dynamicAllocation.maxExecutors" is 
Int.MaxValue. So this causes an integer overflow and a wrong result. The 
calculated value of the default max number of executor failure is 3. This PR 
adds a check to avoid the overflow.

## How was this patch tested?
It tests if the value is greater that Int.MaxValue / 2 to avoid the 
overflow when it multiplies 2. 




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark IntOverflow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11713


commit 8469148f32be6ed7a1ccb5178c5ab2d421d71dc4
Author: Carson Wang <carson.w...@intel.com>
Date:   2016-03-15T04:31:14Z

Fix integer overflow when calculating max executor failure number




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...

2016-02-13 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11090#issuecomment-183801352
  
@srowen The 20 seconds improvements is the difference of the stage time. 
i.e. before the patch, the stage runs 1.6 min. With this path it runs 1.2 min. 
It takes about 1 second to create 1 million `Calendar` records. So it saves 
about 1 microsecond per record. 

Right, `clear()` doesn't un-set the time zone.

#11071 also wants to optimize the `stringToTimestamp` method. In my case, 
the code doesn't call `stringToTimestamp`. I just noticed it also creates 
`Calendar` and `TimeZone` objects which may have the similar issue. Anyway, we 
can fix the issue in `stringToDate` first as this is more obvious?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...

2016-02-05 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11090#issuecomment-180652254
  
Reusing a `Calendar` object when the method will be called frequently is 
something recommended by "Effective Java" mentioned 
[here](http://www.informit.com/articles/article.aspx?p=1216151=5). The 
more the method is invoked, the more performance improvement we get.

`stringToTimestamp` has the same problem but it uses different `TimeZone`s 
instead of only a GMT `TimeZone`. We need cache the `TimeZone`s in a hash map 
as I did in #11071. It is a little more involved and there is a case that the 
`Calendar` cannot be reused. So I'll keep it as is and make this PR as small as 
possible for now. If it is worth doing, I'll be happy to submit a new PR to 
address it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-02-05 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10634#issuecomment-180659073
  
@JoshRosen , do you have any further comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-02-05 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10634#issuecomment-180659170
  
/cc @cloud-fan @andrewor14 , did you guys see spill size > 0  when the UI 
was introduced? Can you take a look at this fix?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang closed the pull request at:

https://github.com/apache/spark/pull/11071


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...

2016-02-04 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/11090

[SPARK-13185][SQL] Reuse Calendar object in DateTimeUtils.StringToDate 
method to improve performance

The java `Calendar` object is expensive to create. I have a sub query like 
this `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, 
'1997-01-01')>=0 AND datediff(UV.visitDate, '2015-01-01')<=0))`

The table stores `visitDate` as String type and has 3 billion records. A 
`Calendar` object is created every time `DateTimeUtils.stringToDate` is called. 
By reusing the `Calendar` object, I saw about 20 seconds performance 
improvement for this stage.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark SPARK-13185

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11090


commit 5bbbf917b326062c30a017a76203e97cdb0d0e74
Author: Carson Wang <carson.w...@intel.com>
Date:   2016-02-05T07:02:24Z

Reuse Calendar object in StringToDate method




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Reuse Calendar object in Da...

2016-02-04 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11090#issuecomment-180233554
  
/cc @srowen @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180172808
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/11071#discussion_r51972422
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -55,10 +56,19 @@ object DateTimeUtils {
   // this is year -17999, calculation: 50 * daysIn400Year
   final val YearZero = -17999
   final val toYearZero = to2001 + 7304850
-  final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
 
   @transient lazy val defaultTimeZone = TimeZone.getDefault
 
+  // Reuse the TimeZone object as it is expensive to create in each method 
call.
+  final val timeZones = new ConcurrentHashMap[String, TimeZone]
--- End diff --

Added `transient`. The total available timezone IDs should be limited.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/11071#issuecomment-180207538
  
I have a sub query like this `SELECT a, b, c FROM table UV  WHERE 
(datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, 
'2015-01-01')<=0)) `
When profiling this stage with Spark 1.6, I noticed a lot time was consumed 
by `DateTimeUtils.stringToDate`. Especially, `TimeZone.getTimeZone` and 
`Calendar.getInstance` are extremely slow. The table stores `visitDate` as 
`String` type and has 3 billion records. This means it creates 3 billion 
`Calendar` and `TimeZone` objects.

`TimeZone.getTimeZone` is a synchronized method and will block other 
threads calling this same method. #10994 fixed one for 
`DateTimeUtils.stringToDate`. But `DateTimeUtils.stringToTimestamp` has the 
same issue so I tried to cache the `TimeZone` objects in a map. The total 
available number of `TimeZone` should be limited.

By reusing `Calendar` object instead of creating it each time in the 
method, I can see more performance improvement. Creating 20 millions `Calendar` 
objects will take more that 20 seconds on my machine. So we will benefit from 
reusing it.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...

2016-02-04 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/11071

[SPARK-13185][SQL] Improve the performance of DateTimeUtils by reusing 
TimeZone and Calendar objects

It is expensive to create java TimeZone and Calendar objects in each method 
of DateTimeUtils. We can reuse the objects to improve the performance. In one 
of my Sql queries which calls StringToDate many times, the duration of the 
stage improved from 1.6 minutes to 1.2 minutes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark DateTimeUtilsFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11071


commit cb9b157525c05c1a3d33b8c820595cf020a21b43
Author: Carson Wang <carson.w...@intel.com>
Date:   2016-02-04T08:12:59Z

Reuse TimeZone and Calendar objects in DateTimeUtils




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-02-03 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-179567757
  
I think we should also reusing a Calendar object in each thread. It is 
expensive to create java Calendar and Timezone objects each time the method is 
called. I noticed the same problem and I saw performance boost if reusing these 
objects. The duration of one stage was improved from 1.6 minutes to 1.2 
minutes. @wangyang1992 @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-02-03 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-179587937
  
OK. I will do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10634#discussion_r50648300
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
 ---
@@ -125,7 +125,8 @@ public UnsafeKVExternalSorter(
 
   // reset the map, so we can re-use it to insert new records. the 
inMemSorter will not used
   // anymore, so the underline array could be used by map again.
-  map.reset();
+  final long spillSize = map.reset();
+  taskContext.taskMetrics().incMemoryBytesSpilled(spillSize);
--- End diff --

Sorry for the delay, @JoshRosen . I will update this soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10634#issuecomment-174398084
  
retest this please 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10634#issuecomment-174421284
  
@JoshRosen , I now also update `diskBytesSpilled`. Previously it is not 
updated for aggregation. Please help review this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10634#discussion_r50659285
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -202,6 +201,7 @@ public long spill(long size, MemoryConsumer trigger) 
throws IOException {
 // pages will currently be counted as memory spilled even though that 
space isn't actually
 // written to disk. This also counts the space needed to store the 
sorter's pointer array.
 taskContext.taskMetrics().incMemoryBytesSpilled(spillSize);
+
taskContext.taskMetrics().incDiskBytesSpilled(writeMetrics.shuffleBytesWritten());
--- End diff --

The `spillSize` here is 0 because the data are stored in a map instead of 
this sorter. So `incMemoryBytesSpilled(spillSize)` actually increase 0. We need 
update the `MemoryBytesSpilled` after freeing the memory in the map.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12688][SQL] Fix spill size metric in un...

2016-01-24 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10634#issuecomment-174422121
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...

2015-12-20 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10405#issuecomment-166184298
  
Thanks for catching this. I think the null check here is necessary, and it 
seems the code that really pass a null taskMetrcis is from the `TaskSetManager` 
line 796 when a task is resubmitted because of executor lost.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...

2015-12-20 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10352#discussion_r48114578
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -115,7 +117,17 @@ class HistoryServer(
   }
 
   def getSparkUI(appKey: String): Option[SparkUI] = {
-Option(appCache.get(appKey))
--- End diff --

`appCache.getIfPresent` returns null if there is no cached value for the 
appKey. But `appCache.get` will try to obtain that value from a `CacheLoader`, 
cache it and return it. So I think we still need use `appCache.get` here and 
handle the exception. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...

2015-12-17 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10352#issuecomment-165705920
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...

2015-12-17 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10352#issuecomment-165635982
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...

2015-12-16 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/10352

[SPARK-12399] Display correct error message when accessing REST API with an 
unknown app Id

I got an exception when accessing the below REST API with an unknown 
application Id.
`/api/v1/applications/xxx/jobs`
Instead of an exception, I expect an error message "no such app: xxx" which 
is a similar error message when I access `/api/v1/applications/xxx`
```
org.spark-project.guava.util.concurrent.UncheckedExecutionException: 
java.util.NoSuchElementException: no app with key xxx
at 
org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
at 
org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
at 
org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
at 
org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
at 
org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark unknownAppFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10352.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10352


commit a7721afdaead1977c78c8afb60ebd21e076a1e01
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-12-17T07:37:23Z

Display correct error message when accessing REST API with an unknown app Id




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-12-01 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/10061#issuecomment-160889037
  
Hi @JoshRosen, the execution IDs are from the static `SQLExecution` object. 
So I think they are always unique. Yes, previously each `SQLContext` has its 
own `SQLListener`. Since the SQL events are now sent to the same event bus, all 
`SQLListeners` will receives SQL events executed from other `SQLContexts`. If 
we still want to keep one UI Tab for each `SQLContext`, I think we need keep 
something like a SQLContext ID in the SQL event so that the `SQLListener` knows 
if it need handle that event. Is there any strong requirement to keep one UI 
Tab for each `SQLContext`? I remember some users don't want so many UI tabs 
because they create many `SQLContext`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-30 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9991#issuecomment-160858339
  
> It's a bit different but not in the way @carsonwang explained; whether 
you use the hook or handle SparkListenerApplicationEnd, the listener will be 
cleared when the context is shut down.

Sorry I misunderstood it. Ok I will use SparkListenerApplicationEnd. The 
issue @zsxwing  mentioned can probably be addressed in another PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-30 Thread carsonwang
Github user carsonwang closed the pull request at:

https://github.com/apache/spark/pull/9991


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-30 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9991#issuecomment-160876864
  
Close this and resubmit #10061


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-30 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/10061#discussion_r46245965
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -1263,6 +1264,8 @@ object SQLContext {
*/
   @transient private val instantiatedContext = new 
AtomicReference[SQLContext]()
 
+  @transient private val sqlListener = new AtomicReference[SQLListener]()
--- End diff --

Many unit tests use `sqlContext.listener`. Can you please suggest how to 
update the unit tests if we changed to use an AtomicBoolean?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-30 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9991#discussion_r46245206
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala
 ---
@@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite {
   .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run 
this test quickly
 val sc = new SparkContext(conf)
 try {
+  // Clear the sql listener created by a previous test suite.
+  SQLContext.clearSqlListener()
--- End diff --

Many unit tests use `sqlContext.listener`. Can you please suggest how to 
update the unit tests if we changed to use an `AtomicBoolean`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-30 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/10061

[SPARK-11206] Support SQL UI on the history server (resubmit)

On the live web UI, there is a SQL tab which provides valuable information 
for the SQL query. But once the workload is finished, we won't see the SQL tab 
on the history server. It will be helpful if we support SQL UI on the history 
server so we can analyze it even after its execution.

To support SQL UI on the history server:
1. I added an onOtherEvent method to the SparkListener trait and post all 
SQL related events to the same event bus.
2. Two SQL events SparkListenerSQLExecutionStart and 
SparkListenerSQLExecutionEnd are defined in the sql module.
3. The new SQL events are written to event log using Jackson.
4. A new trait SparkHistoryListenerFactory is added to allow the history 
server to feed events to the SQL history listener. The SQL implementation is 
loaded at runtime using java.util.ServiceLoader.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark SqlHistoryUI

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10061.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10061


commit fdf9d28362fe991a1df0d5392c9021db78fa7541
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-16T07:00:11Z

Update SparkListener to handle other events

commit ff4075d89b480d99d0320390d00a5b10aadc9a93
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-20T06:06:07Z

Write sql events to event log

commit b9870e6579a59628c82470e55f2cb6c4ec8fa2a7
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-21T06:29:37Z

Move sql UI classes to core

commit 3833055b7c94247b700e6fd0565629e71611d307
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-21T06:32:28Z

rename SqlMetricInfo class name

commit c0abfc6b7432750812a47790fe51b35dacba7429
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-22T02:22:51Z

Update sql metric param

commit a5b1cf42a4847b2c9c9674ce0d6aa4d332498ca2
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-26T02:09:32Z

handle accumulator updates in sql history UI

commit 7b30bc736f09600b25772e30636f8a5c19c6db5e
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-26T06:51:25Z

Use a single SQL Tab for all SparkContext

commit d52288bb2e18a5f0e898110893a091919e13ea84
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-27T08:49:30Z

update style

commit 7a2acedfc524e5c5887bd783e6fbbe289313306a
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-28T05:08:23Z

Throws exception for unknown events

commit caab0bab0299d4eb985b2e5e68cc5813faac6dfb
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-28T05:39:07Z

Fix build error

commit 0af5afeafe894614e7c1cb83f343db0a0869ad77
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-28T06:07:28Z

Fix style

commit 8d565f24ab365482333d4ff2331c72b01d02358a
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-06T02:46:35Z

Avoid moving the sql classes

commit 1954d71d674dc35659cba6f9e3b56d42a756778e
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-06T06:15:59Z

code clean

commit 927bae84244f900724c3865aac969e4639594760
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-06T07:17:45Z

code clean

commit b03d98bcb6854a01b65fd5e43368580dec192482
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-09T05:24:04Z

minor fix

commit 51f913bed91dd95029b404092e8cd7c25b02cad6
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-10T02:04:09Z

Fix unit tests

commit bca3f5ffca9a9e6f5c56633812432892af66b4e7
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-12T06:07:36Z

Address comments

commit 60033f8da8bd564c01811c25dc9dbfdf897896b7
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-12T06:28:31Z

Fix RAT test

commit fe5c16529d821098658cb61ca0f0a21d2a568270
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-12T06:36:20Z

Fix style

commit 8d94707d4e44741d5e3c24a146d2bf526cf884ce
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-19T03:24:50Z

Address vanzin's comments

commit 56f24bafb3f5b9306bf393220d018e0620f1296d
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-19T07:37:23Z

Address hao's comments

commit 690277e37c76b98ba57bc4bd5b0ae2c87669578d
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-20T02:39:40Z

Remove one empty line

commit 5270209504a337b50cdbafe8cc3e8f83c6ea9745
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-12-01T06:04:40Z

c

[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-29 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9991#issuecomment-160494777
  
@vanzin , I wrapped the calls to the hooks with 
`Utils.tryLogNonFatalError`. I didn't clean up the `SQLListener` after a 
application end event because another `SQLContext` created later still wants to 
use the same `SQLListener`. I think the history server does not have the 
problem because there is no reference to the `SQLListener` from a companion 
object in that case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-29 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9991#issuecomment-160495888
  
@zsxwing , do you have any further comments regarding how the `SQLListener` 
is cleaned up?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-27 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9991#issuecomment-160242651
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-26 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9991#discussion_r46014967
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala
 ---
@@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite {
   .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run 
this test quickly
 val sc = new SparkContext(conf)
 try {
+  // Clear the sql listener created by a previous test suite.
+  SQLContext.clearSqlListener()
--- End diff --

I think we can add a SparkContext stop hook. When SparkContext is being 
stopped, clear the reference. The user doesn't have to call a method to clear 
the sqlListener reference. The sqlListener is added to SparkContext and will 
only be garbage collected when SparkContext is stopped.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-26 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9991#issuecomment-160055041
  
The original purpose of this PR is to fix the `SQLListenerMemoryLeakSuite` 
test failure. This can be resolved by clearing `SQLContext.sqlListener` before 
the test.

To prevent memory leak similar to SPARK-11700, I added a `SparkContext` 
stop hook to clear the `sqlListener` reference. I didn't make 
`SQLContext.clearSqlListener` a public API because it seems a little confusing 
for users to call it. And the `sqlListener` is added to SparkContext, it will 
not be GCed at once even if a user calls `SQLContext.clearSqlListener`. Now we 
clear the reference when the `SparkContext` is being stopped to allow the 
`sqlListener` to be GCed later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-26 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9991#discussion_r45952482
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala
 ---
@@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite {
   .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run 
this test quickly
 val sc = new SparkContext(conf)
 try {
+  // Clear the sql listener created by a previous test suite.
+  SQLContext.clearSqlListener()
--- End diff --

I see. Is it enough to make `SQLContext.clearSqlListner` public here? So we 
provide a way to clear the reference for users who want the object to be GCed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-25 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/9991

[SPARK-11206] (Followup) Fix SQLListenerMemoryLeakSuite test error

A followup to #9297, fix the SQLListenerMemoryLeakSuite test error. The 
[failure](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=2.0.0-mr1-cdh4.1.2,label=spark-test/4896/)
 occurs because a `sqlListener` created by a previous test suite is not cleared 
in the SQLListenerMemoryLeakSuite.
In the failure case, the previous test suite DateFunctionsSuite has 91 
completed executions. So the error message is `91 was not less than or equal to 
50.`

For test suites extends `SharedSQLContext`, the sqlListener is cleared in 
the method `beforeAll`. Since `SQLListenerMemoryLeakSuite` doesn't extends 
`SharedSQLContext`, the `sqlListener` need to be cleared manually before 
creating the `SQLContext`.

/cc @vanzin @JoshRosen @chenghao-intel 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark SQLListenerTestFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9991


commit 8ca3031948b56f5f54dced1b58de35ec27f1e370
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-11-26T06:41:10Z

Fix SQLListenerMemoryLeakSuite test error




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-25 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-159825154
  
I just submitted #9991 to fix the test failure. Details are described in 
the new PR. Thanks all!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] (Followup) Fix SQLListenerMemory...

2015-11-25 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9991#discussion_r45948964
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala
 ---
@@ -343,6 +343,8 @@ class SQLListenerMemoryLeakSuite extends SparkFunSuite {
   .set("spark.sql.ui.retainedExecutions", "50") // Set it to 50 to run 
this test quickly
 val sc = new SparkContext(conf)
 try {
+  // Clear the sql listener created by a previous test suite.
+  SQLContext.clearSqlListener()
--- End diff --

Previously each `SQLContext` has its own `sqlListener`. Because now the SQL 
events are posted to the event bus. All `SQLContext` now share a single 
`sqlListener`. I don't think a user need clear `SQLContext.sqlListener`. This 
is only used by the unit tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-19 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-158264556
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r45288401
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -131,6 +135,17 @@ case class SparkListenerApplicationEnd(time: Long) 
extends SparkListenerEvent
 private[spark] case class SparkListenerLogStart(sparkVersion: String) 
extends SparkListenerEvent
 
 /**
+ * Interface for registering listeners defined in other modules like SQL, 
which are used to
+ * rebuild the history UI.
+ */
+trait SparkListenerRegister {
--- End diff --

`SparkHistoryListenerFactory` sounds good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r45288842
  
--- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
@@ -127,6 +130,11 @@ private[spark] object SparkUI {
   val DEFAULT_RETAINED_STAGES = 1000
   val DEFAULT_RETAINED_JOBS = 1000
 
+  val listenerRegisters: Iterable[SparkListenerRegister] = {
--- End diff --

Oh yes, this can be a local variable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r45309477
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -193,38 +214,39 @@ private[sql] class SQLListener(conf: SparkConf) 
extends SparkListener with Loggi
 }
   }
 
-  def onExecutionStart(
-  executionId: Long,
-  description: String,
-  details: String,
-  physicalPlanDescription: String,
-  physicalPlanGraph: SparkPlanGraph,
-  time: Long): Unit = {
-val sqlPlanMetrics = physicalPlanGraph.nodes.flatMap { node =>
-  node.metrics.map(metric => metric.accumulatorId -> metric)
-}
-
-val executionUIData = new SQLExecutionUIData(executionId, description, 
details,
-  physicalPlanDescription, physicalPlanGraph, sqlPlanMetrics.toMap, 
time)
-synchronized {
-  activeExecutions(executionId) = executionUIData
-  _executionIdToData(executionId) = executionUIData
-}
-  }
-
-  def onExecutionEnd(executionId: Long, time: Long): Unit = synchronized {
-_executionIdToData.get(executionId).foreach { executionUIData =>
-  executionUIData.completionTime = Some(time)
-  if (!executionUIData.hasRunningJobs) {
-// onExecutionEnd happens after all "onJobEnd"s
-// So we should update the execution lists.
-markExecutionFinished(executionId)
-  } else {
-// There are some running jobs, onExecutionEnd happens before some 
"onJobEnd"s.
-// Then we don't if the execution is successful, so let the last 
onJobEnd updates the
-// execution lists.
+  override def onOtherEvent(event: SparkListenerEvent): Unit = event match 
{
+case executionStart: SparkListenerSQLExecutionStart =>
+  val physicalPlanGraph = SparkPlanGraph(executionStart.sparkPlanInfo)
+  val sqlPlanMetrics = physicalPlanGraph.nodes.flatMap { node =>
+node.metrics.map(metric => metric.accumulatorId -> metric)
+  }
+  val executionUIData = new SQLExecutionUIData(
+executionStart.executionId,
+executionStart.description,
+executionStart.details,
+executionStart.physicalPlanDescription,
+physicalPlanGraph,
+sqlPlanMetrics.toMap,
+executionStart.time)
+  synchronized {
+activeExecutions(executionStart.executionId) = executionUIData
+_executionIdToData(executionStart.executionId) = executionUIData
+  }
+case executionEnd: SparkListenerSQLExecutionEnd => synchronized {
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r45309470
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala ---
@@ -193,38 +214,39 @@ private[sql] class SQLListener(conf: SparkConf) 
extends SparkListener with Loggi
 }
   }
 
-  def onExecutionStart(
-  executionId: Long,
-  description: String,
-  details: String,
-  physicalPlanDescription: String,
-  physicalPlanGraph: SparkPlanGraph,
-  time: Long): Unit = {
-val sqlPlanMetrics = physicalPlanGraph.nodes.flatMap { node =>
-  node.metrics.map(metric => metric.accumulatorId -> metric)
-}
-
-val executionUIData = new SQLExecutionUIData(executionId, description, 
details,
-  physicalPlanDescription, physicalPlanGraph, sqlPlanMetrics.toMap, 
time)
-synchronized {
-  activeExecutions(executionId) = executionUIData
-  _executionIdToData(executionId) = executionUIData
-}
-  }
-
-  def onExecutionEnd(executionId: Long, time: Long): Unit = synchronized {
-_executionIdToData.get(executionId).foreach { executionUIData =>
-  executionUIData.completionTime = Some(time)
-  if (!executionUIData.hasRunningJobs) {
-// onExecutionEnd happens after all "onJobEnd"s
-// So we should update the execution lists.
-markExecutionFinished(executionId)
-  } else {
-// There are some running jobs, onExecutionEnd happens before some 
"onJobEnd"s.
-// Then we don't if the execution is successful, so let the last 
onJobEnd updates the
-// execution lists.
+  override def onOtherEvent(event: SparkListenerEvent): Unit = event match 
{
+case executionStart: SparkListenerSQLExecutionStart =>
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r45309655
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -104,21 +104,39 @@ private class LongSQLMetricParam(val stringValue: 
Seq[Long] => String, initialVa
   override def zero: LongSQLMetricValue = new 
LongSQLMetricValue(initialValue)
 }
 
+private[sql] object LongSQLMetricParam extends 
LongSQLMetricParam(_.sum.toString, 0L)
--- End diff --

It seems `LongSQLMetricParam` is the simplest one. Don't have another good 
name in mind at the moment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-157977639
  
Thanks @vanzin and @chenghao-intel for reviewing. Just pushed updates to 
address the comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-18 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r45309519
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -91,7 +91,7 @@ private[sql] class LongSQLMetric private[metric](name: 
String, param: LongSQLMet
   }
 }
 
-private class LongSQLMetricParam(val stringValue: Seq[Long] => String, 
initialValue: Long)
+private[sql] class LongSQLMetricParam(val stringValue: Seq[Long] => 
String, initialValue: Long)
--- End diff --

Removed that after removing another two `[sql]`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-12 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-156287223
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-12 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-156339821
  
Hi @vanzin, I updated the code to address your comments. Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-12 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-156296483
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-12 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-156309871
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-11 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r44615239
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -96,6 +114,7 @@ private[spark] object JsonProtocol {
 executorMetricsUpdateToJson(metricsUpdate)
   case blockUpdated: SparkListenerBlockUpdated =>
 throw new MatchError(blockUpdated)  // TODO(ekl) implement this
+  case _ => parse(write(event))
--- End diff --

Hi @steveloughran, I think currently the generic handling is only for new 
events added in the sub modules.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-11 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r44622811
  
--- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
@@ -150,7 +151,14 @@ private[spark] object SparkUI {
   appName: String,
   basePath: String,
   startTime: Long): SparkUI = {
-create(None, conf, listenerBus, securityManager, appName, basePath, 
startTime = startTime)
+val sparkUI = create(
+  None, conf, listenerBus, securityManager, appName, basePath, 
startTime = startTime)
+JsonProtocol.eventRegisters.foreach { eventRegister =>
+  val listener = eventRegister.getListener()
+  listenerBus.addListener(listener)
+  eventRegister.attachUITab(listener, sparkUI)
--- End diff --

Yes. Ok, I changed to only attach the SQL tab when a 
`SparkListenerSQLExecutionStart` event is received for the first time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-11 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r44614968
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -96,6 +114,7 @@ private[spark] object JsonProtocol {
 executorMetricsUpdateToJson(metricsUpdate)
   case blockUpdated: SparkListenerBlockUpdated =>
 throw new MatchError(blockUpdated)  // TODO(ekl) implement this
+  case _ => parse(write(event))
--- End diff --

I tried using the `JsonTypeInfo` annotation and it worked. The difference 
is that it need write the full class name now. The event will look like 
```
"Event":"org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"
```
When deserializing the data, we use reflection to find the class so that 
the class list is not needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-09 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-155348070
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-10-30 Thread carsonwang
Github user carsonwang commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-152536961
  
@vanzin  Thanks a lot for the comment. This sounds great and is very 
helpful. I agree it is not a good idea to move more stuff to the core. The 
underlying code change you mentioned sounds a lot of works to do, especially 
when we consider backwards compatibility. Can we apply part of them so that we 
can support SQL UI on the history server in this PR and also avoid moving the 
SQL classes to the core? I was thinking adding a `toJson` method to the 
`SparkListenerEvent`. As you mentioned when it is not a sealed trait, we can 
define sql events in the sql module. For existing events in the core, we still 
use the `JsonProtocol` in the `toJson` method to convert the event to json. For 
sql events, we can define `toJson` method in the sql module. This way we will 
be able to write the sql events to the log. Then we can use something you 
mentioned to feed events from the history server to the sql listener.

I know this might still not be very scalable. But we don't make things 
worse once we are able to avoid moving sql classes to the core and also support 
SQL UI on the history server. Do you think this is doable for this ticket?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-10-27 Thread carsonwang
GitHub user carsonwang opened a pull request:

https://github.com/apache/spark/pull/9297

[SPARK-11206] Support SQL UI on the history server

On the live web UI, there is a SQL tab which provides valuable information 
for the SQL query. But once the workload is finished, we won't see the SQL tab 
on the history server. It will be helpful if we support SQL UI on the history 
server so we can analyze it even after its execution.

To support SQL UI on the history server, I add an `onOtherEvent` method to 
the `SparkListener` trait and post all SQL related events to the same event 
bus. Two SQL events `SparkListenerSQLExecutionStart` and 
`SparkListenerSQLExecutionEnd` are added. 

The history server, the standalone Master that rebuilds the web UI, and the 
event log listener which writes events to the storage are all in the core 
module. These components have to reference some necessary SQL classes like the 
SQL events, the SQLTab, etc. It will be best if we can make these components 
downstream of sql in the future. Currently I have to move these classes to 
core. 
This change also make a single SQLTab for all `SQLContext` because they can 
now share a single `SQLListener`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/carsonwang/spark SqlHistoryUI

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9297


commit fdf9d28362fe991a1df0d5392c9021db78fa7541
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-16T07:00:11Z

Update SparkListener to handle other events

commit ff4075d89b480d99d0320390d00a5b10aadc9a93
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-20T06:06:07Z

Write sql events to event log

commit b9870e6579a59628c82470e55f2cb6c4ec8fa2a7
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-21T06:29:37Z

Move sql UI classes to core

commit 3833055b7c94247b700e6fd0565629e71611d307
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-21T06:32:28Z

rename SqlMetricInfo class name

commit c0abfc6b7432750812a47790fe51b35dacba7429
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-22T02:22:51Z

Update sql metric param

commit a5b1cf42a4847b2c9c9674ce0d6aa4d332498ca2
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-26T02:09:32Z

handle accumulator updates in sql history UI

commit 7b30bc736f09600b25772e30636f8a5c19c6db5e
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-26T06:51:25Z

Use a single SQL Tab for all SparkContext

commit d52288bb2e18a5f0e898110893a091919e13ea84
Author: Carson Wang <carson.w...@intel.com>
Date:   2015-10-27T08:49:30Z

update style




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-10-27 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9297#discussion_r43218812
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -504,9 +542,28 @@ private[spark] object JsonProtocol {
   case `executorRemoved` => executorRemovedFromJson(json)
   case `logStart` => logStartFromJson(json)
   case `metricsUpdate` => executorMetricsUpdateFromJson(json)
+  case `sqlExecutionStart` => sqlExecutionStartFromJson(json)
+  case `sqlExecutionEnd` => sqlExecutionEndFromJson(json)
--- End diff --

Added that. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   >