[GitHub] spark pull request #19170: [SPARK-21961][Core] Filter out BlockStatuses Accu...

2018-08-02 Thread zhouyejoe
Github user zhouyejoe closed the pull request at:

https://github.com/apache/spark/pull/19170


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...

2018-03-15 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/20744
  
It is odd that the PySpark unit tests failed; I don't think the failure is related to this change.


---




[GitHub] spark pull request #20744: [SPARK-23608][CORE][WebUI] Add synchronization in...

2018-03-14 Thread zhouyejoe
Github user zhouyejoe commented on a diff in the pull request:

https://github.com/apache/spark/pull/20744#discussion_r174548913
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -150,14 +150,18 @@ class HistoryServer(
   ui: SparkUI,
   completed: Boolean) {
 assert(serverInfo.isDefined, "HistoryServer must be bound before attaching SparkUIs")
-ui.getHandlers.foreach(attachHandler)
+handlers.synchronized {
+  ui.getHandlers.foreach(attachHandler)
+}
 addFilters(ui.getHandlers, conf)
--- End diff --

Yes. Thanks for catching this. Will update.


---




[GitHub] spark pull request #20744: [SPARK-23608][CORE][WebUI] Add synchronization in...

2018-03-12 Thread zhouyejoe
Github user zhouyejoe commented on a diff in the pull request:

https://github.com/apache/spark/pull/20744#discussion_r173981826
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -148,14 +148,17 @@ class HistoryServer(
   appId: String,
   attemptId: Option[String],
   ui: SparkUI,
-  completed: Boolean) {
+  completed: Boolean): Unit = this.synchronized {
--- End diff --

To make the synchronization finer-grained, could we synchronize on handlers only?
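The finer-grained locking suggested here can be sketched roughly as follows. This is a hedged simplification, not the actual HistoryServer code: UiRegistry, the String stand-ins for Jetty handlers, and the method bodies are all illustrative.

```scala
// Hypothetical simplification of the attach/detach paths; `handlers` stands
// in for the Jetty handler collection, and the names are illustrative.
import scala.collection.mutable.ArrayBuffer

class UiRegistry {
  private val handlers = ArrayBuffer[String]()

  // Lock the handler collection itself instead of `this`, so other
  // methods of the server are not serialized behind attach/detach.
  def attachSparkUI(uiHandlers: Seq[String]): Unit = handlers.synchronized {
    handlers ++= uiHandlers
  }

  def detachSparkUI(uiHandlers: Seq[String]): Unit = handlers.synchronized {
    handlers --= uiHandlers
  }

  def current: Seq[String] = handlers.synchronized { handlers.toList }
}

val reg = new UiRegistry
reg.attachSparkUI(Seq("/app-1/jobs", "/app-1/stages"))
reg.detachSparkUI(Seq("/app-1/jobs"))
```

Synchronizing on the narrower object keeps attach and detach mutually exclusive without forcing every other synchronized method of the server through the same lock.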


---




[GitHub] spark issue #20744: [SPARK-23608][CORE][WebUI] Add synchronization in SHS be...

2018-03-08 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/20744
  
Updated. Could you please trigger Jenkins again? Thanks.


---




[GitHub] spark pull request #20744: [SPARK-23608][CORE][WebUI] Add synchronization in...

2018-03-06 Thread zhouyejoe
Github user zhouyejoe commented on a diff in the pull request:

https://github.com/apache/spark/pull/20744#discussion_r172609731
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -148,14 +148,14 @@ class HistoryServer(
   appId: String,
   attemptId: Option[String],
   ui: SparkUI,
-  completed: Boolean) {
+  completed: Boolean) = this.synchronized {
 assert(serverInfo.isDefined, "HistoryServer must be bound before attaching SparkUIs")
 ui.getHandlers.foreach(attachHandler)
 addFilters(ui.getHandlers, conf)
   }
 
   /** Detach a reconstructed UI from this server. Only valid after bind(). */
-  override def detachSparkUI(appId: String, attemptId: Option[String], ui: SparkUI): Unit = {
+  override def detachSparkUI(appId: String, attemptId: Option[String], ui: SparkUI): Unit = this.synchronized {
--- End diff --

Thanks. Will fix the style violation.


---




[GitHub] spark pull request #20744: [SPARK-23608][CORE][WebUI] Add synchronization in...

2018-03-05 Thread zhouyejoe
GitHub user zhouyejoe opened a pull request:

https://github.com/apache/spark/pull/20744

[SPARK-23608][CORE][WebUI] Add synchronization in SHS between attachSparkUI 
and detachSparkUI functions to avoid concurrent modification issue to Jetty 
Handlers

Jetty handlers are dynamically attached and detached while the SHS is running, but the attach and detach operations can run concurrently because the Guava cache loads and clears entries asynchronously.

## What changes were proposed in this pull request?
Add synchronization between attachSparkUI and detachSparkUI in SHS.

## How was this patch tested?
With this patch, the missing Jetty handlers issue no longer occurs on our production cluster SHS.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhouyejoe/spark SPARK-23608

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20744.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20744


commit f4118c03d7ea8cdd9ade5ec44db3e1bcb5f45535
Author: Ye Zhou 
Date:   2018-03-06T02:19:17Z

[SPARK-23608][CORE][WebUI] Add synchronization in SHS between attachSparkUI 
and detachSparkUI functions to avoid concurrent modification issue to Jetty 
Handlers.




---




[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
@vanzin Yes, I agree that the latest listener will not write this data into the logs. But here is the background. We deployed the SHS (Spark History Server) with LevelDB in our clusters months ago, before the patches started landing on trunk; we built a binary for the History Server directly from your development branch. Our cluster runs multiple Spark versions, including Spark 1.6.x and Spark 2.1. We then ran some stress testing on this SHS for our internal use cases, which require the SHS to analyze every application log and build the DBs. Maybe we are using the SHS too aggressively, but the GC issue is one of the major issues we hit, and we also reproduced it on the original SHS without LevelDB. So we created this ticket; with the patch applied, the SHS has run fine for several months. Without this patch, our SHS with LevelDB would never reach a stable state and could not serve our users. I think we are not the only company running multiple Spark versions in production; as far as I know, Netflix is another example. For large-scale clusters where a single SHS instance processes thousands of Spark application logs, this patch would definitely help.


---




[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
@vanzin The problem still exists with your new changes to the Spark History Server. Once the ListenerBus replays the log
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L664),
it uses JsonProtocol to create events from the JSON data
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala#L85).
As long as JsonProtocol is used, the problem remains
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L689).
Correct me if I am wrong. Thanks.
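The replay path being discussed can be sketched as follows. This is a hedged stand-in: MiniJsonProtocol and the event classes are illustrative simplifications, not the real Spark classes. The point is only that replay re-materializes every logged event, so write-side filtering alone cannot protect the SHS from logs written by older Spark versions.

```scala
// Simplified stand-ins for the replay path; not the real Spark classes.
sealed trait SparkEvent
case class ExecutorMetricsUpdate(accumulableNames: Seq[String]) extends SparkEvent
case class OtherEvent(raw: String) extends SparkEvent

object MiniJsonProtocol {
  // Stand-in for JsonProtocol.sparkEventFromJson: every logged line is
  // turned back into an in-memory event, including the heavyweight
  // updatedBlockStatuses accumulables written by older Spark versions.
  def sparkEventFromJson(line: String): SparkEvent =
    if (line.contains("ExecutorMetricsUpdate"))
      ExecutorMetricsUpdate(Seq("internal.metrics.updatedBlockStatuses"))
    else OtherEvent(line)
}

// A two-line "event log": one metrics update, one other event.
val logLines = Seq(
  """{"Event":"ExecutorMetricsUpdate"}""",
  """{"Event":"SparkListenerJobStart"}""")
val replayed = logLines.map(MiniJsonProtocol.sparkEventFromJson)
```

Because the deserializer rebuilds whatever the log contains, the filtering has to happen inside the FromJson path for old logs to benefit.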


---




[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
Hi, @vanzin. No, this doesn't change anything else; it only changes how the JSON data is converted into events. I was a little busy with other work. I will fix the unit test.


---




[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-15 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
I will work on it. Thanks for the review.


---




[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-06 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
@jiangxb1987 Hi, I was waiting for a response from Ryan Blue about ticket SPARK-20084. The fix for the unit test should be pretty straightforward; I just need confirmation on the question I have. Do you have any idea?

Original question:
Why are the block status updates not filtered out in executorMetricsUpdate? See this line:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L245
While working on SPARK-21961, I filtered those block status updates when reading logs in the Spark History Server, but that caused some unit test failures.
Should they not be filtered out in both executorMetricsUpdateFromJson and executorMetricsUpdateToJson?
Thanks.


---




[GitHub] spark issue #17412: [SPARK-20084][Core] Remove internal.metrics.updatedBlock...

2017-09-20 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/17412
  
@rdblue Hi, why are the block status updates not filtered out in executorMetricsUpdate? See this line:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L245
While working on SPARK-21961, I filtered those block status updates when reading logs in the Spark History Server, but that caused some unit test failures.
Should they not be filtered out in both executorMetricsUpdateFromJson and executorMetricsUpdateToJson?
Thanks.


---




[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-20 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
I will fix the unit test failure. 


---




[GitHub] spark pull request #19170: [SPARK-21961][Core] Filter out BlockStatuses Accu...

2017-09-08 Thread zhouyejoe
GitHub user zhouyejoe opened a pull request:

https://github.com/apache/spark/pull/19170

[SPARK-21961][Core] Filter out BlockStatuses Accumulators during replaying 
history logs in Spark History Server

## What changes were proposed in this pull request?

Filter out BlockStatus update data when replaying logs. If an accumulable's name matches the accumulableBlacklist, it is filtered out, so no BlockStatus objects are created in the Spark History Server. SPARK-20084 added a function called "accumulablesToJson"; this patch adds a matching function called "accumulablesFromJson".

The manually added BlockStatus updates in the unit test are removed so that the test passes.
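A minimal sketch of the name-based filtering described above. The case class shape, the blacklist contents, and the function body are illustrative assumptions, not the exact Spark code.

```scala
// Illustrative accumulable filtering; field names and blacklist contents
// are assumptions, not the exact Spark definitions.
case class AccumulableInfo(name: Option[String], value: Option[String])

val accumulableBlacklist = Set("internal.metrics.updatedBlockStatuses")

// Stand-in for the proposed accumulablesFromJson: drop blacklisted
// accumulables during deserialization, so the heavyweight BlockStatus
// objects are never materialized in the History Server.
def accumulablesFromJson(parsed: Seq[AccumulableInfo]): Seq[AccumulableInfo] =
  parsed.filterNot(_.name.exists(accumulableBlacklist.contains))

val parsed = Seq(
  AccumulableInfo(Some("internal.metrics.updatedBlockStatuses"), Some("[...]")),
  AccumulableInfo(Some("internal.metrics.executorRunTime"), Some("42")))
val kept = accumulablesFromJson(parsed)
```

Filtering by name at parse time mirrors the write-side accumulablesToJson filter, which is why old logs that still contain the block status accumulables also stop inflating memory.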

## How was this patch tested?
Unit tests pass after the test change.
Deployed in our production cluster, memory consumption drops significantly: fewer than 5 short full GCs occurred while replaying more than 4K history logs, which include logs generated by both 1.6.x and 2.1.0.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhouyejoe/spark SPARK-21961

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19170


commit 04c1e2aa24c61f13f1df5148416bb00f0649fcaf
Author: Ye Zhou 
Date:   2017-09-08T23:10:38Z

[SPARK-21961][Core] Filter out BlockStatuses Accumulators during replaying 
history logs in Spark History Server




---




[GitHub] spark pull request #18941: [SPARK-21715][WebUI] History Server should not re...

2017-09-07 Thread zhouyejoe
Github user zhouyejoe closed the pull request at:

https://github.com/apache/spark/pull/18941


---




[GitHub] spark pull request #18941: [SPARK-21715][WebUI] History Server should not re...

2017-08-22 Thread zhouyejoe
Github user zhouyejoe commented on a diff in the pull request:

https://github.com/apache/spark/pull/18941#discussion_r134587276
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -161,6 +161,7 @@ private[spark] object UIUtils extends Logging {
 
   def commonHeaderNodes: Seq[Node] = {
 
+
--- End diff --

Definitely I will. Another thing I found is that this is not only a problem in the standalone Spark History Server; it also exists in the live UI running in the driver of an active application.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18941: [SPARK-21715][WebUI] History Server should not re...

2017-08-22 Thread zhouyejoe
Github user zhouyejoe commented on a diff in the pull request:

https://github.com/apache/spark/pull/18941#discussion_r134579481
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -161,6 +161,7 @@ private[spark] object UIUtils extends Logging {
 
   def commonHeaderNodes: Seq[Node] = {
 
+
--- End diff --

Over the past week I tried to find a solution for the overlaps, but I did not succeed, as I am not very familiar with CSS and frontend work. I will try to find someone with that expertise in our company to solve this issue. Additionally, this line disables the downloading of favicon.ico.


---




[GitHub] spark pull request #18941: [SPARK-21715][CORE] History Server should not res...

2017-08-14 Thread zhouyejoe
GitHub user zhouyejoe opened a pull request:

https://github.com/apache/spark/pull/18941

[SPARK-21715][CORE] History Server should not respond history page html 
conte…

## What changes were proposed in this pull request?
Add the images required by the dataTables jQuery plugin that appear in the history page table header. All of the images are downloaded from the official dataTables GitHub site. Change the image paths in dataTables.bootstrap.css so the images can be found. Disable favicon.ico downloading by changing the header in the HTML.

## How was this patch tested?
Build and unit tests passed.
Deployed and checked the UI changes: the images are downloaded correctly, and the History Server handles these requests correctly.


![afterthispatch](https://user-images.githubusercontent.com/8699921/29291500-a7f266ec-80f8-11e7-9315-7b1d706338ef.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhouyejoe/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18941


commit ad97b6f7f61399c370af6052c3cad6b39881ab24
Author: Ye Zhou 
Date:   2017-08-14T20:39:28Z

SPARK-21715 History Server should not respond history page html content 
multiple times for only one http request




---
