[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2018-06-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Can one of the admins verify this patch?





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Can one of the admins verify this patch?





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Can one of the admins verify this patch?





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Can one of the admins verify this patch?





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19170
  
> Maybe we are using SHS too aggressively, but the GC issue is one of the major issues we met.

Can you describe what this issue is? That is not what the bug is showing. 
The bug shows a heap dump with a lot of `BlockStatus` objects. I'm saying that 
with the new code, you should not get into that situation, because the SHS does 
not hold on to those objects. Is that not what you see?

If you see `BlockStatus` objects still being referenced then there is 
probably a bug somewhere.

Barring the issue above, this patch to the best of my knowledge would not 
help much with GC. The code still loads data from disk for these events (= 
creates garbage) and still creates json4s objects for it (= more garbage). 
You'd be avoiding a trivial amount of garbage after that by doing this 
filtering.
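
The point about garbage can be illustrated with a minimal sketch. It uses Python's `json` module as a stand-in for json4s, and the accumulator name and field names are assumptions modeled on Spark's event log format rather than taken from this thread. The parser builds the whole tree before any filter runs, so filtering only changes what is retained afterwards, not what was allocated.

```python
import json

# Assumed accumulator name; the field names below are illustrative as well.
UPDATED_BLOCK_STATUSES = "internal.metrics.updatedBlockStatuses"


def count_nodes(value):
    """Count every dict, list, and scalar in a parsed JSON tree."""
    if isinstance(value, dict):
        return 1 + sum(count_nodes(v) for v in value.values())
    if isinstance(value, list):
        return 1 + sum(count_nodes(v) for v in value)
    return 1


# One hypothetical event log line carrying 100 block status entries.
line = json.dumps({
    "Accumulator Updates": [
        {"Name": "internal.metrics.executorRunTime", "Update": 12},
        {"Name": UPDATED_BLOCK_STATUSES,
         "Update": [{"Block ID": f"rdd_0_{i}", "Disk Size": 0} for i in range(100)]},
    ]
})

# The whole tree is built here, before any filtering can happen.
parsed = json.loads(line)
allocated = count_nodes(parsed)

# Filtering afterwards only reduces what stays referenced.
retained_tree = {
    "Accumulator Updates": [
        a for a in parsed["Accumulator Updates"]
        if a["Name"] != UPDATED_BLOCK_STATUSES
    ]
}
retained = count_nodes(retained_tree)
```

Here `allocated` is much larger than `retained`, which matches the argument that the parse step, not the filtered result, dominates the garbage created per event.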





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
@vanzin Yes, I agree with you that the latest listener will not write this 
data into the logs. But here is the story. We deployed the SHS (Spark History 
Server) with LevelDB in our clusters months ago, before you started to merge 
patches into trunk. We used your development branch directly to build a binary 
for the History Server only. In our cluster, there are multiple versions of 
Spark, including Spark 1.6.x and Spark 2.1. Then we started some pressure 
testing on this SHS for our internal use cases, which require the SHS to 
analyze each application's logs and create DBs. Maybe we are using SHS too 
aggressively, but the GC issue is one of the major issues we met. We also 
reproduced this issue using the original SHS without LevelDB. So we created 
this ticket to solve the problem, and the patch has run fine for several 
months. Without this patch, our SHS with LevelDB would never reach a stable 
state and could not serve our users. I think we are not the only company that 
has multiple versions of Spark in production; as far as I know, Netflix is 
another example. For large-scale clusters where thousands of Spark application 
logs are processed by a single SHS instance, this patch would definitely help.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19170
  
(BTW, you could argue this could be useful in 2.2 and 2.1, because they 
still use the old listener code. But this is just dead code in master and we 
shouldn't merge it there.)





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19170
  
The filtering on write is to reduce the size of the event log file. What is 
the filtering on read achieving? Especially since any recent event logs won't 
even have that data?





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19170
  
It's not a big improvement but makes the code base more consistent.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19170
  
If SHS is the only user of `JsonProtocol`, then we should ignore 
BlockStatus update events in `JsonProtocol`, as SHS doesn't need them at all.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19170
  
And then the event will be processed and garbage collected and the objects 
will go away?





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
@vanzin The problem still exists with your new changes to the Spark History 
Server. Once you use ListenerBus to replay the log 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L664), 
it will use JsonProtocol to create events from the JSON data 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala#L85). 
Once JsonProtocol is used, the problem still exists 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L689). 
Correct me if I am wrong. Thanks.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19170
  
I'm not asking whether it changes anything else, I'm asking whether it does 
anything anymore.

The bug shows a heap dump with a bunch of `BlockStatus` objects, but the 
SHS does not create those objects anymore to the best of my knowledge, so I'm 
just questioning whether this change is now obsolete.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
Hi, @vanzin. No, this doesn't change anything else. It only changes how the 
JSON data gets converted into events. I was a little busy with other things. 
I will fix the unit test.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19170
  
@vanzin I think it just changes to not load BlockStatuses generated by old 
Spark versions.
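
A minimal sketch of what "not loading" such updates on read could look like, written in Python rather than Spark's Scala. The function names are hypothetical, and the event and field names are assumptions modeled on Spark's event log format, not the code in this pull request.

```python
import json

# Assumed accumulator name used by older Spark versions for block status updates.
UPDATED_BLOCK_STATUSES = "internal.metrics.updatedBlockStatuses"


def drop_block_status_updates(event: dict) -> dict:
    """Return a copy of an executor metrics update without block status accumulators."""
    if event.get("Event") != "SparkListenerExecutorMetricsUpdate":
        return event  # other event types pass through untouched
    filtered = dict(event)
    filtered["Metrics Updated"] = [
        {
            **task,
            "Accumulator Updates": [
                acc for acc in task.get("Accumulator Updates", [])
                if acc.get("Name") != UPDATED_BLOCK_STATUSES
            ],
        }
        for task in event.get("Metrics Updated", [])
    ]
    return filtered


def replay(lines):
    """Parse event log lines one by one, filtering each event as it is read."""
    return [drop_block_status_updates(json.loads(line)) for line in lines if line.strip()]


# A hypothetical log line as an older Spark version might have written it.
old_log = [
    json.dumps({
        "Event": "SparkListenerExecutorMetricsUpdate",
        "Executor ID": "1",
        "Metrics Updated": [{
            "Task ID": 7,
            "Stage ID": 0,
            "Stage Attempt ID": 0,
            "Accumulator Updates": [
                {"ID": 1, "Name": "internal.metrics.executorRunTime", "Update": 12},
                {"ID": 2, "Name": UPDATED_BLOCK_STATUSES,
                 "Update": [{"Block ID": "rdd_0_0"}]},
            ],
        }],
    }),
]

events = replay(old_log)
```

After the replay, the only accumulator left on the task is the run-time metric; everything else in the event is unchanged.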





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-12-11 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19170
  
Does this change do anything anymore? I don't think the SHS (nor the UI) 
uses `BlockStatus` anymore for anything.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-15 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
I will work on it. Thanks for the review.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19170
  
The high-level idea LGTM; just make sure the history server is the only 
consumer of this event log JSON parser.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19170
  
Please feel free to fix the test case failures.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19170
  
The change should be safe as long as the extracted BlockStatus accumulable 
info is not used in the web UI. Also cc @cloud-fan 





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-06 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
@jiangxb1987 Hi, I was waiting for a response from Ryan Blue on the ticket 
SPARK-20084. The fix for the unit test should be pretty straightforward. I 
just need confirmation on the question I have. Do you have any idea?

Original question:
Why are the block status updates not filtered out in 
executorMetricsUpdate? This line: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L245
While working on SPARK-21961, I filtered out those block status updates 
while reading from logs in the Spark History Server, but it is causing some 
unit test failures.
Should they not be filtered out in both executorMetricsUpdateFromJson and 
executorMetricsUpdateToJson?
Thanks.
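
One plausible reason a read-only filter trips unit tests, not confirmed in this thread, is a round-trip check that compares a deserialized event with the original. The sketch below is in Python with hypothetical `to_json` and `from_json` helpers and an assumed accumulator name; it is not Spark's `JsonProtocol`.

```python
import json

# Assumed accumulator name for block status updates.
UPDATED_BLOCK_STATUSES = "internal.metrics.updatedBlockStatuses"


def keep(acc):
    """True for accumulator updates that are not block status updates."""
    return acc["Name"] != UPDATED_BLOCK_STATUSES


def to_json(updates, filter_blocks):
    """Serialize accumulator updates, optionally dropping block status updates."""
    kept = [a for a in updates if keep(a)] if filter_blocks else updates
    return json.dumps({"Accumulator Updates": kept})


def from_json(text, filter_blocks):
    """Deserialize accumulator updates, optionally dropping block status updates."""
    updates = json.loads(text)["Accumulator Updates"]
    return [a for a in updates if keep(a)] if filter_blocks else updates


original = [
    {"Name": "internal.metrics.executorRunTime", "Update": 12},
    {"Name": UPDATED_BLOCK_STATUSES, "Update": [{"Block ID": "rdd_0_0"}]},
]

# Filtering on read only: what comes back differs from what was written,
# which an equality-based round-trip test would flag.
read_only = from_json(to_json(original, filter_blocks=False), filter_blocks=True)
round_trip_matches_original = read_only == original

# Filtering on both sides: writer and reader agree with each other, but the
# result still has to be compared against a filtered expectation.
both = from_json(to_json(original, filter_blocks=True), filter_blocks=True)
expected_after_filter = [a for a in original if keep(a)]
```

In this sketch `round_trip_matches_original` is false, while `both` equals `expected_after_filter`. Filtering symmetrically keeps the writer and reader consistent, but a test that compares against the unfiltered in-memory event would still need its expectation adjusted.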





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-11-06 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19170
  
ping @zhouyejoe





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-20 Thread zhouyejoe
Github user zhouyejoe commented on the issue:

https://github.com/apache/spark/pull/19170
  
I will fix the unit test failure. 





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81951/





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19170
  
**[Test build #81951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81951/testReport)** for PR 19170 at commit [`04c1e2a`](https://github.com/apache/spark/commit/04c1e2aa24c61f13f1df5148416bb00f0649fcaf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19170
  
**[Test build #81951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81951/testReport)** for PR 19170 at commit [`04c1e2a`](https://github.com/apache/spark/commit/04c1e2aa24c61f13f1df5148416bb00f0649fcaf).





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-19 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19170
  
cc @vanzin 





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-19 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19170
  
ok to test





[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...

2017-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19170
  
Can one of the admins verify this patch?

