[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19170 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19170 > Maybe we are using SHS too aggressively, but the GC issue is one of the major issues we met. Can you describe what this issue is? That is not what the bug is showing. The bug shows a heap dump with a lot of `BlockStatus` objects. I'm saying that with the new code, you should not get into that situation, because the SHS does not hold on to those objects. Is that not what you see? If you see `BlockStatus` objects still being referenced then there is probably a bug somewhere. Barring the issue above, this patch to the best of my knowledge would not help much with GC. The code still loads data from disk for these events (= creates garbage) and still creates json4s objects for it (= more garbage). You'd be avoiding a trivial amount of garbage after that by doing this filtering.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zhouyejoe commented on the issue: https://github.com/apache/spark/pull/19170 @vanzin Yes, I agree with you that the latest listener will not write this data into logs. But here is the story. We deployed SHS (Spark History Server) with LevelDB months ago in our clusters, before you started to merge patches into trunk. We directly used your development branch to build a binary only for the History Server. In our cluster, there are multiple different versions of Spark, including Spark 1.6.x and Spark 2.1. Then we started some pressure testing on this SHS for our internal use cases, which require SHS to analyze each application's logs and create DBs. Maybe we are using SHS too aggressively, but the GC issue is one of the major issues we met. We also reproduced this issue using the original SHS without LevelDB. So we created this ticket to solve the problem, and the patch has run fine for several months. Without this patch, our SHS with LevelDB would never reach a stable state and could not serve our users. I think we are not the only company that has multiple versions of Spark in its production environment; as far as I know, Netflix is another example. In large-scale clusters where thousands of Spark application logs are processed by a single SHS instance, this patch would definitely help.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19170 (BTW, you could argue this could be useful in 2.2 and 2.1, because they still use the old listener code. But this is just dead code in master and we shouldn't merge it there.)
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19170 The filtering on write is to reduce the size of the event log file. What is the filtering on read achieving? Especially since any recent event logs won't even have that data?
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19170 It's not a big improvement but makes the code base more consistent.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19170 If SHS is the only user of `JsonProtocol`, then we should ignore BlockStatus update events in `JsonProtocol`, as SHS doesn't need them at all.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19170 And then the event will be processed and garbage collected and the objects will go away?
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zhouyejoe commented on the issue: https://github.com/apache/spark/pull/19170 @vanzin The problem still exists with your new changes to Spark History Server. Once you use ListenerBus to replay the log (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L664), it will use JsonProtocol to create events from the JSON data (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala#L85). Once JsonProtocol is used, the problem still exists (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L689). Correct me if I am wrong. Thanks.
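As a rough illustration of the replay path described in the comment above (not Spark's actual Scala code), the Python sketch below reads an event log one JSON line at a time and drops block-status accumulator entries before the event reaches a listener. The field names (`Task Info`, `Accumulables`, `Name`) and the accumulator name `internal.metrics.updatedBlockStatuses` are assumptions about the event log layout, and `replay` and `strip_block_statuses` are hypothetical helpers.

```python
import json

# Assumed name of the accumulator that carries block status updates.
BLOCK_STATUSES = "internal.metrics.updatedBlockStatuses"


def strip_block_statuses(event):
    """Return the event with block-status accumulables removed (assumed layout)."""
    task_info = event.get("Task Info")
    if isinstance(task_info, dict) and "Accumulables" in task_info:
        task_info["Accumulables"] = [
            acc for acc in task_info["Accumulables"]
            if acc.get("Name") != BLOCK_STATUSES
        ]
    return event


def replay(lines, listener):
    """Parse each non-empty log line as JSON, filter it, and hand it to the listener."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        listener(strip_block_statuses(json.loads(line)))


# Two made-up log lines standing in for an event log written by an older Spark.
sample_log = [
    json.dumps({
        "Event": "SparkListenerTaskEnd",
        "Task Info": {
            "Task ID": 1,
            "Accumulables": [
                {"ID": 1, "Name": BLOCK_STATUSES, "Update": [{"Block ID": "rdd_0_0"}]},
                {"ID": 2, "Name": "internal.metrics.executorRunTime", "Update": 12},
            ],
        },
    }),
    json.dumps({"Event": "SparkListenerApplicationEnd", "Timestamp": 0}),
]

events = []
replay(sample_log, events.append)
```

In this sketch each line is still fully parsed before the filter runs, so it only avoids keeping the filtered entries around afterwards; it does not avoid the parsing garbage raised elsewhere in the thread.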
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19170 I'm not asking whether it changes anything else, I'm asking whether it does anything anymore. The bug shows a heap dump with a bunch of `BlockStatus` objects, but the SHS does not create those objects anymore to the best of my knowledge, so I'm just questioning whether this change is now obsolete.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zhouyejoe commented on the issue: https://github.com/apache/spark/pull/19170 Hi, @vanzin. No, this doesn't change anything else. It only changes how the JSON data gets converted into events. I was a little busy with other things. I will fix the unit test.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19170 @vanzin I think it just changes to not load BlockStatuses generated by old Spark versions.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19170 Does this change do anything anymore? I don't think the SHS (nor the UI) uses `BlockStatus` anymore for anything.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zhouyejoe commented on the issue: https://github.com/apache/spark/pull/19170 I will work on it. Thanks for the review.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19170 The high-level idea LGTM; just make sure the history server is the only consumer of this event log JSON parser.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19170 Please feel free to fix the test case failures.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19170 The change should be safe as long as the extracted BlockStatus accumulable info is not used in the web UI. Also cc @cloud-fan
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zhouyejoe commented on the issue: https://github.com/apache/spark/pull/19170 @jiangxb1987 Hi, I was waiting for a response from Ryan Blue about the ticket SPARK-20084. The fix for the unit test should be pretty straightforward. I just need confirmation on the question I have. Do you have any idea? Original question: why are the block status updates not filtered out in executorMetricsUpdate? See this line: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L245 While working on SPARK-21961, I filtered those block status updates while reading from logs in the Spark History Server, but it is causing some unit test failures. Should they not be filtered out in both executorMetricsUpdateFromJson and executorMetricsUpdateToJson? Thanks.
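One way to see why filtering on only one side can break a round-trip style unit test is the toy Python sketch below. It is not the Spark test suite: `to_json`, `from_json`, and the dictionary layout are hypothetical stand-ins for `executorMetricsUpdateToJson` and `executorMetricsUpdateFromJson`, and the accumulator name is an assumption.

```python
import json

# Assumed accumulator name; the real constant lives in Spark's Scala code.
BLOCK_STATUSES = "internal.metrics.updatedBlockStatuses"


def drop_block_statuses(updates):
    """Keep every accumulator update except the block-status one."""
    return [u for u in updates if u["Name"] != BLOCK_STATUSES]


def to_json(event, filter_on_write):
    """Hypothetical writer: serialize an event, optionally filtering first."""
    updates = event["Accumulator Updates"]
    if filter_on_write:
        updates = drop_block_statuses(updates)
    return json.dumps({"Executor ID": event["Executor ID"],
                       "Accumulator Updates": updates})


def from_json(text, filter_on_read):
    """Hypothetical reader: parse an event, optionally filtering afterwards."""
    event = json.loads(text)
    if filter_on_read:
        event["Accumulator Updates"] = drop_block_statuses(
            event["Accumulator Updates"])
    return event


original = {
    "Executor ID": "1",
    "Accumulator Updates": [
        {"Name": BLOCK_STATUSES, "Update": [{"Block ID": "rdd_0_0"}]},
        {"Name": "internal.metrics.executorRunTime", "Update": 12},
    ],
}

# Filtering only on read: the parsed event no longer equals the original,
# which is the kind of mismatch a round-trip equality test would flag.
read_only = from_json(to_json(original, filter_on_write=False), filter_on_read=True)
round_trip_matches = read_only == original

# Filtering on both sides: a second write-then-read pass changes nothing.
both = from_json(to_json(original, filter_on_write=True), filter_on_read=True)
again = from_json(to_json(both, filter_on_write=True), filter_on_read=True)
stable = both == again
```

Under these assumptions, a test that compares the re-read event against the unfiltered original fails as soon as either side filters, so the expected value in such a test would have to be filtered as well.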
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19170 ping @zhouyejoe
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zhouyejoe commented on the issue: https://github.com/apache/spark/pull/19170 I will fix the unit test failure.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19170 Merged build finished. Test FAILed.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19170 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81951/ Test FAILed.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19170 **[Test build #81951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81951/testReport)** for PR 19170 at commit [`04c1e2a`](https://github.com/apache/spark/commit/04c1e2aa24c61f13f1df5148416bb00f0649fcaf). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19170 **[Test build #81951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81951/testReport)** for PR 19170 at commit [`04c1e2a`](https://github.com/apache/spark/commit/04c1e2aa24c61f13f1df5148416bb00f0649fcaf).
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19170 cc @vanzin
[GitHub] spark issue #19170: [SPARK-21961][Core] Filter out BlockStatuses Accumulator...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19170 ok to test