Github user zhouyejoe commented on the issue:
https://github.com/apache/spark/pull/19170
@vanzin Yes, I agree with you that the latest listener will not write this
data into the logs. But here is the story. We deployed SHS (Spark History Server)
with LevelDB in our clusters months ago, before you started merging patches
into trunk. We built the binary directly from your development branch, only for the
History Server. In our cluster, there are multiple different versions of Spark
including Spark 1.6.x and Spark 2.1. We then started stress
testing this SHS for our internal use cases, which require SHS to analyze
each application's logs and create DBs. Maybe we are using SHS too aggressively,
but GC was one of the major issues we hit. We also reproduced this
issue using the original SHS without LevelDB. So we created this ticket to solve
the problem, and the patch has run fine for several months. Without this patch, our SHS
with LevelDB would never reach a stable state and could not serve our users. I
think we are not the only company that runs multiple versions of Spark
in production; as far as I know, Netflix is another example. For
large-scale clusters where thousands of Spark application logs are
processed by a single SHS instance, this patch would definitely help.