Best practices of maintaining a long running SparkContext

Zhong Wang Mon, 07 Mar 2016 17:36:04 -0800

Hi zeppelin-users,

Because Zeppelin relies on a long-running SparkContext, keeping that context
stable is important for availability. In my experience, I run into a few
issues when a SparkContext has been running for several days,
including:
1. Event logging stops working because of an HDFS lease issue, similar to this:
https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3ccae6kwsp_c00gksmnx0obu5aouxphdjs-syqywt-jfi3psvc...@mail.gmail.com%3E
2. The Spark UI gets slower because of the large number of historical jobs
3. Cached data disappears mysteriously


These may not be Zeppelin issues, but I would like to hear what problems you
have run into and how you deal with maintaining a long-running
SparkContext.

I know that we can do some cleanup periodically by restarting the Spark
interpreter, but I am wondering whether there are better ways.

Thanks!

Zhong
