Hi all, I'm wondering if anybody has any experience with centralised logging for Spark - or whether people have even felt the need for it, given the WebUI.
At my organization we use Log4j2 and Flume as the front end of our centralised logging system. I was looking into modifying Spark to use that system, but I'm reconsidering my approach, so I thought I'd ask the community what people have tried. Log4j2 matters here because it integrates nicely with Flume.

The problem I've got is that all of the Spark processes (master, worker, spark-submit) use the same conf directory and so would pick up the same log4j2.xml. That means they would all try to use the same directory for the Flume file channel, which will fail because Flume locks its directory. Secondly, if I want to add an interceptor to stamp every event with the component name, I cannot tell the components apart - everything would be stamped 'apache-spark'. This could be fixed by modifying the startup scripts to pass the component name around, but that's more modification than I really want to make.

So, are people generally happy with the WebUI approach for getting access to stderr and stdout, or have other people rolled better solutions? Yes, I'm aware of https://issues.apache.org/jira/browse/SPARK-6305 and the associated pull request.

Many thanks, in advance, for your thoughts.

Cheers,
Edward
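
For what it's worth, the direction I was considering was a single shared log4j2.xml parameterised with a Log4j2 system-property lookup, so each component gets its own file-channel directory and a name to stamp on events. This is only a sketch: the property name spark.logging.component is my invention (it would have to be set per component, e.g. via SPARK_MASTER_OPTS / SPARK_WORKER_OPTS or spark.driver.extraJavaOptions, which is the very startup-script change I'd rather avoid), and the host, port, and paths are placeholders - check the Log4j2 Flume appender docs for the exact persistent-manager attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: one shared log4j2.xml that still separates components.
     Assumes each JVM is started with
       -Dspark.logging.component=master|worker|driver
     (hypothetical property name). -->
<Configuration status="warn">
  <Properties>
    <!-- Lookup with a default: falls back to 'apache-spark' when unset. -->
    <Property name="component">${sys:spark.logging.component:-apache-spark}</Property>
  </Properties>
  <Appenders>
    <!-- Persistent Flume appender; per-component dataDir avoids the
         file-channel directory lock clash. -->
    <Flume name="flume" type="persistent" compress="true"
           dataDir="/var/log/spark/flume-channel/${component}">
      <Agent host="flume-collector.example.org" port="4141"/>
    </Flume>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="flume"/>
    </Root>
  </Loggers>
</Configuration>
```

The same ${component} value could then be attached as an event header for the interceptor to read, instead of hard-coding 'apache-spark'.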