Hi all, I'm wondering if anybody has any experience with centralised logging for Spark - or whether people have even felt the need for it, given the WebUI.
At my organization we use Log4j2 and Flume as the front end of our centralised logging system. I was looking into modifying Spark to use that system, but I'm reconsidering my approach, so I thought I'd ask the community what people have tried. Log4j2 matters here because it integrates nicely with Flume.

The problem I've got is that all of the Spark processes (master, worker, spark-submit) use the same conf directory and so would pick up the same log4j2.xml. That means they would all try to use the same directory for the Flume file channel, which will fail because Flume locks its directory. Secondly, if I want to add an interceptor to stamp every event with the component name, I cannot tell the components apart - everything would be stamped 'apache-spark'. This could be fixed by modifying the startup scripts to pass the component name around, but that's more modification than I really want to make.

So, are people generally happy with the WebUI approach for getting access to stderr and stdout, or have other people rolled better solutions? Yes, I'm aware of https://issues.apache.org/jira/browse/SPARK-6305 and the associated pull request.

Many thanks, in advance, for your thoughts.

Cheers,
Edward
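
For what it's worth, the direction I was considering was a single shared log4j2.xml parameterised with a Log4j2 system-property lookup, so each component gets its own file-channel directory and a name to stamp on events. This is only a sketch: the property name spark.logging.component is my invention (it would have to be set per component, e.g. via SPARK_MASTER_OPTS / SPARK_WORKER_OPTS or spark.driver.extraJavaOptions, which is the very startup-script change I'd rather avoid), and the host, port, and paths are placeholders - check the Log4j2 Flume appender docs for the exact persistent-manager attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: one shared log4j2.xml that still separates components.
     Assumes each JVM is started with
       -Dspark.logging.component=master|worker|driver
     (hypothetical property name). -->
<Configuration status="warn">
  <Properties>
    <!-- Lookup with a default: falls back to 'apache-spark' when unset. -->
    <Property name="component">${sys:spark.logging.component:-apache-spark}</Property>
  </Properties>
  <Appenders>
    <!-- Persistent Flume appender; per-component dataDir avoids the
         file-channel directory lock clash. -->
    <Flume name="flume" type="persistent" compress="true"
           dataDir="/var/log/spark/flume-channel/${component}">
      <Agent host="flume-collector.example.org" port="4141"/>
    </Flume>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="flume"/>
    </Root>
  </Loggers>
</Configuration>
```

The same ${component} value could then be attached as an event header for the interceptor to read, instead of hard-coding 'apache-spark'.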