Apache Spark Log4j logging applicationId

Luca Borin Tue, 23 Jul 2019 22:05:58 -0700

Hi,

I would like to add the applicationId to every log line that Spark
produces through Log4j. My cluster runs several jobs at the same time, so
having the applicationId in each line would let me separate their logs.


I have found a partial solution. If I change the conversion pattern of the
PatternLayout, I can print a value stored in the ThreadContext (see here
<https://logging.apache.org/log4j/2.x/manual/thread-context.html>), and the
applicationId can be stored there through MDC (see
here
<https://stackoverflow.com/questions/54706582/output-spark-application-id-in-the-logs-with-log4j>).
This works for the driver, but I would like the applicationId to be set at
Spark application startup, for both the driver and the workers. Note that
I'm working in a managed environment (Databricks), so I have only limited
control over cluster management. One workaround for running the MDC put on
all workers is to use a broadcast variable and perform an action with it,
but I don't think this is reliable, since it should also keep working when
a worker machine restarts or is replaced.
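
For reference, the layout change I mean looks roughly like the fragment
below. It is only a sketch: it assumes the Log4j 1.2 properties syntax of
Spark's default log4j.properties template, and "applicationId" is a key
name I picked for illustration, not something Spark sets on its own.

```properties
# Sketch of a log4j.properties file (Log4j 1.2 syntax, assumed here).
# "applicationId" is a hypothetical MDC key. It prints as empty until code
# running on the logging thread stores a value under it, for example with
# org.apache.log4j.MDC.put("applicationId", sc.applicationId) on the driver.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# %X{applicationId} prints the MDC value stored under that key.
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} [%X{applicationId}] %p %c{1}: %m%n
```

The pattern itself can be deployed to every node, but the MDC value is kept
per thread, so the key still has to be populated on each executor. That is
the part I have no stable solution for.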

Thank you
