JoshRosen opened a new pull request, #36747:
URL: https://github.com/apache/spark/pull/36747

   ### What changes were proposed in this pull request?
   
   This PR addresses a performance problem in Log4J 2 related to exception 
logging: in certain scenarios I observed that Log4J 2's default exception 
stacktrace logging can be ~10x slower than Log4J 1's.
   
   The problem stems from a new log pattern format in Log4J2 called ["extended 
exception"](https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternExtendedException),
 which enriches the regular stacktrace string with the names of the JAR files 
that contain the classes in each stack frame.
   
   Log4J queries the classloader to determine the source JAR for each class. 
This isn't cheap, but this information is cached and reused in future exception 
logging calls. In certain scenarios involving runtime-generated classes, this 
lookup will fail and the failed lookup result will _not_ be cached. As a 
result, expensive classloading operations will be performed every time such an 
exception is logged. In addition to being very slow, these operations take out 
a lock on the classloader and thus can cause severe lock contention if multiple 
threads are logging errors. This issue is described in more detail in [a 
comment on a Log4J2 
JIRA](https://issues.apache.org/jira/browse/LOG4J2-2391?focusedCommentId=16667140&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16667140)
 and in a linked blogpost. Spark frequently uses generated classes and lambdas, 
so Spark executor logs will almost always trigger this edge case and suffer 
from poor performance.
   
   By default, if you do not specify an explicit exception format in your 
logging pattern then Log4J2 will add this "extended exception" pattern (see 
PatternLayout's alwaysWriteExceptions flag in Log4J's documentation, plus [the 
code implementing that 
flag](https://github.com/apache/logging-log4j2/blob/d6c8ab0863c551cdf0f8a5b1966ab45e3cddf572/log4j-core/src/main/java/org/apache/logging/log4j/core/pattern/PatternParser.java#L206-L209)
 in Log4J2).
   
   In this PR, I have updated Spark's default Log4J2 configurations so that 
each pattern layout includes an explicit `%ex` and therefore uses the normal 
(non-extended) exception logging format. This is the workaround that is 
currently recommended on the Log4J JIRA.
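   
   As a rough illustration of the kind of change involved, a console appender 
in a Log4J2 properties file might look like the sketch below. The appender name 
and the date/logger portions of the pattern are assumptions made for 
illustration, not a copy of the files touched by this PR; the relevant part is 
the explicit `%ex` at the end of the pattern, which keeps Log4J2 from appending 
its default extended exception converter.

   ```properties
   # Hypothetical console appender; names and pattern prefix are illustrative.
   appender.console.type = Console
   appender.console.name = console
   appender.console.target = SYSTEM_ERR
   appender.console.layout.type = PatternLayout
   # The trailing %ex selects the plain (non-extended) stack trace format.
   appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex
   ```

   Users with custom logging configurations that omit an exception converter 
would presumably need to add `%ex` to their own patterns to get the same 
behavior.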
   
   
   
   ### Why are the changes needed?
   
   Avoid performance regressions in Spark programs which use Spark's default 
Log4J 2 configuration and log many exceptions. Although it's true that any 
program logging exceptions at a high rate should probably just fix the source 
of the exceptions, I think it's still a good idea for us to try to fix this 
out-of-the-box performance difference so that users' existing workloads do not 
regress when upgrading to 3.3.0.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes: it changes the default exception logging format so that it matches 
Log4J 1's default rather than Log4J 2's. The new format is consistent with 
behavior in previous Spark versions, but is different from the behavior in the 
current Spark 3.3.0-rc3.
   
   ### How was this patch tested?
   
   Existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

