New submission from David Fritz <dfr...@purestorage.com>:

I believe this also impacts 3.10 and 3.11 based on the original bpo-45401 which 
led to this change. Prior to commit ac421c348b in the 3.9 branch there were no 
additional os.path checks in the shouldRollover() methods of 
RotatingFileHandler and TimedRotatingFileHandler, which led to maximum 
performance regardless of where log files existed. With the new 
"os.path.exists(self.baseFilename) and not os.path.isfile(self.baseFilename)" 
checks added to these functions, a varying performance degradation is seen if 
the log lives on a filesystem that is not a local disk. It is hard for me to 
strictly call this a "regression", since I understand the reason of the 
original change (only rollover regular files), but I feel this is something 
developers should be aware of, or possibly have a way to disable (without 
making their own subclass) since this cost was not incurred before.

For example, let's say we have user home folders as mounts via S3FS, and each 
user has a log stored in their home folders. Since there are now os.path checks 
on every single emit call, performance of logging is now directly impacted by 
networking latency if using one of these native logging handlers. This came as 
quite a surprise when upgrading to 3.9.8+, when operations that were previously 
taking ~2 seconds were now taking ~2 1/2 minutes. This is reproducible (once 
you have a similar external filesystem mounted) using the following script 
(also attached):


# vi logtest.py
import logging
import logging.handlers

logging.basicConfig(filename="test.log", level=logging.INFO)
logger = logging.getLogger("test_logger")
log_handler = logging.handlers.TimedRotatingFileHandler("test.log", when='D', 
backupCount=2)
logger.addHandler(log_handler)

for i in range(0, 10000):
    logger.info("iteration: %d", i)


As seen between the commits, cost spikes due to the new calls (double the 
amount of log emit calls):

$ time /build_397/bin/python3 -m cProfile logtest.py |awk '$5>0.00'
...[truncated output to highest percall items]...
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  12/1    0.000    0.000    0.549    0.549 {built-in method builtins.exec} # 
3.9.7_d57d33c234 (no posix.stat calls)
  12/1    0.000    0.000   70.125   70.125 {built-in method builtins.exec} # 
3.9.7_ac421c348b
 20100   68.972    0.003   68.972    0.003 {built-in method posix.stat}    # 
3.9.7_ac421c348b

This performance degradation is further compounded if for example you have 
logging being performed inside threads, where locks have to be obtained. This 
is actually how I noticed it originally, because I was attempting to use the 
cProfile module to find what was taking so long. I ended up testing several 
python versions to make sure I was not crazy, before identifying the exact 
commit where the change in performance occurred.

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    67    0.661    0.010    0.661    0.010 {method 'acquire' of '_thread.lock' 
objects} # 3.7.8
    67    1.237    0.018    1.237    0.018 {method 'acquire' of '_thread.lock' 
objects} # 3.9.0
    67    1.293    0.019    1.293    0.019 {method 'acquire' of '_thread.lock' 
objects} # 3.9.4
    67    0.756    0.011    0.756    0.011 {method 'acquire' of '_thread.lock' 
objects} # 3.9.6
    67    1.218    0.018    1.218    0.018 {method 'acquire' of '_thread.lock' 
objects} # 3.9.7
    67  152.411    2.275  152.411    2.275 {method 'acquire' of '_thread.lock' 
objects} # 3.9.8
    67  152.382    2.274  152.382    2.274 {method 'acquire' of '_thread.lock' 
objects} # 3.9.9

Ultimately I do not have a recommendation for a native fix that keeps both the 
improvement from the commit, and the performance from before the commit. 
Perhaps caching the results of the checks until there is a rollover? Only 
perform the check on initial handler creation? Otherwise it appears the only 
solution is to subclass and override the shouldRollover() to not include the 
os.path checks and match <=3.9.7 behavior. This is the solution I am 
implementing in the codebase/environment where this was discovered while this 
report is discussed further. This feels less than ideal, but if that is the 
consensus then hopefully this report can help the next person to come across 
it. Thank you for your time and all the hard work put into Python!

----------
files: logtest.py
messages: 409381
nosy: dfritz
priority: normal
severity: normal
status: open
title: Log emit performance degradation in RotatingFileHandlers due to 
filesystem checks
type: performance
versions: Python 3.10, Python 3.11, Python 3.9
Added file: https://bugs.python.org/file50529/logtest.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46207>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to