New submission from Yonatan Goldschmidt <[email protected]>:
We have an application which creates 10,000s of logging.Logger &
logging.Handler objects. I encountered a major slowdown during GCs which happen
to collect dead Handler objects, calling logging._removeHandlerRef() along the
way.
That function looks like:
def _removeHandlerRef(wr):
acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
if acquire and release and handlers:
acquire()
try:
if wr in handlers:
handlers.remove(wr)
finally:
release()
and is called by a weakref, attached to each Handler created in
_addHandlerRef().
The slowdown occurs in the "in" operator, and the remove() call. These
operations on very long lists are expensive. My suggestion is that, without
modifying the observed behavior of this function, it can be changed to perform
only one lookup in the list, by simply calling remove() and ignoring the
ValueError if it gets raised.
Attached is a small script which demonstrates the issue. It creates N1 Handler
objects, then deletes the strong reference to the last N2 handler objects, and
triggers a GC, measuring the time it takes. Additionally, we can measure its
entire execution (since all remaining Handler objects are cleaned up on exit,
and eventually _removeHandlerRef() is called for each, and it drastically slows
down the interpreter shutdown).
Running the demo with N1=30k and N2=2k:
root@8aafe9d1308b:/# python -V
Python 3.9.2
root@8aafe9d1308b:/# time python demo.py 30000 2000
len(logging._handlerList)=30001
len(logging._handlerList)=28001
gc.collect() took 1.552838139992673s
real 0m12.626s
user 0m12.618s
sys 0m0.008s
then applying the improving patch...
root@8aafe9d1308b:/# (cd /usr/local/lib/python3.9/ && patch -p2 < /patch )
patching file logging/__init__.py
Hunk #1 succeeded at 826 (offset -19 lines).
and trying again:
root@8aafe9d1308b:/# time python demo.py 30000 2000
len(logging._handlerList)=30001
len(logging._handlerList)=28001
gc.collect() took 0.8691686679958366s
real 0m7.134s
user 0m7.130s
sys 0m0.004s
root@8aafe9d1308b:/#
Almost a 2x speedup.
I also tried removing the acquire/release locks (now that the remove operation
is atomic...). I'm not sure it maintains the semantics, and it didn't make too
much of a positive effect on the run-time anyway, so I didn't continue looking
into it.
----------
components: Library (Lib)
files: demo.py
messages: 394229
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: Improving _removeHandlerRef for a very long _handlerList
type: performance
versions: Python 3.11
Added file: https://bugs.python.org/file50060/demo.py
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue44222>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com