New submission from Yonatan Goldschmidt <yon.goldschm...@gmail.com>:

We have an application which creates 10,000s of logging.Logger & 
logging.Handler objects. I encountered a major slowdown during GCs which happen 
to collect dead Handler objects, calling logging._removeHandlerRef() along the 
way.
That function looks like:

    def _removeHandlerRef(wr):
        acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
        if acquire and release and handlers:
            acquire()
            try:
                if wr in handlers:
                    handlers.remove(wr)
            finally:
                release()

and is called by a weakref, attached to each Handler created in 
_addHandlerRef().

The slowdown occurs in the "in" operator, and the remove() call. These 
operations on very long lists are expensive. My suggestion is that, without 
modifying the observed behavior of this function, it can be changed to perform 
only one lookup in the list, by simply calling remove() and ignoring the 
ValueError if it gets raised.

Attached is a small script which demonstrates the issue. It creates N1 Handler 
objects, then deletes the strong reference to the last N2 handler objects, and 
triggers a GC, measuring the time it takes. Additionally, we can measure its 
entire execution (since all remaining Handler objects are cleaned up on exit, 
and eventually _removeHandlerRef() is called for each, and it drastically slows 
down the interpreter shutdown).

Running the demo with N1=30k and N2=2k:

    root@8aafe9d1308b:/# python -V
    Python 3.9.2
    root@8aafe9d1308b:/# time python demo.py 30000 2000
    len(logging._handlerList)=30001
    len(logging._handlerList)=28001
    gc.collect() took 1.552838139992673s

    real    0m12.626s
    user    0m12.618s
    sys 0m0.008s

then applying the improving patch...

    root@8aafe9d1308b:/# (cd /usr/local/lib/python3.9/ && patch -p2 < /patch )
    patching file logging/__init__.py
    Hunk #1 succeeded at 826 (offset -19 lines).

and trying again:

    root@8aafe9d1308b:/# time python demo.py 30000 2000
    len(logging._handlerList)=30001
    len(logging._handlerList)=28001
    gc.collect() took 0.8691686679958366s

    real    0m7.134s
    user    0m7.130s
    sys 0m0.004s
    root@8aafe9d1308b:/#

Almost a 2x speedup.

I also tried removing the acquire/release locks (now that the remove operation 
is atomic...). I'm not sure it maintains the semantics, and it didn't make too 
much of a positive effect on the run-time anyway, so I didn't continue looking 
into it.

----------
components: Library (Lib)
files: demo.py
messages: 394229
nosy: Yonatan Goldschmidt
priority: normal
severity: normal
status: open
title: Improving _removeHandlerRef for a very long _handlerList
type: performance
versions: Python 3.11
Added file: https://bugs.python.org/file50060/demo.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44222>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to