[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2019-06-20 Thread Vinay Sajip


Vinay Sajip  added the comment:

> I'd prefer correctness to be always there automatically, rather than 
> something the user must remember to enable by setting a flag such as 
> lockCallHandling

Yes, I agree, I was just looking at different approaches while mulling all this 
over. Correctness-by-default can of course always be implemented by setting 
lockCallHandling to True by default, which is what I have currently in my 
benchmark testing.

Re. your change:

- self.handlers.remove(hdlr)
+ newhandlers = list(self.handlers)
+ newhandlers.remove(hdlr)
+ self.handlers = hdlr

Did you mean self.handlers = newhandlers in that last line? That's what I 
assumed you meant, but you've posted that twice now, so I'm not sure what 
you're getting at.

I agree that having a slower add/removeHandler is acceptable - certainly better 
than slowing down Logger.handle(). But I'm not sure about

> so without relying on any GIL locking callHandlers will still see the old 
> list or the new list

on platforms other than CPython, since we can't be sure where thread 
pre-emption might happen (i.e. it could happen inside a source code line 
boundary) and locking at least has the benefit of having defined semantics.

The other issue is where someone might have subclassed and overridden 
add/removeHandler methods, for whatever reason. I know it's unlikely, but one 
never knows.

I think we're broadly on the same page. I'm just not yet sure what the best 
approach is :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2019-06-20 Thread Ben Spiller


Ben Spiller  added the comment:

Interesting conversation. :)

Yes I agree correctness is definitely top priority. :) I'd go further and say 
I'd prefer correctness to be always there automatically, rather than something 
the user must remember to enable by setting a flag such as lockCallHandling. 
(as an aside, adding a separate extra code path and option like that would 
require a bit more doc and testing changes than just fixing the bug by making 
the self.handlers list immutable, which is a small and simple change not 
needing extra doc). 

I'm also not convinced it's worth optimizing the performance of add/remove 
logger (which sounds like it's the goal of the callHandlers-locking approach 
you suggest, if I'm understanding correctly?) - since in a realistic 
application is always that's going to be vastly less frequent than invoking 
callhandlers. Especially if it reduces performance of the main logging, which 
is invoked much more often. Though admittedly the 1% regression you quoted 
isn't so bad (assuming that's true in CPython/IronPython/Jython/others). The 
test program I provided is a contrived way of quickly reproducing the race 
condition, but I certainly wouldn't use it for measuring or optimizing 
performance as it wasn't designed for that - the ratio of add/remove loggers to 
callhandlers calls is likely to be unrepresentative of a real application, and 
there's vastly more contention on calling add/remove loggers than you'd see in 
the wild. 

Do you see any downsides to the immutable self.handlers approach, other than 
performance of add/remove logger being a little lower?

Personally I think we're on safer ground if we permit add/remove logger be 
slightly slower (but at least correct! correctness trumps performance), but 
only if we avoid regressing the more important performance of logging itself. 

Does that seem reasonable?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2019-06-20 Thread Vinay Sajip


Vinay Sajip  added the comment:

> I'd definitely suggest we go for a solution that doesn't hit performance of 
> normal logging

I agree, but correctness is important. I'm tending to the following:

1. Introduce a lockCallHandling module-level variable, defaulting to False to 
maximise logging performance (and follow current behaviour) and settable to 
True for situations such as your example, where adding/removing handlers from 
threads is wanted.

2. In Logger.handle, add an extra check for lockCallHandling to decide whether 
to lock/release around callHandlers().

BTW a simple benchmark of performance of locking around callHanders vs. not 
locking, using a FileHandler and default StreamHandler, showed an average 
difference of ~ 480 usec per iteration (mean time of 47.310 secs unlocked vs. 
47.784 locked, for a million iterations of logger.debug() - a 1% increase).

> the second idea I suggested should do that

Yes, but the copying seems to make things slower, as suggested by the output of 
your script (in terms of the iteration counts in a fixed amount of time).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2019-06-20 Thread Ben Spiller


Ben Spiller  added the comment:

I'd definitely suggest we go for a solution that doesn't hit performance of 
normal logging when you're not adding/removing things, being as that's the more 
common case. I guess that's the reason why callHandlers was originally 
implemented without grabbing the mutex, and we should probably keep it that 
way. Logging can be a performance-critical part of some applications and I feel 
more comfortable about the fix (and more confident it won't get vetoed :)) if 
we can avoid changing callHandlers(). 

You make a good point about ensuring the solution works for non-GIL python 
versions. I thought about it some more... correct me if I'm wrong but as far as 
I can see the second idea I suggested should do that, i.e.
- self.handlers.remove(hdlr)
+ newhandlers = list(self.handlers)
+ newhandlers.remove(hdlr)
+ self.handlers = hdlr

... which effectively changes the model so that the _value_ of the 
self.handlers list is immutable (only which list the self.handlers reference 
points to changes), so without relying on any GIL locking callHandlers will 
still see the old list or the new list but never see an inconsistent value, 
since such a list never exists. That solves the read-write race condition; we'd 
still want to keep the existing locking in add/removeHandler which prevents 
write-write race conditions. 

What do you think?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2019-06-19 Thread Vinay Sajip


Vinay Sajip  added the comment:

The other alternative would be to lock around callHandlers(). With the change 
you propose to addHandler/removeHandler, there are no errors, but the output of 
your test program is (with the lock acquisition/release in place):

Thread finished after 468 iterations
Thread finished after 488 iterations
Thread finished after 476 iterations
Thread finished after 462 iterations

If the lock acquisition/release is removed from addHandler/removeHandler, then 
there are still no errors, and the output is:

Thread finished after 479 iterations
Thread finished after 453 iterations
Thread finished after 468 iterations
Thread finished after 469 iterations

If I leave addHandler/removeHandler as they were and add locking around 
callHandlers(), there are no errors and the output is:

Thread finished after 566 iterations
Thread finished after 608 iterations
Thread finished after 612 iterations
Thread finished after 605 iterations

This seems to suggest that locking around callHandlers() is better (more 
performant) than the copying of handler lists involved in your suggestion. Any 
comments on that? Also it will work in non-GIL environments like 
Jython/IronPython.

The callHandlers() locking will of course add a cost even for those situations 
where handlers aren't added and removed in multi-threading scenarios, but I 
haven't benchmarked it yet. It's generally not good practice to add/remove 
handlers willy-nilly in threads, though of course it's not forbidden (e.g. the 
configuration functionality allows a thread to listen for configuration changes 
at runtime and then reconfigure logging, which adds/removes handlers in the 
thread. However, loggers are disabled while the reconfiguration is in progress, 
and some loss of messages would be expected to be tolerable in such a case).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2018-11-07 Thread Ned Deily


Change by Ned Deily :


--
nosy: +vinay.sajip

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35185] Logger race condition - loses lines if removeHandler called from another thread while logging

2018-11-07 Thread Ben Spiller


New submission from Ben Spiller :

I just came across a fairly serious thread-safety / race condition bug in the 
logging.Loggers class, which causes random log lines to be lost i.e. not get 
passed to some of the registered handlers, if (other, unrelated) handlers are 
being added/removed using add/removeHandler from another thread during logging. 
This potentially affects all log handler classes, though for timing reasons 
I've found it easiest to reproduce with the logging.FileHandler. 

See attached test program that reproduces this. 

I did some debugging and looks like although add/removeHandler are protected by 
_acquireLock(), they modify the self.handlers list in-place, and the 
callHandlers method iterates over self.handlers with no locking - so if you're 
unlucky you can end up with some of your handlers not being called. 

A trivial way to fix the bug is by editing callHandlers and copying the list 
before iterating over it:
- for hdlr in c.handlers:
+ for hdlr in list(c.handlers):

However since that could affect the performance of routine log statements a 
better fix is probably to change the implementation of add/removeHandler to 
avoid in-place modification of self.handlers so that (as a result of the GIL) 
it'll be safe to iterate over the list in callHandlers, e.g. change 
removeHandler like this:

- self.handlers.remove(hdlr)
+ newhandlers = list(self.handlers)
+ newhandlers.remove(hdlr)
+ self.handlers = hdlr

(and the equivalent in addHandler)

--
components: Library (Lib)
files: logger-race.py
messages: 329429
nosy: benspiller
priority: normal
severity: normal
status: open
title: Logger race condition - loses lines if removeHandler called from another 
thread while logging
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7
Added file: https://bugs.python.org/file47914/logger-race.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com