> On Dec 4, 2016, at 9:50 AM, Jean-Paul Calderone <exar...@twistedmatrix.com> 
> wrote:
> 
> On Sun, Dec 4, 2016 at 12:50 AM, Glyph Lefkowitz <gl...@twistedmatrix.com>
> wrote:
> Following up on a Stack Overflow question from some time ago,
> http://stackoverflow.com/questions/40604545/twisted-using-connectprotocol-to-connect-endpoint-cause-memory-leak?noredirect=1#comment68573508_40604545
> since the submitter added a minimal reproducer, I used Heapy
> https://pypi.org/project/guppy/ to look at memory usage, and saw large
> numbers of Logger instances and type objects leaking when using client
> endpoints.  It was not immediately obvious to me where the leak was
> occurring, though, as I was careful to clean up the Deferred results and
> not leave them in a failure state.
> 
> I am hoping that I can entice someone else to diagnose this far enough to 
> actually file a bug :-).
> 
> 
> Answered.  I didn't file a bug; I'll let someone else with ideas about 
> twisted.logger think about what exactly the bug is.

I wrote up this bug, and will file it when the ability to file bugs on Trac 
comes back:

twisted.logger._initialBuffer can consume a surprisingly large amount of memory 
if logging is not initialized


The way that `twisted.logger` is supposed to work is that at process startup 
time, the global log observer has a ring buffer for any messages emitted before 
logging is initialized, and it replays those messages to the initial set of log 
observers once they are passed to `globalLogBeginner.beginLoggingTo`.
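The buffer-then-replay behavior described above can be modeled with the standard library, using `collections.deque(maxlen=...)` as the ring buffer. This is a minimal sketch under that assumption: `BufferingPublisher`, `emit`, and `begin_logging_to` are hypothetical names, not Twisted's API (in Twisted the corresponding roles are played by its initial buffer observer and `globalLogBeginner.beginLoggingTo`).

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, List, Optional

Event = Dict[str, Any]
Observer = Callable[[Event], None]


class BufferingPublisher:
    """Hypothetical model: buffer events until logging begins, then replay them."""

    def __init__(self, size: int = 65535) -> None:
        # A deque with maxlen behaves as a ring buffer: once it is full,
        # each append silently evicts the oldest event.
        self._buffer: Optional[Deque[Event]] = deque(maxlen=size)
        self._observers: List[Observer] = []

    def emit(self, event: Event) -> None:
        if self._buffer is not None:
            # Logging has not begun: keep the event and everything it references.
            self._buffer.append(event)
        else:
            self._deliver(event)

    def begin_logging_to(self, observers: Iterable[Observer]) -> None:
        self._observers = list(observers)
        buffered, self._buffer = self._buffer, None
        # Replay whatever survived in the ring buffer, oldest first.
        for event in buffered or ():
            self._deliver(event)

    def _deliver(self, event: Event) -> None:
        for observer in self._observers:
            observer(event)


publisher = BufferingPublisher(size=3)
for i in range(5):
    publisher.emit({"log_format": "event {i}", "i": i})

seen: List[Event] = []
publisher.begin_logging_to([seen.append])
publisher.emit({"log_format": "event {i}", "i": 5})
print([event["i"] for event in seen])  # [2, 3, 4, 5]
```

Events 0 and 1 were evicted before logging began. With a buffer of 65535 entries instead of 3, everything emitted before `begin_logging_to` (and everything those events reference) stays reachable until logging starts, which in an application that never starts logging means forever.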

The size of this buffer (in `twisted.logger._buffer._DEFAULT_BUFFER_MAXIMUM`) is 
65535.  This value was selected arbitrarily, probably because somebody (me or 
wsanchez) thought "huh, yeah, 64k, that's probably a fine number"; but of 
course, "64k is fine" is an intuition about 64k ''bytes'', not 64k buffered 
entries.

If this were a buffer of actual formatted log messages, of say 200 bytes each, 
that would be about 13 megabytes, which is maybe an acceptable amount of RAM to 
spend on a log buffer.
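The arithmetic behind the 13 megabyte figure is a quick check, and so is the reverse question of how much data each buffered event would need to keep reachable for a full buffer to pin a gigabyte. The 200-byte message size is the assumption stated in the paragraph above; nothing here is measured.

```python
# Back-of-the-envelope arithmetic for the paragraph above.
BUFFER_EVENTS = 65535       # the buffer maximum quoted above: a count of events
BYTES_PER_MESSAGE = 200     # the "say 200 bytes each" assumption

total_bytes = BUFFER_EVENTS * BYTES_PER_MESSAGE
print(total_bytes)                   # 13107000
print(round(total_bytes / 1e6, 1))   # 13.1 (megabytes)

# Conversely: average bytes each buffered event must keep reachable
# for a full buffer to pin roughly a gigabyte.
per_event_for_1gb = 10**9 / BUFFER_EVENTS
print(round(per_event_for_1gb))      # 15259
```

About 15 KB of reachable data per event is therefore enough to push a full buffer to a gigabyte.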

However, it isn't that.  It's a buffer of 64k log ''events'', each of which 
probably has `log_logger` and `log_source` set, both of which are objects that 
may reference arbitrary data.  For example, every `Factory` that starts up logs 
something, which means the buffer holds on to an instance and its instance 
dictionary, and the protocol instance and its instance dictionary.  Worse yet, 
any logged ''failures'' might hold on to everything referenced from their stack 
frames.
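The retention effect can be reproduced without Twisted. This stdlib-only sketch uses a `deque` as a stand-in ring buffer and a hypothetical `ExampleFactory` class; it shows that an object referenced from a buffered event stays alive until the event is evicted, and that a saved traceback (a stand-in for what a logged failure may hold on to) keeps the locals of its stack frames reachable.

```python
import gc
import weakref
from collections import deque


class ExampleFactory:
    """Hypothetical stand-in for a Factory that logs something when it starts."""

    def __init__(self) -> None:
        self.state = bytearray(1024 * 1024)  # 1 MiB hanging off the instance dict


# Stand-in ring buffer for log events emitted before logging is initialized.
buffer = deque(maxlen=3)

factory = ExampleFactory()
probe = weakref.ref(factory)
# The event references the factory, much as a log_source entry would.
buffer.append({"log_format": "Starting {log_source!r}", "log_source": factory})
del factory
gc.collect()
alive_while_buffered = probe() is not None

# Only after the event is evicted from the ring buffer can the factory be freed.
for i in range(3):
    buffer.append({"log_format": "filler {i}", "i": i})
gc.collect()
collected_after_eviction = probe() is None
print(alive_while_buffered, collected_after_eviction)  # True True


def failing_operation() -> None:
    local_state = ExampleFactory()  # referenced only from this stack frame
    raise ValueError("boom")


try:
    failing_operation()
except ValueError as exc:
    saved_traceback = exc.__traceback__  # stand-in for a stored failure

# The saved traceback still reaches the finished frame's locals, 1 MiB included.
frame_locals = saved_traceback.tb_next.tb_frame.f_locals
print(type(frame_locals["local_state"]).__name__)  # ExampleFactory
```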

Add it all up and you end up with a log buffer totaling in the hundreds of 
megabytes, or even gigabytes, once it's full.  In an application that naively 
uses Twisted without ever initializing logging, this hangs around forever.

This buffer should probably be a ''lot'' smaller, and we might want to emit a 
warning when it fills up, reminding people that it is ''only polite'' to start 
up the logging subsystem, even just to explicitly throw logs away.
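A minimal sketch of the proposed behavior, using only the standard library: a deliberately small ring buffer that emits a `RuntimeWarning` the first time it has to discard an event. `WarningRingBuffer` and the size of 1000 are hypothetical choices for illustration, not Twisted API or an agreed-upon value. On the application side, explicitly throwing logs away would presumably amount to passing a no-op observer such as `lambda event: None` to `globalLogBeginner.beginLoggingTo`.

```python
import warnings
from collections import deque
from typing import Any, Deque, Dict

Event = Dict[str, Any]


class WarningRingBuffer:
    """Hypothetical sketch: a small log buffer that warns once when it overflows."""

    def __init__(self, size: int = 1000) -> None:  # 1000 is an arbitrary example size
        self._events: Deque[Event] = deque(maxlen=size)
        self._warned = False

    def __call__(self, event: Event) -> None:
        if len(self._events) == self._events.maxlen and not self._warned:
            # The next append will evict the oldest event: remind the user, once.
            warnings.warn(
                "log buffer is full and old events are being discarded; "
                "please start the logging subsystem, even just to throw logs away",
                RuntimeWarning,
                stacklevel=2,
            )
            self._warned = True
        self._events.append(event)

    def __len__(self) -> int:
        return len(self._events)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    buf = WarningRingBuffer(size=2)
    for i in range(5):
        buf({"i": i})
print(len(buf), len(caught))  # 2 1
```

Warning only once keeps a long-running process that never starts logging from flooding stderr, while still making the omission visible.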

Text is here in case someone else manages to make Trac come back and would like 
to file it before I get back :).

-glyph

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
