A couple of strategic gc.collect() calls can be useful. You can also tweak how the garbage collector gets run by changing settings in the gc module.
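A minimal sketch of both suggestions. It uses only functions the standard gc module provides: gc.get_threshold, gc.set_threshold, gc.collect, gc.set_debug, gc.get_debug and gc.garbage. The helper names are hypothetical. So is the idea of calling them at quiet points, such as the overnight lull described in the quoted message. The threshold values a caller passes in are workload-specific guesses, not recommendations.

```python
import gc

def tune_gc(threshold0, threshold1, threshold2):
    """Set new collection thresholds and return the previous ones.

    A larger threshold0 makes generation-0 collections run less often.
    The values are workload-specific assumptions chosen by the caller.
    """
    previous = gc.get_threshold()
    gc.set_threshold(threshold0, threshold1, threshold2)
    return previous

def collect_at_quiet_point():
    """Force a full collection, e.g. after a burst of calls has ended.

    Returns the number of unreachable objects found, as gc.collect() does.
    """
    return gc.collect()

def report_uncollectable():
    """Have the collector print uncollectable objects to stderr, then return them."""
    gc.set_debug(gc.get_debug() | gc.DEBUG_UNCOLLECTABLE)
    gc.collect()
    # Anything left in gc.garbage needs its reference cycle broken by hand.
    return list(gc.garbage)
```

On Python 3.4 and later, cycles whose objects define `__del__` can be collected, so gc.garbage is usually empty there. On the Python 2.3.5 interpreter from the thread, such cycles are the classic source of uncollectable objects.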
-Chris

On Fri, Oct 21, 2005 at 04:13:09PM -0400, Robby Dermody wrote:
> 
> Hey guys (thus begins a book of a post :),
> 
> I'm in the process of writing a commercial VoIP call monitoring and
> recording application suite in python and pyrex. Basically, this
> software sits in a VoIP callcenter-type environment (complete with agent
> phones and VoIP servers), sniffs voice data off of the network, and
> allows users to listen in on calls. It can record calls as well. The
> project is about a year and 3 months in the making, and lately the
> codebase has stabilized enough that it can be used by some of our
> clients. The entire project has about 37,000 lines of python and pyrex
> code (along with 1-2K lines of unrelated java code).
> 
> Now, some disjointed rambling about the architecture of this software.
> This software has two long-running server-type components. One
> component, the "director" application, is written in pure python and
> makes use of the twisted, nevow, and kinterbasdb libraries (which I
> realize link to some C extensions). The other component, the
> "harvester", is a mixture of python and pyrex, and makes use of the
> twisted library, along with the C libs libpcap and glib on the
> pyrex end. Basically, the director is the "master" component. A single
> director process interacts with users of the system through a web and/or
> pygtk client application interface and can coordinate 1 to n harvesters
> spread about the world. The harvester is the "heavy lifter" component
> that sniffs the network traffic and sifts out the voice and signalling
> data. It then notifies the director of call status changes, and can
> provide users of the system access to the data. It records the data to
> disk as well. The scalability of this thing is really cool: given a
> single director sitting somewhere coordinating the list of agents,
> multiple harvesters can be placed anywhere there is voice traffic.
> A user that logs into the director can end up seeing the activity of
> all of these separate voice networks presented like a single giant mesh.
> 
> Overall, I have been very pleased with python and the 3rd party
> libraries that I use (twisted, nevow, kinterbasdb and pygtk). It is a
> joy to program with, and I think the python community has done a fine
> job. However, as I have been running the software lately and profiling
> it, the one and only Big Problem I have seen is its memory usage.
> Ideally, the server application(s) should be able to run indefinitely,
> but from the results I'm seeing I will end up exhausting the memory on
> a 2 GB machine in 2 to 3 days of heavy load.
> 
> Now normally I would not raise an issue like this on this list, but
> based on the conversations held on this list lately, and the work done
> by Evan Jones (http://evanjones.ca/python-memory.html), I am led to
> believe that this memory usage -- while partially due to some probable
> leaks in my program -- is largely due to the current python gc. I have
> some graphs I made to show the extent of this memory usage growth:
> 
> http://public.robbyd.fastmail.fm/iq-graph1.gif
> 
> http://public.robbyd.fastmail.fm/iq-graph-director-rss.gif
> 
> http://public.robbyd.fastmail.fm/iq-graph-harv-rss.gif
> 
> The preceding three diagrams are the result of running 1 director
> process and 1 harvester process on the same machine for about 48 hours.
> This is the most basic configuration of this software. I was running
> this application through /usr/bin/python (CPython) on a Debian 'testing'
> box running Linux 2.4 with 2GB of memory and Python version 2.3.5.
> During that time, I gathered the resident and virtual memory size of
> each component at 120 second intervals. I then imported this data into
> MINITAB and did some plots. The first one is a graph of the resident
> (RSS) and virtual memory usage of the two applications.
> The second one is a zoomed-in graph of the director's resident memory
> usage (complete with a best-fit quadratic), and the 3rd one is a
> zoomed-in graph of the harvester's resident memory usage.
> 
> To give you an idea of the network load these apps were undergoing
> during this sampling time: by the time 48 hours had passed, the
> harvester had gathered and parsed about 900 million packets. During the
> day there will be 50-70 agents talking; this number drops to 10-30 at
> night.
> 
> In the diagrams above, one can see the night-day separation clearly. At
> night, the memory usage growth seemed to all but stop, but with the
> increased call volume of the day, it started shooting off again. When I
> first started gathering this data, I was hoping for a logarithmic curve,
> but at least after 48 hours, it looks like the usage increase is almost
> linear. (Although logarithmic may still be the case after it exceeds a
> gig or two of used memory. :) I'm not sure whether this is something
> that I should expect from the current gc, and if so, when it would stop.
> 
> Now, as I stated above, I am certain that at least some of this
> increased memory usage is due to either uncollectable objects in the
> python code, or memory leaks in the pyrex code (where I make some use of
> malloc/free). I am working on finding and removing these issues, but
> from what I've seen with the help of gc UNCOLLECTABLE traces, there are
> not many uncollectable reference issues at least. Yes, there are some,
> but definitely not enough to justify growth like I am seeing. The pyrex
> side should not be leaking too much; I'm very good about freeing what I
> allocate in pyrex/C land. I will be running that linked to a memory leak
> finding library in the next few days.
> Past the code reviews I've done, what makes me think that I don't have
> any *wild* leaks going on, at least with the pyrex code, is that I am
> seeing the same type of growth patterns in both apps, and I don't use
> any pyrex with the director. Yes, the harvester is consuming much more
> memory, but it also does the majority of the heavy lifting.
> 
> I am alright with the app not freeing all the memory it can between high
> and low activity times, but what puzzles me is how the memory usage just
> keeps on growing and growing. Will it ever stop?
> 
> What I would like to know is whether others on this list have had
> similar problems with python's gc in long-running, larger python
> applications. Am I crazy, or is this a real problem with python's gc
> itself? If it's a python gc issue, then it's my opinion that we will
> need to enhance the gc before python can really gain leverage as a
> language suitable for "enterprise-class" applications. I have surprised
> many other programmers by writing an application like this in
> python/pyrex that works just as well as, and even more efficiently
> than, the C/C++/Java competitors. The only thing I have left to show is
> that the app lasts as long between restarts. ;)
> 
> Robby
> -- 
> http://mail.python.org/mailman/listinfo/python-list
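The quoted message describes sampling each component's resident and virtual memory size every 120 seconds. Below is a minimal sketch of such a sampler. It assumes a Linux host whose /proc/<pid>/status file reports VmRSS and VmSize in kB. The function names are hypothetical, and the thread does not say which tool was actually used.

```python
import time

def parse_status(text):
    """Return VmRSS and VmSize (in kB) from the text of /proc/<pid>/status."""
    sizes = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmRSS", "VmSize"):
            # Values look like "   7890 kB"; keep just the number.
            sizes[key] = int(value.split()[0])
    return sizes

def read_memory(pid="self"):
    """Read the current resident and virtual size of a process (Linux only)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_status(f.read())

def format_sample(sizes, now):
    """Build one log line: Unix timestamp, then RSS and VSZ in kB."""
    return "%d rss_kb=%d vsz_kb=%d" % (int(now), sizes["VmRSS"], sizes["VmSize"])

def sample_forever(pids, interval=120, log=print):
    """Log the memory of each process in `pids` every `interval` seconds."""
    while True:
        now = time.time()
        for pid in pids:
            log("pid=%s %s" % (pid, format_sample(read_memory(pid), now)))
        time.sleep(interval)
```

For example, `sample_forever([director_pid, harvester_pid])` would write one line per process every two minutes. A log like that can then be loaded into a spreadsheet or plotting tool. Here `director_pid` and `harvester_pid` are placeholders for the two server processes' IDs.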