I am writing a screen scraping application using BeautifulSoup:

http://www.crummy.com/software/BeautifulSoup/

(which is fantastic, by the way).

I have an object that has two methods, each of which loads an HTML document and 
scrapes out some information, putting strings from the HTML documents into 
lists and dictionaries. I have a set of these objects from which I am 
aggregating and returning data. 
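
Roughly, each of these objects looks like the sketch below (the names, URLs and
tags are simplified stand-ins rather than my real code, but the shape is the
same):

# Minimal sketch of one scraper object (BeautifulSoup 3, Python 2).
import urllib2
from BeautifulSoup import BeautifulSoup

class PageScraper(object):
    def __init__(self, url_a, url_b):
        self.url_a = url_a
        self.url_b = url_b
        self.titles = []    # plain strings only
        self.details = {}   # plain string -> plain string

    def scrape_titles(self):
        html = urllib2.urlopen(self.url_a).read()
        soup = BeautifulSoup(html)               # local variable
        for tag in soup.findAll('h2'):
            self.titles.append(str(tag.string))  # copy the text out of the tree
        # 'soup' goes out of scope when the method returns

    def scrape_details(self):
        html = urllib2.urlopen(self.url_b).read()
        soup = BeautifulSoup(html)
        for row in soup.findAll('tr'):
            cells = row.findAll('td')
            if len(cells) >= 2:
                self.details[str(cells[0].string)] = str(cells[1].string)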

With a large number of these objects, the memory footprint is very large. The 
"soup" object is a local variable in each scraping method, so I assumed it 
would be cleaned up after the method returned. However, using guppy I've found 
that, after the methods have returned, most of the memory is still taken up by 
BeautifulSoup objects of one kind or another. I'm not creating BeautifulSoup 
objects anywhere else.
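
For what it's worth, this is roughly how I'm measuring it with guppy/heapy
(url_pairs is just a placeholder for my list of page URLs):

from guppy import hpy

h = hpy()
h.setrelheap()    # only count allocations made from this point on

scrapers = [PageScraper(a, b) for a, b in url_pairs]
for s in scrapers:
    s.scrape_titles()
    s.scrape_details()

# Even though every 'soup' was a local variable, the heap report is
# dominated by BeautifulSoup Tag / NavigableString objects.
print h.heap()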

I've tried assigning None to the "soup" variables at the end of the method 
calls and calling garbage collection manually, but this doesn't seem to help. 
I'd like to find out exactly which objects "own" the various BeautifulSoup 
structures, but I'm quite a new guppy user and I can't figure out how to do 
this.
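
Concretely, the clean-up attempt looks something like this (again simplified,
only one method shown; scrape_details is changed the same way), and the guppy
numbers barely move afterwards:

import gc
import urllib2
from BeautifulSoup import BeautifulSoup

class PageScraper(object):
    # __init__ and scrape_details as in the earlier sketch

    def scrape_titles(self):
        html = urllib2.urlopen(self.url_a).read()
        soup = BeautifulSoup(html)
        for tag in soup.findAll('h2'):
            self.titles.append(str(tag.string))
        soup = None      # explicitly drop the only reference I know of
        gc.collect()     # force a collection pass; the footprint barely changes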

How do I force the memory for these soup objects to be freed? Is there anything 
else I should be looking at to find out the cause of these problems?

Peter