John Nagle <na...@animats.com> writes:
> Analysis of each domain is performed in a separate process, but each
> process uses multiple threads to read and process several web pages
> simultaneously.
>
> Some of the threads go compute-bound for a second or two at a time as
> they parse web pages.
You're probably better off using separate processes for the different pages. Since the parsing step is compute-bound, the GIL means those threads won't actually run in parallel in CPython anyway. If I remember right, you were using BeautifulSoup, which, while very cool, is pretty doggone slow on large volumes of pages. I don't know if there's much that can be done about that without going off on a fairly messy C or C++ coding adventure. Maybe someday someone will do that.
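A minimal sketch of the process-per-page idea, using multiprocessing.Pool to farm the parsing out to worker processes so the GIL doesn't serialize it. The stdlib html.parser stands in for BeautifulSoup here just to keep the example self-contained; the LinkCounter class and parse_page function are made up for illustration:

```python
# Sketch: parse pages in separate worker processes instead of threads,
# so compute-bound parsing isn't serialized by the GIL.
# html.parser is a stdlib stand-in for BeautifulSoup in this example.
from html.parser import HTMLParser
from multiprocessing import Pool


class LinkCounter(HTMLParser):
    """Toy parser: counts <a> tags as a stand-in for real page analysis."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def parse_page(html):
    """Runs inside a worker process; returns the link count for one page."""
    parser = LinkCounter()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    pages = ['<a href="/x">x</a>' * n for n in (1, 2, 3)]
    with Pool(processes=3) as pool:
        # Each page is parsed in its own process; results come back in order.
        print(pool.map(parse_page, pages))  # prints [1, 2, 3]
```

The real fetch/parse loop would feed downloaded page bodies into pool.map (or imap_unordered, to get results as they finish) instead of the canned strings above.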