John Nagle <na...@animats.com> writes:
> Analysis of each domain is
> performed in a separate process, but each process uses multiple
> threads to read and process several web pages simultaneously.
>
>    Some of the threads go compute-bound for a second or two at a time as
> they parse web pages.  

You're probably better off using separate processes for the different
pages.  If I remember correctly, you were using BeautifulSoup, which, while
very cool, is pretty doggone slow for use on large volumes of pages.  I don't
know if there's much that can be done about that without going off on a
fairly messy C or C++ coding adventure.  Maybe someday someone will do
that.
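
Here is a rough sketch of what I mean by separate processes, not a
drop-in for your crawler.  In CPython, threads that are busy parsing
take turns holding the interpreter lock, so the compute-bound part
gains little from threading; worker processes avoid that.  The names
count_links and parse_pages are made up for illustration, and the
stdlib HTMLParser stands in for whatever parsing you actually do.

```python
from html.parser import HTMLParser
from multiprocessing import Pool


class LinkCounter(HTMLParser):
    """Counts <a href=...> tags; a stand-in for the real per-page analysis."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def count_links(html):
    """Compute-bound work for one page; runs inside a worker process."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.links


def parse_pages(pages, workers=4):
    """Hand each page (an HTML string) to a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # map() returns results in the same order as the input pages.
        return pool.map(count_links, pages)
```

Call parse_pages() from under an "if __name__ == '__main__':" guard so
that platforms which start workers by re-importing the main module do
not start pools recursively.  Fetching the pages can stay in threads,
since that part mostly waits on the network.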
-- 
http://mail.python.org/mailman/listinfo/python-list
