Re: Overcoming python performance penalty for multicore CPU

Stefan Behnel Mon, 08 Feb 2010 01:19:45 -0800

Paul Rubin, 04.02.2010 02:51:
> John Nagle writes:
>> Analysis of each domain is
>> performed in a separate process, but each process uses multiple
>> threads to read and process several web pages simultaneously.
>>
>>    Some of the threads go compute-bound for a second or two at a time as
>> they parse web pages.  
> 
> You're probably better off using separate processes for the different
> pages.  If I remember correctly, you were using BeautifulSoup, which, while very
> cool, is pretty doggone slow for use on large volumes of pages.  I don't
> know if there's much that can be done about that without going off on a
> fairly messy C or C++ coding adventure.  Maybe someday someone will do
> that.


Well, if multi-core performance is so important here, then there's a pretty
simple thing the OP can do: switch to lxml.

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
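
For illustration, here is a minimal sketch of what the switch could look like
for the threaded setup described in the quoted text. It assumes the pages are
already downloaded as strings; the helper names (extract, parse_all,
_thread_parser) are made up for this example and are not part of lxml. As far
as I know, lxml releases the GIL while it parses, provided each thread uses
its own parser, so the sketch keeps one HTMLParser per thread rather than
sharing a single instance.

```python
# Sketch only: parse several pages concurrently with lxml in a thread pool.
# Requires the third-party lxml package. Helper names are hypothetical.
import threading
from concurrent.futures import ThreadPoolExecutor

import lxml.html

_local = threading.local()


def _thread_parser():
    """Return an HTMLParser owned by the calling thread, creating it once."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = lxml.html.HTMLParser()
    return parser


def extract(page_source):
    """Return (title, links) for one complete HTML document given as a string."""
    doc = lxml.html.fromstring(page_source, parser=_thread_parser())
    title = doc.findtext(".//title")
    # Convert lxml's string results to plain str before leaving the thread.
    links = [str(href) for href in doc.xpath("//a/@href")]
    return title, links


def parse_all(pages, workers=4):
    """Parse all pages in worker threads, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract, pages))
```

Whether this actually scales across cores depends on how much of the per-page
work happens inside the parser and how much is Python code afterwards, so it
is worth measuring on real pages before restructuring anything.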

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list
