On Thu, May 10, 2012 at 8:14 AM, Jabba Laci <[email protected]> wrote:
> What's the best way?

From what I've heard, http://scrapy.org/ . It is a single-threaded, single-process web crawler that can nonetheless download things concurrently. Doing what you want in Scrapy would probably involve learning about Twisted, the library Scrapy is built on top of. That is somewhat more involved than just throwing threads, urllib, and lxml.html together, although most of the Twisted developers are really helpful. It might not be worth it to you, depending on the size of the task.

Dave's answer is pretty general and good, though.

-- 
Devin
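
P.S. In case it's useful, here is a very rough, untested sketch of the threads + urllib + lxml.html approach I mentioned (Python 3 spelling: urllib.request plus a concurrent.futures thread pool). The start URL, the page limit, and the function names are made up for illustration, and it does nothing about robots.txt, politeness delays, retries, or staying on one domain; Scrapy takes care of that sort of thing for you.

    # Rough sketch: crawl with a thread pool, urllib, and lxml.html.
    # All names and the start URL below are placeholders, not real API.
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen
    from urllib.parse import urljoin

    import lxml.html


    def fetch(url):
        """Download one page and return (url, list of absolute links on it)."""
        try:
            with urlopen(url, timeout=10) as resp:
                doc = lxml.html.fromstring(resp.read())
            links = [urljoin(url, href) for href in doc.xpath('//a/@href')]
        except Exception:
            links = []  # skip pages that fail to download or parse
        return url, links


    def crawl(start_urls, max_pages=50, workers=8):
        """Breadth-first crawl: fetch batches of pages in a thread pool."""
        seen = set(start_urls)
        queue = list(start_urls)
        results = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while queue and len(results) < max_pages:
                batch, queue = queue[:workers], queue[workers:]
                for url, links in pool.map(fetch, batch):
                    results[url] = links
                    for link in links:
                        if link not in seen:
                            seen.add(link)
                            queue.append(link)
        return results


    if __name__ == '__main__':
        # Placeholder start URL; swap in whatever you actually want to crawl.
        pages = crawl(['http://example.com/'])
        for url, links in pages.items():
            print(url, len(links), 'links')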
