Not sure if this one got out:



One more for this morning.

I thought Michael suggested a while back that we could thread the
retriever so that more than one page can be retrieved at a time.  Has
any work been done on this?  Since I'm fetching the pages at hotsync
time, this would be a great speed-up.  I think it would only require
modifying spider::parse: instead of cycling through until the queue is
empty, we'd cycle through until all threads are finished *and* the
queue is empty.  Another thread would watch the queue for links to
fetch, spawning a new thread for each link; when a fetch thread
finishes, it would dump the content on the main queue for parsing.

Any ideas?  Can Python even spawn threads?
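For what it's worth, Python does support threads through its standard
threading module, and the queue module gives you a thread-safe queue to
hand work between them.  Here's a rough sketch of the scheme described
above -- all the names are mine, not from the spider code, and the real
retriever would replace default_fetch (I've used a fixed worker pool
rather than one thread per link, which is usually kinder to the server):

```python
import queue
import threading
import urllib.request

def default_fetch(url):
    """Fetch one page over HTTP; swap in the spider's own retriever."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def fetch_all(urls, fetch=default_fetch, num_workers=4):
    """Fetch every URL concurrently; return {url: content}."""
    links = queue.Queue()      # links waiting to be fetched
    results = queue.Queue()    # fetched content for the parser

    def worker():
        while True:
            url = links.get()
            if url is None:            # sentinel: no more links
                break
            try:
                results.put((url, fetch(url)))
            except OSError:
                results.put((url, None))   # record the failure

    threads = [threading.Thread(target=worker)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        links.put(url)
    for _ in threads:
        links.put(None)                # one sentinel per worker
    for t in threads:
        t.join()

    pages = {}
    while not results.empty():
        url, content = results.get()
        pages[url] = content
    return pages
```

The main loop in spider::parse could then pull finished pages off the
results queue and parse them, pushing any new links it finds back onto
the link queue.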


