Jorgen Grahn wrote:
On Thu, 2010-01-07, Marco Salden wrote:
On Jan 6, 5:36 am, Philip Semanchuk <phi...@semanchuk.com> wrote:
On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:

Hello people,
I have 5 directories corresponding to 5 different URLs. I want to
download images from those URLs and place them in the respective
directories. I have to extract the contents and download them
simultaneously. I can extract the contents and do them one by one.
My question is, for doing it simultaneously
do I have to use threads?
No. You could spawn 5 copies of wget (or curl or a Python program that you've written). Whether or not that will perform better or be easier to code, debug and maintain depends on the other aspects of your program(s).

bye
Philip
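
(For what it's worth, a rough sketch of that "spawn several wgets"
idea. The URLs, directory names, and wget options below are made-up
placeholders rather than anything from the original post, and it
assumes wget is on the PATH.)

    import subprocess

    # Hypothetical mapping of URL -> target directory.
    jobs = {
        "http://example.com/gallery1": "dir1",
        "http://example.com/gallery2": "dir2",
        "http://example.com/gallery3": "dir3",
        "http://example.com/gallery4": "dir4",
        "http://example.com/gallery5": "dir5",
    }

    # Start one wget per URL: -P sets the target directory, and
    # -r -l1 -nd -A fetches just the image files linked from the page.
    procs = [subprocess.Popen(["wget", "-q", "-r", "-l1", "-nd",
                               "-A", "jpg,jpeg,png,gif",
                               "-P", directory, url])
             for url, directory in jobs.items()]

    # Wait for all five downloads to finish.
    for p in procs:
        p.wait()

Each child process runs on its own, so the downloads overlap without
any threading in the Python code itself.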
Yep, the easier and more straightforward the approach, the better:
threads are error-prone for the programmer by nature.
But my question would be: does it REALLY need to be simultaneous?
The CPU/OS only incurs more overhead doing this in parallel with
processes. Measuring sequential processing first and then trying to
optimize (e.g. for user response or whatever) would be my preferred
way to go. Less = More.

Normally when you do HTTP in parallel over several TCP sockets, it
has nothing to do with CPU overhead. You just don't want every GET to
be delayed just because the server(s) are lazy responding to the first
few ones; or you might want to read the text of a web page and the CSS
before a few huge pictures have been downloaded.

His "I have to [do them] simultaneously" makes me want to ask "Why?".

If he's expecting *many* pictures, I doubt that the parallel download
will buy him much.  Reusing the same TCP socket for all of them is
more likely to help, especially if the pictures aren't tiny. One
long-lived TCP connection is much more efficient than dozens of
short-lived ones.
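
(A minimal sketch of the single long-lived connection idea, using
Python 3's http.client; the same module is called httplib in Python 2.
The host name and image paths are placeholder assumptions.)

    import http.client

    paths = ["/images/a.jpg", "/images/b.jpg", "/images/c.jpg"]

    conn = http.client.HTTPConnection("example.com")  # one TCP connection
    for i, path in enumerate(paths):
        conn.request("GET", path)      # HTTP/1.1 keeps the connection alive
        resp = conn.getresponse()
        data = resp.read()             # read fully before the next request
        with open("img%d.jpg" % i, "wb") as f:
            f.write(data)
    conn.close()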

Personally, I'd popen() wget and let it do the job for me.

From my own experience:

I wanted to download a number of webpages.

I noticed that there was a significant delay before a server would
reply, and an especially long delay for one of them, so I used a
number of threads, each one reading a URL from a queue, performing
the download, and then reading the next URL, until there were none
left (actually, until it read the sentinel None, which it put back
for the other threads).

The result?

Shorter total download time because it could be downloading one webpage
while waiting for another to reply.

(Of course, I had to make sure that I didn't have too many threads,
because that might've put too many demands on the website, not a nice
thing to do!)
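
(For what it's worth, here is a minimal sketch of the queue plus
worker threads pattern described above, using Python 3's queue and
urllib.request. The URL list and thread count are placeholder
assumptions.)

    import threading
    import urllib.request
    from queue import Queue

    urls = ["http://example.com/page%d.html" % i for i in range(10)]
    NUM_THREADS = 4   # keep this modest so the site isn't hammered

    q = Queue()
    for url in urls:
        q.put(url)
    q.put(None)       # sentinel: no more work

    def worker():
        while True:
            url = q.get()
            if url is None:
                q.put(None)   # put the sentinel back for the other threads
                break
            data = urllib.request.urlopen(url).read()
            print("%s: %d bytes" % (url, len(data)))

    threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Each thread spends most of its time blocked waiting on the network,
which is why the total time drops even with the GIL in play.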