Re: Do I have to use threads?

2010-01-13 Thread Tom
On Jan 7, 5:38 pm, MRAB pyt...@mrabarnett.plus.com wrote:
> Jorgen Grahn wrote:
>> On Thu, 2010-01-07, Marco Salden wrote:
>>> On Jan 6, 5:36 am, Philip Semanchuk phi...@semanchuk.com wrote:
>>>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>>>
>>>>> Hello people,
>>>>> I have 5 directories corresponding to 5 different URLs. I want to
>>>>> download images from those URLs and place them in the respective
>>>>> directories. I have to extract the contents and download them
>>>>> simultaneously. I can extract the contents and do them one by one.
>>>>> My question is: for doing it simultaneously, do I have to use
>>>>> threads?
>>>>
>>>> No. You could spawn 5 copies of wget (or curl or a Python program
>>>> that you've written). Whether or not that will perform better or be
>>>> easier to code, debug and maintain depends on the other aspects of
>>>> your program(s).
>>>>
>>>> bye
>>>> Philip
>>>
>>> Yep, the easier and more straightforward the approach, the better:
>>> threads are always (programmers')-error-prone by nature.
>>> But my question would be: does it REALLY need to be simultaneous?
>>> The CPU/OS only has more overhead doing this in parallel with
>>> processes. Measuring sequential processing and then trying to
>>> optimize (e.g. for user response or whatever) would be my preferred
>>> way to go. Less=More.
>>
>> Normally when you do HTTP in parallel over several TCP sockets, it
>> has nothing to do with CPU overhead. You just don't want every GET to
>> be delayed just because the server(s) are lazy responding to the first
>> few ones; or you might want to read the text of a web page and the CSS
>> before a few huge pictures have been downloaded.
>>
>> His "I have to [do them] simultaneously" makes me want to ask "Why?".
>>
>> If he's expecting *many* pictures, I doubt that the parallel download
>> will buy him much.  Reusing the same TCP socket for all of them is
>> more likely to help, especially if the pictures aren't tiny. One
>> long-lived TCP connection is much more efficient than dozens of
>> short-lived ones.
>>
>> Personally, I'd popen() wget and let it do the job for me.

> From my own experience:
>
> I wanted to download a number of webpages.
>
> I noticed that there was a significant delay before it would reply, and
> an especially long delay for one of them, so I used a number of threads,
> each one reading a URL from a queue, performing the download, and then
> reading the next URL, until there were none left (actually, until it
> read the sentinel None, which it put back for the other threads).
>
> The result?
>
> Shorter total download time because it could be downloading one webpage
> while waiting for another to reply.
>
> (Of course, I had to make sure that I didn't have too many threads,
> because that might've put too many demands on the website, not a nice
> thing to do!)

A fair few of my scripts require multiple uploads and downloads, and I
always use threads to do so. I was using an API which was quite badly
designed: I got a list of UserIds from one API call and then had to
query another API method to get info on each of the UserIds I got from
the first call. I could have used Twisted, but in the end I just made
a simple thread pool (30 threads and an in/out Queue). The result? A
*massive* speedup, even with the extra complications of waiting until
all the threads are done and then grouping the results together from
the output Queue.
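
For illustration, a minimal sketch of that kind of pool; the endpoint
and fetch_user_info() below are made up, standing in for the real API:

import threading
import Queue
import urllib2

API_URL = 'http://api.example.com/users/%s'    # hypothetical endpoint

def fetch_user_info(user_id):
    # stand-in for the second, per-user API call
    return urllib2.urlopen(API_URL % user_id).read()

def worker(in_q, out_q):
    while True:
        user_id = in_q.get()
        if user_id is None:          # sentinel: no more work
            in_q.put(None)           # put it back for the other threads
            return
        out_q.put((user_id, fetch_user_info(user_id)))

in_q, out_q = Queue.Queue(), Queue.Queue()
for user_id in ['1001', '1002', '1003']:   # really from the first call
    in_q.put(user_id)
in_q.put(None)

threads = [threading.Thread(target=worker, args=(in_q, out_q))
           for _ in range(30)]
for t in threads:
    t.start()
for t in threads:
    t.join()                         # wait until all the threads are done

results = {}                         # group the results together
while not out_q.empty():
    user_id, info = out_q.get()
    results[user_id] = info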

Since then I always use native threads.

Tom


Re: Do I have to use threads?

2010-01-08 Thread Jorgen Grahn
On Wed, 2010-01-06, Gary Herron wrote:
> aditya shukla wrote:
>> Hello people,
>>
>> I have 5 directories corresponding to 5 different URLs. I want to
>> download images from those URLs and place them in the respective
>> directories. I have to extract the contents and download them
>> simultaneously. I can extract the contents and do them one by one.
>> My question is: for doing it simultaneously, do I have to use threads?
>>
>> Please point me in the right direction.
>>
>> Thanks
>>
>> Aditya
>
> You've been given some bad advice here.
>
> First -- threads are lighter-weight than processes, so threads are
> probably *more* efficient.  However, with only five threads/processes,
> the difference is probably not noticeable.  (If the prejudice against
> threads comes from concerns over the GIL -- that also is a misplaced
> concern in this instance.  Since you only have one network connection,
> you will receive only one packet at a time, so only one thread will be
> active at a time.  If the extraction process uses a significant enough
> amount of CPU time

I wonder what that extraction would be, by the way.  Unless you ask
for compression of the HTTP data, the images come as-is on the TCP
stream.

> so that the extractions are all running at the same
> time *AND* if you are running on a machine with separate CPUs/cores
> *AND* you would like the extractions to be running truly in parallel
> on those separate cores, *THEN*, and only then, will processes be more
> efficient than threads.)

I can't remember what the bad advice was, but here processes versus
threads clearly doesn't matter performance-wise.  I generally
recommend processes, because how they work is well-known and they're
not as vulnerable to weird synchronization bugs as threads.

> Second, running 5 wgets is equivalent to 5 processes, not 5 threads.

> And third -- you don't have to use either threads *or* processes.
> There is another possibility which is much more light-weight:
> asynchronous I/O, available through the low-level select module, or
> more usefully via the higher-level asyncore module.

Yeah, that would be my first choice too for a problem which isn't
clearly CPU-bound.  Or my second choice -- the first would be calling
on a utility like wget(1).

/Jorgen

-- 
  // Jorgen Grahn grahn@  Oo  o.   .  .
\X/ snipabacken.se   O  o   .


Re: Do I have to use threads?

2010-01-08 Thread r0g
Marco Salden wrote:
> On Jan 6, 5:36 am, Philip Semanchuk phi...@semanchuk.com wrote:
>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>
>>> Hello people,
>>> I have 5 directories corresponding to 5 different URLs. I want to
>>> download images from those URLs and place them in the respective
>>> directories. I have to extract the contents and download them
>>> simultaneously. I can extract the contents and do them one by one.
>>> My question is: for doing it simultaneously, do I have to use threads?
>>
>> No. You could spawn 5 copies of wget (or curl or a Python program
>> that you've written). Whether or not that will perform better or be
>> easier to code, debug and maintain depends on the other aspects of
>> your program(s).
>>
>> bye
>> Philip
>
> Yep, the easier and more straightforward the approach, the better:
> threads are always (programmers')-error-prone by nature.
> But my question would be: does it REALLY need to be simultaneous?
> The CPU/OS only has more overhead doing this in parallel with
> processes. Measuring sequential processing and then trying to
> optimize (e.g. for user response or whatever) would be my preferred
> way to go. Less=More.
>
> regards,
> Marco



Threads aren't as hard as some people make out, although it does depend
on the problem. If your tasks are effectively independent then threads
are probably the right solution. You can turn any function into a
thread quite easily; I posted a function for this a while back...

http://groups.google.com/group/comp.lang.python/msg/3361a897db3834b4?dmode=source

Also, it's often a good idea to build in a flag that switches your app
from multi-threaded to single-threaded, as it's easier to debug the
latter; a sketch of both ideas follows.
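
A minimal sketch of both ideas (this is not the function from the
linked post):

import threading

USE_THREADS = True    # set to False to debug everything single-threaded

def run_async(func, *args, **kwargs):
    if not USE_THREADS:
        func(*args, **kwargs)    # runs inline, in this thread
        return None
    t = threading.Thread(target=func, args=args, kwargs=kwargs)
    t.start()
    return t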

Roger.


Re: Do I have to use threads?

2010-01-07 Thread Marco Salden
On Jan 6, 5:36 am, Philip Semanchuk phi...@semanchuk.com wrote:
> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>
>> Hello people,
>>
>> I have 5 directories corresponding to 5 different URLs. I want to
>> download images from those URLs and place them in the respective
>> directories. I have to extract the contents and download them
>> simultaneously. I can extract the contents and do them one by one.
>> My question is: for doing it simultaneously, do I have to use threads?
>
> No. You could spawn 5 copies of wget (or curl or a Python program
> that you've written). Whether or not that will perform better or be
> easier to code, debug and maintain depends on the other aspects of
> your program(s).
>
> bye
> Philip

Yep, the easier and more straightforward the approach, the better:
threads are always (programmers')-error-prone by nature.
But my question would be: does it REALLY need to be simultaneous?
The CPU/OS only has more overhead doing this in parallel with
processes. Measuring sequential processing and then trying to
optimize (e.g. for user response or whatever) would be my preferred
way to go. Less=More.

regards,
Marco


Re: Do I have to use threads?

2010-01-07 Thread Jorgen Grahn
On Thu, 2010-01-07, Marco Salden wrote:
> On Jan 6, 5:36 am, Philip Semanchuk phi...@semanchuk.com wrote:
>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>
>>> Hello people,
>>>
>>> I have 5 directories corresponding to 5 different URLs. I want to
>>> download images from those URLs and place them in the respective
>>> directories. I have to extract the contents and download them
>>> simultaneously. I can extract the contents and do them one by one.
>>> My question is: for doing it simultaneously, do I have to use threads?
>>
>> No. You could spawn 5 copies of wget (or curl or a Python program
>> that you've written). Whether or not that will perform better or be
>> easier to code, debug and maintain depends on the other aspects of
>> your program(s).
>>
>> bye
>> Philip
>
> Yep, the easier and more straightforward the approach, the better:
> threads are always (programmers')-error-prone by nature.
> But my question would be: does it REALLY need to be simultaneous?
> The CPU/OS only has more overhead doing this in parallel with
> processes. Measuring sequential processing and then trying to
> optimize (e.g. for user response or whatever) would be my preferred
> way to go. Less=More.

Normally when you do HTTP in parallel over several TCP sockets, it
has nothing to do with CPU overhead. You just don't want every GET to
be delayed just because the server(s) are lazy responding to the first
few ones; or you might want to read the text of a web page and the CSS
before a few huge pictures have been downloaded.

His "I have to [do them] simultaneously" makes me want to ask "Why?".

If he's expecting *many* pictures, I doubt that the parallel download
will buy him much.  Reusing the same TCP socket for all of them is
more likely to help, especially if the pictures aren't tiny. One
long-lived TCP connection is much more efficient than dozens of
short-lived ones.
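
For illustration, a sketch of that reuse with the stdlib httplib
module, assuming all the images live on one host (the host and paths
here are made up); httplib speaks HTTP/1.1, so the connection can stay
open across requests:

import httplib

conn = httplib.HTTPConnection('www.example.com')    # one connection
for path in ['/img/1.jpg', '/img/2.jpg', '/img/3.jpg']:
    conn.request('GET', path)
    response = conn.getresponse()
    data = response.read()       # read fully before the next request
    f = open(path.split('/')[-1], 'wb')
    f.write(data)
    f.close()
conn.close()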

Personally, I'd popen() wget and let it do the job for me.

/Jorgen

-- 
  // Jorgen Grahn grahn@  Oo  o.   .  .
\X/ snipabacken.se   O  o   .


Re: Do I have to use threads?

2010-01-07 Thread MRAB

Jorgen Grahn wrote:

> On Thu, 2010-01-07, Marco Salden wrote:
>> On Jan 6, 5:36 am, Philip Semanchuk phi...@semanchuk.com wrote:
>>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>>
>>>> Hello people,
>>>> I have 5 directories corresponding to 5 different URLs. I want to
>>>> download images from those URLs and place them in the respective
>>>> directories. I have to extract the contents and download them
>>>> simultaneously. I can extract the contents and do them one by one.
>>>> My question is: for doing it simultaneously, do I have to use
>>>> threads?
>>>
>>> No. You could spawn 5 copies of wget (or curl or a Python program
>>> that you've written). Whether or not that will perform better or be
>>> easier to code, debug and maintain depends on the other aspects of
>>> your program(s).
>>>
>>> bye
>>> Philip
>>
>> Yep, the easier and more straightforward the approach, the better:
>> threads are always (programmers')-error-prone by nature.
>> But my question would be: does it REALLY need to be simultaneous?
>> The CPU/OS only has more overhead doing this in parallel with
>> processes. Measuring sequential processing and then trying to
>> optimize (e.g. for user response or whatever) would be my preferred
>> way to go. Less=More.
>
> Normally when you do HTTP in parallel over several TCP sockets, it
> has nothing to do with CPU overhead. You just don't want every GET to
> be delayed just because the server(s) are lazy responding to the first
> few ones; or you might want to read the text of a web page and the CSS
> before a few huge pictures have been downloaded.
>
> His "I have to [do them] simultaneously" makes me want to ask "Why?".
>
> If he's expecting *many* pictures, I doubt that the parallel download
> will buy him much.  Reusing the same TCP socket for all of them is
> more likely to help, especially if the pictures aren't tiny. One
> long-lived TCP connection is much more efficient than dozens of
> short-lived ones.
>
> Personally, I'd popen() wget and let it do the job for me.


From my own experience:

I wanted to download a number of webpages.

I noticed that there was a significant delay before it would reply, and
an especially long delay for one of them, so I used a number of threads,
each one reading a URL from a queue, performing the download, and then
reading the next URL, until there were none left (actually, until it
read the sentinel None, which it put back for the other threads).

The result?

Shorter total download time because it could be downloading one webpage
while waiting for another to reply.

(Of course, I had to make sure that I didn't have too many threads,
because that might've put too many demands on the website, not a nice
thing to do!)
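
A minimal sketch of that setup (the URLs and the thread count here are
made up):

import threading
import Queue
import urllib

def downloader(queue):
    while True:
        url = queue.get()
        if url is None:          # the sentinel: no URLs left
            queue.put(None)      # put it back for the other threads
            return
        urllib.urlretrieve(url, url.split('/')[-1])

queue = Queue.Queue()
for url in ['http://example.com/a.html', 'http://example.com/b.html']:
    queue.put(url)
queue.put(None)

threads = [threading.Thread(target=downloader, args=(queue,))
           for _ in range(4)]    # not too many threads!
for t in threads:
    t.start()
for t in threads:
    t.join()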


Re: Do I have to use threads?

2010-01-07 Thread Philip Semanchuk


On Jan 7, 2010, at 11:32 AM, Jorgen Grahn wrote:


> On Thu, 2010-01-07, Marco Salden wrote:
>> On Jan 6, 5:36 am, Philip Semanchuk phi...@semanchuk.com wrote:
>>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>>
>>>> Hello people,
>>>>
>>>> I have 5 directories corresponding to 5 different URLs. I want to
>>>> download images from those URLs and place them in the respective
>>>> directories. I have to extract the contents and download them
>>>> simultaneously. I can extract the contents and do them one by one.
>>>> My question is: for doing it simultaneously, do I have to use
>>>> threads?
>>>
>>> No. You could spawn 5 copies of wget (or curl or a Python program
>>> that you've written). Whether or not that will perform better or be
>>> easier to code, debug and maintain depends on the other aspects of
>>> your program(s).
>>>
>>> bye
>>> Philip
>>
>> Yep, the easier and more straightforward the approach, the better:
>> threads are always (programmers')-error-prone by nature.
>> But my question would be: does it REALLY need to be simultaneous?
>> The CPU/OS only has more overhead doing this in parallel with
>> processes. Measuring sequential processing and then trying to
>> optimize (e.g. for user response or whatever) would be my preferred
>> way to go. Less=More.
>
> Normally when you do HTTP in parallel over several TCP sockets, it
> has nothing to do with CPU overhead. You just don't want every GET to
> be delayed just because the server(s) are lazy responding to the first
> few ones; or you might want to read the text of a web page and the CSS
> before a few huge pictures have been downloaded.
>
> His "I have to [do them] simultaneously" makes me want to ask "Why?".


Exactly what I was thinking. He's surely doing something more  
complicated than his post suggests, and without that detail it's  
impossible to say whether threads, processes, asynch or voodoo is the  
best approach.



bye
P




Re: Do I have to use threads?

2010-01-06 Thread Philip Semanchuk


On Jan 6, 2010, at 12:45 AM, Brian J Mingus wrote:

> On Tue, Jan 5, 2010 at 9:36 PM, Philip Semanchuk
> phi...@semanchuk.com wrote:
>
>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>
>>> Hello people,
>>>
>>> I have 5 directories corresponding to 5 different URLs. I want to
>>> download images from those URLs and place them in the respective
>>> directories. I have to extract the contents and download them
>>> simultaneously. I can extract the contents and do them one by one.
>>> My question is: for doing it simultaneously, do I have to use threads?
>>
>> No. You could spawn 5 copies of wget (or curl or a Python program
>> that you've written). Whether or not that will perform better or be
>> easier to code, debug and maintain depends on the other aspects of
>> your program(s).
>>
>> bye
>> Philip
>
> Obviously, spawning 5 copies of wget is equivalent to starting 5
> threads.
>
> The answer is 'yes'.


???

Process != thread




Re: Do I have to use threads?

2010-01-06 Thread Brian J Mingus
On Wed, Jan 6, 2010 at 6:24 AM, Philip Semanchuk phi...@semanchuk.com wrote:


> On Jan 6, 2010, at 12:45 AM, Brian J Mingus wrote:
>
>> On Tue, Jan 5, 2010 at 9:36 PM, Philip Semanchuk
>> phi...@semanchuk.com wrote:
>>
>>> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>>>
>>>> Hello people,
>>>>
>>>> I have 5 directories corresponding to 5 different URLs. I want to
>>>> download images from those URLs and place them in the respective
>>>> directories. I have to extract the contents and download them
>>>> simultaneously. I can extract the contents and do them one by one.
>>>> My question is: for doing it simultaneously, do I have to use
>>>> threads?
>>>
>>> No. You could spawn 5 copies of wget (or curl or a Python program
>>> that you've written). Whether or not that will perform better or be
>>> easier to code, debug and maintain depends on the other aspects of
>>> your program(s).
>>>
>>> bye
>>> Philip
>>
>> Obviously, spawning 5 copies of wget is equivalent to starting 5
>> threads. The answer is 'yes'.
>
> ???
>
> Process != thread


Just like the other nitpicker, it is up to you to explain why the
differences, and not the similarities, are relevant to this problem.


Re: Do I have to use threads?

2010-01-06 Thread exarkun

On 04:26 am, adityashukla1...@gmail.com wrote:

> Hello people,
>
> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one.
> My question is: for doing it simultaneously, do I have to use threads?
>
> Please point me in the right direction.


See Twisted,

 http://twistedmatrix.com/

in particular, Twisted Web's asynchronous HTTP client,

 http://twistedmatrix.com/documents/current/web/howto/client.html
 http://twistedmatrix.com/documents/current/api/twisted.web.client.html
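
For illustration, a minimal sketch with Twisted's getPage (the URLs are
made up, and error handling is omitted):

from twisted.internet import reactor, defer
from twisted.web.client import getPage

urls = ['http://example.com/1.jpg', 'http://example.com/2.jpg']

def save(data, filename):
    f = open(filename, 'wb')
    f.write(data)
    f.close()

# one Deferred per URL; all requests are in flight at the same time
ds = [getPage(url).addCallback(save, url.split('/')[-1])
      for url in urls]
defer.DeferredList(ds).addBoth(lambda _: reactor.stop())
reactor.run()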

Jean-Paul


Do I have to use threads?

2010-01-05 Thread aditya shukla
Hello people,

I have 5 directories corresponding to 5 different URLs. I want to
download images from those URLs and place them in the respective
directories. I have to extract the contents and download them
simultaneously. I can extract the contents and do them one by one.
My question is: for doing it simultaneously, do I have to use threads?

Please point me in the right direction.


Thanks

Aditya


Re: Do I have to use threads?

2010-01-05 Thread Philip Semanchuk


On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:


> Hello people,
>
> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one.
> My question is: for doing it simultaneously, do I have to use threads?


No. You could spawn 5 copies of wget (or curl or a Python program that  
you've written). Whether or not that will perform better or be easier  
to code, debug and maintain depends on the other aspects of your  
program(s).
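
For illustration, a sketch of that spawn-5-copies approach with the
stdlib subprocess module (the directories and URLs are made up; wget's
-P option sets the download directory):

import subprocess

jobs = [('dir1', 'http://example.com/gallery1/image.jpg'),
        ('dir2', 'http://example.com/gallery2/image.jpg')]

procs = [subprocess.Popen(['wget', '-P', directory, url])
         for directory, url in jobs]
for p in procs:
    p.wait()    # the copies run simultaneously; wait for them all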


bye
Philip



Re: Do I have to use threads?

2010-01-05 Thread Rodrick Brown
On Tue, Jan 5, 2010 at 11:26 PM, aditya shukla
adityashukla1...@gmail.com wrote:

> Hello people,
>
> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one.
> My question is: for doing it simultaneously, do I have to use threads?
>
> Please point me in the right direction.

Threads in Python are very easy to work with, but they are not very
efficient: for most cases they are slower than running multiple
processes. Look at using multiple processes instead of going with
threads; performance will be much better.
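
For illustration, a minimal sketch of that with the stdlib
multiprocessing module (Python 2.6+); the URLs here are made up:

import multiprocessing
import urllib

def fetch(url):
    filename = url.split('/')[-1]
    urllib.urlretrieve(url, filename)
    return filename

if __name__ == '__main__':
    urls = ['http://example.com/1.jpg', 'http://example.com/2.jpg']
    pool = multiprocessing.Pool(5)       # up to 5 downloads at once
    print pool.map(fetch, urls)          # one entry per fetched file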




-- 
[ Rodrick R. Brown ]
http://www.rodrickbrown.com http://www.linkedin.com/in/rodrickbrown


Re: Do I have to use threads?

2010-01-05 Thread Brian J Mingus
On Tue, Jan 5, 2010 at 9:36 PM, Philip Semanchuk phi...@semanchuk.com wrote:


> On Jan 5, 2010, at 11:26 PM, aditya shukla wrote:
>
>> Hello people,
>>
>> I have 5 directories corresponding to 5 different URLs. I want to
>> download images from those URLs and place them in the respective
>> directories. I have to extract the contents and download them
>> simultaneously. I can extract the contents and do them one by one.
>> My question is: for doing it simultaneously, do I have to use threads?
>
> No. You could spawn 5 copies of wget (or curl or a Python program
> that you've written). Whether or not that will perform better or be
> easier to code, debug and maintain depends on the other aspects of
> your program(s).
>
> bye
> Philip


Obviously, spawning 5 copies of wget is equivalent to starting 5 threads.
The answer is 'yes'.


Re: Do I have to use threads?

2010-01-05 Thread aditya shukla
Thanks. I will look into multiprocessing.


Aditya


Re: Do I have to use threads?

2010-01-05 Thread Gary Herron

aditya shukla wrote:

> Hello people,
>
> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one.
> My question is: for doing it simultaneously, do I have to use threads?
>
> Please point me in the right direction.
>
> Thanks
>
> Aditya


You've been given some bad advice here.

First -- threads are lighter-weight than processes, so threads are
probably *more* efficient.  However, with only five threads/processes,
the difference is probably not noticeable.  (If the prejudice against
threads comes from concerns over the GIL -- that also is a misplaced
concern in this instance.  Since you only have one network connection,
you will receive only one packet at a time, so only one thread will be
active at a time.  If the extraction process uses a significant enough
amount of CPU time so that the extractions are all running at the same
time *AND* if you are running on a machine with separate CPUs/cores
*AND* you would like the extractions to be running truly in parallel on
those separate cores, *THEN*, and only then, will processes be more
efficient than threads.)

Second, running 5 wgets is equivalent to 5 processes, not 5 threads.

And third -- you don't have to use either threads *or* processes.
There is another possibility which is much more light-weight:
asynchronous I/O, available through the low-level select module, or
more usefully via the higher-level asyncore module.  (Although the
learning curve might trip you up, and some people find the programming
model for asyncore hard to fathom, I find it more intuitive in this
case than threads/processes.)

In fact, the asyncore manual page has a ~20-line class which implements
a web page retrieval.  You could replace that example's single call to
http_client with five calls, one for each of your URLs.  Then when you
enter the last line (that is, the asyncore.loop() call) the five will
be downloading simultaneously.

See http://docs.python.org/library/asyncore.html
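
Along those lines, a sketch adapted from that manual-page example (the
host, paths, and filenames are made up); each client writes what it
receives to a file, and the single asyncore.loop() call drives all of
them at once:

import asyncore
import socket

class HTTPClient(asyncore.dispatcher):
    def __init__(self, host, path, filename):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 80))
        self.buffer = 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host)
        # note: the raw response (headers included) is written as-is;
        # a real client would strip the headers
        self.outfile = open(filename, 'wb')

    def handle_connect(self):
        pass

    def handle_read(self):
        self.outfile.write(self.recv(8192))

    def handle_close(self):
        self.outfile.close()
        self.close()

    def writable(self):
        return len(self.buffer) > 0

    def handle_write(self):
        sent = self.send(self.buffer)
        self.buffer = self.buffer[sent:]

for i in range(5):                  # five clients instead of one
    HTTPClient('www.example.com', '/image%d.jpg' % i, 'image%d.jpg' % i)
asyncore.loop()                     # drives all five simultaneously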

Gary Herron
