Re: What are consequences of late curl_multi_perform call

2011-10-03 Thread Rich Gray

Marcin Adamski wrote:

I'm wondering what are consequences of late curl_multi_perform call. I guess 
that it may cause some timeouts to occur later than it should i.e. we set 
CURLOPT_TIMEOUT for 60s, but whole operation lasted 62s. But are there any 
significant consequences for ongoing transfer? Say we are downloading file via 
FTP and we call curl_multi_perform 3s too late. Can it be the reason for 
transfer failure? In my understanding all protocols that use TCP should handle 
this kind of delay. Am I right?

Marcin Adamski



Hi Daniel and all,

I'm just starting my first libcurl project and have the same kind of
question.  The application is collecting information from printers,
primarily via SNMP.  Unfortunately, printer manufacturers aren't very
open with a lot of the information we want to collect, hence the need to
engage in the ugly act of scraping information via HTTP.  The program is
a single-threaded state machine, driven by a select() function.  The
curl_multi_fdset/curl_multi_perform architecture looks like it should
drop in BEAUTIFULLY!


Like Marcin, I think, I'm interested in how vigorously I need to call 
curl_multi_perform.  I'm not after high performance.  I will have a lot 
of traffic happening simultaneously (particularly UDP) and I want to 
give processing the UDP SNMP messages priority.  This is happening in a 
long-running daemon program (errr, Windows service.) If an HTTP scrape 
takes a little longer, I don't care.  How lackadaisical can I be about 
calling curl_multi_perform?


The 'perform man page says:
Before version 7.20.0: If you receive CURLM_CALL_MULTI_PERFORM, this 
basically means that you should call curl_multi_perform again, before 
you select() on more actions. You don't have to do it immediately, but 
the return code means that libcurl may have more data available to 
return or that there may be more data to send off before it is 
satisfied. Do note that curl_multi_perform(3) will return 
CURLM_CALL_MULTI_PERFORM only when it wants to be called again 
immediately. When things are fine and there is nothing IMMEDIATELY it 
wants done, it'll return CURLM_OK and you need to wait for action and 
then call this function again. 


I'm trying to figure out what this really means.  What about from 7.20.0 
onward?  The bolding of IMMEDIATELY makes it seem like one MUST (in the 
RFC sense) call 'perform if one gets CURLM_CALL_MULTI_PERFORM.  Is this 
really so?  I understand that it means 'perform has more work to do 
without going back to select(), but what if I have other things to do?
What if I want to go back to the select() because I want to check for
and process any new UDP traffic first?  Is this harmful?  Is this really 
more along the lines of an RFC MAY?


For program timing, my select() timevals are always a second or less. 
So, if I can guarantee that 'perform will be called at least once a 
second, do I need to even mess with getting the timeout value from libcurl?



I'm hoping to make a comment post along the lines of "libcurl - a
newcomer's first impressions" when I have time.  For now, let me just
say this: WOW!


Cheers!
Rich
http://www.plustechnologies.com

---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html


Re: What are consequences of late curl_multi_perform call

2011-10-03 Thread Daniel Stenberg

On Mon, 3 Oct 2011, Marcin Adamski wrote:

I'm wondering what are consequences of late curl_multi_perform call. I guess 
that it may cause some timeouts to occur later than it should i.e. we set 
CURLOPT_TIMEOUT for 60s, but whole operation lasted 62s. But are there any 
significant consequences for ongoing transfer? Say we are downloading file 
via FTP and we call curl_multi_perform 3s too late. Can it be the reason for 
transfer failure? In my understanding all protocols that use TCP should 
handle this kind of delay. Am I right?


Okay, let me first address your question(s) and then I'll see what is left to 
cover Rich's additional thoughts. Let me preface my explanation by saying that 
if you can figure out a wording and a place where we can insert this into the 
documentation to help users, please let me know!


First, libcurl provides the curl_multi_timeout() function to help applications 
know when to call curl_multi_perform(). That's the longest time you should 
wait before you call libcurl again to make sure it can keep its internal 
timers accurate.
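
As a rough illustration of the point above, here is a minimal C sketch of how an application might turn the value reported by curl_multi_timeout() into a timeout for select(). The helper names (pick_wait_ms, ms_to_timeval) and the application-side cap are assumptions made for illustration and are not part of libcurl. The sketch relies only on the documented convention that curl_multi_timeout() stores -1 when libcurl has no timer pending.

```c
#include <sys/time.h>   /* struct timeval (POSIX; on Windows it comes from winsock2.h) */

/* Hypothetical helper: choose how long select() may block, in milliseconds.
 * curl_ms: value stored by curl_multi_timeout(); -1 means "no timer pending".
 * cap_ms:  the longest the application is willing to sleep anyway. */
static long pick_wait_ms(long curl_ms, long cap_ms)
{
    if (curl_ms < 0)          /* libcurl has no timer set: use our own cap */
        return cap_ms;
    return curl_ms < cap_ms ? curl_ms : cap_ms;
}

/* Hypothetical helper: convert milliseconds to the timeval select() expects. */
static struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}
```

In a loop this would sit between the two libcurl calls: fetch the value with curl_multi_timeout(multi, &curl_ms), hand ms_to_timeval(pick_wait_ms(curl_ms, 1000)) to select(), then call curl_multi_perform().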


libcurl has some timers and timeouts internally. libcurl itself basically
doesn't need to be called repeatedly to keep the timers accurate; that's more
of a choice and effect that may be wanted by the application. There are some
exceptions, such as UDP transfers (for example c-ares name resolving or TFTP
transfers), where libcurl might need to handle packet retransmissions. During
that time, very infrequent calls to curl_multi_perform() might hamper
libcurl's ability to do its job.


In all cases where TCP based transfers are used, I can't think of any moment
in time where it would matter much if you call curl_multi_perform() too
late. The connections themselves and their flow control etc will be dealt
with using TCP magic. If you wait a very long time, you may hit TCP_KEEPALIVE
limits, or the lack of traffic on the connection may make your NATs or
firewalls consider the connection dead.


--

 / daniel.haxx.se


Re: What are consequences of late curl_multi_perform call

2011-10-03 Thread Daniel Stenberg

On Mon, 3 Oct 2011, Rich Gray wrote:

Before version 7.20.0: If you receive CURLM_CALL_MULTI_PERFORM, this 
basically means that you should call curl_multi_perform again, before you 
select() on more actions.



I'm trying to figure out what this really means.


If you use a recent libcurl I think you should ignore the entire paragraph! It 
is basically trying to describe how to act when CURLM_CALL_MULTI_PERFORM is 
returned, and that return code is never used in modern libcurl versions.


What about from 7.20.0 onward?  The bolding of IMMEDIATELY makes it seem 
like one MUST (in the RFC sense) call 'perform if one gets 
CURLM_CALL_MULTI_PERFORM.  Is this really so?  I understand that it means 
'perform has more work to do without going back to select(), but what if I 
have other things to do?  What if I want to go back to the select() because 
I want to check for and process any new UDP traffic first?  Is this harmful? 
Is this really more along the lines of an RFC MAY?


The return code means that libcurl has more work to do that is already known, 
so that waiting for more actions on libcurl's sockets would be wrong as they 
may not yet indicate that something needs to be done.


It was a confusing return code and a bad design choice to feature, which is 
why we've since removed it from use.
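
For anyone still on a pre-7.20.0 libcurl, one possible way to honour that return code while still giving other sockets priority is to poll rather than block: run select() with a zero timeout whenever the previous curl_multi_perform() asked to be called again. The C sketch below only illustrates that idea. The function name and the LEGACY_CALL_AGAIN constant are hypothetical stand-ins (the constant mirrors the value -1 that curl/multi.h assigns to CURLM_CALL_MULTI_PERFORM), so the snippet compiles without any libcurl headers.

```c
/* Stand-in for CURLM_CALL_MULTI_PERFORM, which curl/multi.h defines as -1.
 * Modern libcurl never returns it. */
#define LEGACY_CALL_AGAIN (-1)

/* Hypothetical helper: how long may the next select() block, in milliseconds?
 * last_rc:   return code of the most recent curl_multi_perform() call.
 * normal_ms: the timeout the loop would otherwise use. */
static long next_select_ms(int last_rc, long normal_ms)
{
    if (last_rc == LEGACY_CALL_AGAIN)
        return 0;           /* poll only: libcurl already has work queued */
    return normal_ms;
}
```

With a zero timeout, select() still reports any UDP sockets that are ready, so those can be serviced first, and curl_multi_perform() is then called again on the same pass through the loop without waiting on libcurl's sockets.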


For program timing, my select() timevals are always a second or less. So, if 
I can guarantee that 'perform will be called at least once a second, do I 
need to even mess with getting the timeout value from libcurl?


No, then you'll be fine! Unless, of course, you for some reason aim for
sub-second resolution on the timeouts you set in libcurl, but I think you
figured that out already! =)
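
To put a number on that: if curl_multi_perform() is only called on a fixed tick, and assuming libcurl notices an expired timeout at the first call made at or after the deadline, the detection time is the deadline rounded up to the next tick. A small C sketch of that arithmetic follows. The function name is hypothetical, and the model (calls at exact multiples of the tick, measured from the start of the transfer) is a simplification.

```c
/* Hypothetical model: curl_multi_perform() is called at t = tick, 2*tick, ...
 * Returns the time (ms) of the first call at or after timeout_ms,
 * i.e. when a timeout of timeout_ms would actually be noticed. */
static long detect_time_ms(long timeout_ms, long tick_ms)
{
    long ticks = (timeout_ms + tick_ms - 1) / tick_ms;  /* ceiling division */
    return ticks * tick_ms;
}
```

With a 1000 ms tick, a 60000 ms timeout is noticed at 60000 ms and a 60500 ms one at 61000 ms, so the error stays below one tick, which only matters when sub-second precision is wanted.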


--

 / daniel.haxx.se


Re: What are consequences of late curl_multi_perform call

2011-10-03 Thread Rich Gray

Daniel Stenberg wrote:

On Mon, 3 Oct 2011, Rich Gray wrote:

Before version 7.20.0: If you receive CURLM_CALL_MULTI_PERFORM, this 
basically means that you should call curl_multi_perform again, before 
you select() on more actions.

I'm trying to figure out what this really means.


If you use a recent libcurl I think you should ignore the entire
paragraph! It is basically trying to describe how to act when
CURLM_CALL_MULTI_PERFORM is returned, and that return code is never
used in modern libcurl versions.

Ah, I hadn't picked up that it was an obsolete return code.  I'll try to
come up with some text for the man page and any other spots.  For today,
I need to get the multi code working. ;)   I've already coded a simple
test using the easy interface.

For program timing, my select() timevals are always a second or less.
So, if I can guarantee that 'perform will be called at least once a
second, do I need to even mess with getting the timeout value from
libcurl?


No, then you'll be fine! Unless, of course, you for some reason aim
for sub-second resolution on the timeouts you set in libcurl, but I
think you figured that out already! =)

Nah, my timeouts will probably be on the order of tens of seconds,
probably 60 or so (enough to allow TCP a retry.)   I'll keep the UDP
processing comments in mind, but I don't think they apply for this use.


Cheers!
Rich

