Re: What are consequences of late curl_multi_perform call
Marcin Adamski wrote:
> I'm wondering what the consequences of a late curl_multi_perform call are. I guess that it may cause some timeouts to occur later than they should, e.g. we set CURLOPT_TIMEOUT to 60s, but the whole operation lasted 62s. But are there any significant consequences for an ongoing transfer? Say we are downloading a file via FTP and we call curl_multi_perform 3s too late. Can that be the reason for a transfer failure? In my understanding, all protocols that use TCP should handle this kind of delay. Am I right?
>
> Marcin Adamski

Hi Daniel, all,

I'm just starting my first libcurl project and have the same kind of question. The application is collecting information from printers, primarily via SNMP. Unfortunately, printer manufacturers aren't very open with a lot of the information we want to collect, hence the need to engage in the ugly act of scraping information via HTTP.

The program is a single-threaded state machine, driven by a select() function. The curl_multi_fdset/curl_multi_perform architecture looks like it should drop in BEAUTIFULLY!

Like Marcin, I think, I'm interested in how vigorously I need to call curl_multi_perform. I'm not after high performance. I will have a lot of traffic happening simultaneously (particularly UDP) and I want to give processing the UDP SNMP messages priority. This is happening in a long-running daemon program (errr, Windows service). If an HTTP scrape takes a little longer, I don't care. How lackadaisical can I be about calling curl_multi_perform?

The 'perform man page says:

> Before version 7.20.0: If you receive CURLM_CALL_MULTI_PERFORM, this basically means that you should call curl_multi_perform again, before you select() on more actions. You don't have to do it immediately, but the return code means that libcurl may have more data available to return or that there may be more data to send off before it is satisfied. Do note that curl_multi_perform(3) will return CURLM_CALL_MULTI_PERFORM only when it wants to be called again immediately. When things are fine and there is nothing IMMEDIATELY it wants done, it'll return CURLM_OK and you need to wait for action and then call this function again.

I'm trying to figure out what this really means. What about from 7.20.0 onward? The bolding of IMMEDIATELY makes it seem like one MUST (in the RFC sense) call 'perform if one gets CURLM_CALL_MULTI_PERFORM. Is this really so? I understand that it means 'perform has more work to do without going back to select(), but what if I have other things to do? What if I want to go back to select() because I want to check for and process any new UDP traffic first? Is this harmful? Is this really more along the lines of an RFC MAY?

For program timing, my select() timevals are always a second or less. So, if I can guarantee that 'perform will be called at least once a second, do I need to even mess with getting the timeout value from libcurl?

I'm hoping to make a comment post along the lines of "libcurl - a newcomer's first impressions" when I have time. For now, let me just say this: WOW!

Cheers!
Rich
http://www.plustechnologies.com
---
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Re: What are consequences of late curl_multi_perform call
On Mon, 3 Oct 2011, Marcin Adamski wrote:

> I'm wondering what the consequences of a late curl_multi_perform call are. I guess that it may cause some timeouts to occur later than they should, e.g. we set CURLOPT_TIMEOUT to 60s, but the whole operation lasted 62s. But are there any significant consequences for an ongoing transfer? Say we are downloading a file via FTP and we call curl_multi_perform 3s too late. Can that be the reason for a transfer failure? In my understanding, all protocols that use TCP should handle this kind of delay. Am I right?

Okay, let me first address your question(s) and then I'll see what is left to cover of Rich's additional thoughts.

Let me preface my explanation by saying that if you can figure out a wording and a place where we can insert this into the documentation to help users, please let me know!

First, libcurl provides the curl_multi_timeout() function to help applications know when to call curl_multi_perform(). That's the longest time you should wait before you call libcurl again to make sure it can keep its internal timers accurate.

libcurl has some timers and timeouts internally. libcurl itself basically doesn't need to be called repeatedly to keep the timers accurate; that's more of a choice and an effect that may be wanted by the application. There are some exceptions, like when doing UDP transfers (for example during c-ares name resolving or when doing TFTP transfers), where libcurl might need to handle packet retransmissions, and during that time a very slow calling of curl_multi_perform() might hamper libcurl's ability to do a good job.

In all cases where TCP-based transfers are used, I can't think of any moment in time where it would matter much if you call curl_multi_perform() too late. The connections themselves and their flow control etc. will be dealt with using TCP magic. If you wait a very long time, you may hit TCP_KEEPALIVE limits, or the lack of traffic on the connection may simply make your NATs or firewalls consider the connection dead.

--
 / daniel.haxx.se
Re: What are consequences of late curl_multi_perform call
On Mon, 3 Oct 2011, Rich Gray wrote:

> Before version 7.20.0: If you receive CURLM_CALL_MULTI_PERFORM, this basically means that you should call curl_multi_perform again, before you select() on more actions.
>
> I'm trying to figure out what this really means.

If you use a recent libcurl I think you should ignore the entire paragraph! It is basically trying to describe how to act when CURLM_CALL_MULTI_PERFORM is returned, and that return code is never used in modern libcurl versions.

> What about from 7.20.0 onward? The bolding of IMMEDIATELY makes it seem like one MUST (in the RFC sense) call 'perform if one gets CURLM_CALL_MULTI_PERFORM. Is this really so? I understand that it means 'perform has more work to do without going back to select(), but what if I have other things to do? What if I want to go back to select() because I want to check for and process any new UDP traffic first? Is this harmful? Is this really more along the lines of an RFC MAY?

The return code means that libcurl has more work to do that is already known, so waiting for more activity on libcurl's sockets would be wrong, as they may not yet indicate that anything needs to be done. It was a confusing return code and a bad design choice to include, which is why we've since removed it from use.

> For program timing, my select() timevals are always a second or less. So, if I can guarantee that 'perform will be called at least once a second, do I need to even mess with getting the timeout value from libcurl?

No, then you'll be fine! Unless of course you for some reason aim for sub-second resolution on timeouts set in libcurl, but I think you figured that out already! =)

--
 / daniel.haxx.se
Re: What are consequences of late curl_multi_perform call
Daniel Stenberg wrote:
> On Mon, 3 Oct 2011, Rich Gray wrote:
>> Before version 7.20.0: If you receive CURLM_CALL_MULTI_PERFORM, this basically means that you should call curl_multi_perform again, before you select() on more actions.
>>
>> I'm trying to figure out what this really means.
>
> If you use a recent libcurl I think you should ignore the entire paragraph! It is basically trying to describe how to act when CURLM_CALL_MULTI_PERFORM is returned, and that return code is never used in modern libcurl versions.

Ah, I hadn't picked up that it was an obsolete return code. I'll try to come up with some text for the man page and any other spots. For today, I need to get the multi code working. ;) I've already coded a simple test using the easy interface.

>> For program timing, my select() timevals are always a second or less. So, if I can guarantee that 'perform will be called at least once a second, do I need to even mess with getting the timeout value from libcurl?
>
> No, then you'll be fine! Unless of course you for some reason aim for sub-second resolution on timeouts set in libcurl, but I think you figured that out already! =)

Nah, my timeouts will probably be on the order of tens of seconds, probably 60 or so (enough to allow TCP a retry). I'll keep the UDP processing comments in mind, but I don't think they apply to this use.

Cheers!
Rich