Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
Tony Godshall wrote: If it was me, I'd have it default to backing off to 95% and have options for more aggressive behavior, like the multiple connections, etc. I don't like a default back-off rule. I often encounter downloads whose speed changes frequently. The idea that for the first few seconds I only have quite a bad speed, when I could get much more out of it, is just not satisfying. I'm surprised multiple connections would buy you anything, though. I guess I'll take a look through the archives and see what the argument is. Does one TCP connection back off on a lost packet and the other one gets to keep going? Hmmm. I guess you get improvements if, e.g., on your side you have more free bandwidth than on the source side. Having two connections then means that you get almost twice the download speed, because you have two connections competing for free bandwidth, and ideally every connection made to the server is equally fast. So in cases where you are the only one connecting, you probably win nothing. Greetings Matthias
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/17/07, Matthias Vill [EMAIL PROTECTED] wrote: Tony Godshall wrote: If it was me, I'd have it default to backing off to 95% and have options for more aggressive behavior, like the multiple connections, etc. I don't like a default back-off rule. I often encounter downloads whose speed changes frequently. The idea that for the first few seconds I only have quite a bad speed, when I could get much more out of it, is just not satisfying. You might be surprised, but I totally agree with you. A default backoff rule would only make sense if the measuring was better, e.g. a periodic ramp-up/back-off behavior to achieve 95% of the maximum measured rate. I'm surprised multiple connections would buy you anything, though. I guess I'll take a look through the archives and see what the argument is. Does one TCP connection back off on a lost packet and the other one gets to keep going? Hmmm. I guess you get improvements if, e.g., on your side you have more free bandwidth than on the source side. Having two connections then means that you get almost twice the download speed, because you have two connections competing for free bandwidth, and ideally every connection made to the server is equally fast. So in cases where you are the only one connecting, you probably win nothing. Ah, I get it. People want to defeat sender rate-limiting or other QoS controls. The opposite of nice. We could call it --mean-mode. Or --meanness n, where n=2 means I want to have two threads/connections, i.e. twice as mean as the default. No, well, actually, I guess there can be cases where bad upstream configurations result in a situation where more connections don't necessarily mean one is taking more than one's fair share of bandwidth, but I bet this option will result in more harm than good. Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt). TG
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
(Accidentally sent private reply.) Tony Godshall wrote: On 10/17/07, Matthias Vill [EMAIL PROTECTED] wrote: Tony Godshall wrote: If it was me, I'd have it default to backing off to 95% and have options for more aggressive behavior, like the multiple connections, etc. I don't like a default back-off rule. I often encounter downloads whose speed changes frequently. The idea that for the first few seconds I only have quite a bad speed, when I could get much more out of it, is just not satisfying. You might be surprised, but I totally agree with you. A default backoff rule would only make sense if the measuring was better, e.g. a periodic ramp-up/back-off behavior to achieve 95% of the maximum measured rate. I'm surprised multiple connections would buy you anything, though. I guess I'll take a look through the archives and see what the argument is. Does one TCP connection back off on a lost packet and the other one gets to keep going? Hmmm. I guess you get improvements if, e.g., on your side you have more free bandwidth than on the source side. Having two connections then means that you get almost twice the download speed, because you have two connections competing for free bandwidth, and ideally every connection made to the server is equally fast. So in cases where you are the only one connecting, you probably win nothing. Ah, I get it. People want to defeat sender rate-limiting or other QoS controls. The opposite of nice. We could call it --mean-mode. Or --meanness n, where n=2 means I want to have two threads/connections, i.e. twice as mean as the default. Oh, I think you misunderstand. I have no intention of providing such a thing. That's what download accelerators are for, and as much as some people may want Wget to be one, I'm against it. 
However, multiple simultaneous connections to _different_ hosts could be very beneficial, as latency for one server won't mean we sit around waiting for it before downloading from others. And up to two connections to the same host will also be supported, but probably only for separate downloads (that way, we can be sending requests on one connection while we're downloading on another). The HTTP spec says that clients _should_ have a maximum of two connections to any one host, so we appear to be justified in doing that. However, it will absolutely not be done by default. Among other things, multiple connections will destroy the way we currently do logging, which in and of itself is a good reason not to do it, apart from niceness. No, well, actually, I guess there can be cases where bad upstream configurations result in a situation where more connections don't necessarily mean one is taking more than one's fair share of bandwidth, but I bet this option will result in more harm than good. Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt). You do know that Wget already can be configured to ignore robots.txt, right? Yeah, I'm already cringing at the idea that people will alter the two-connections-per-host limit to higher values. Even if we limit it to _one_ per host, though, as long as we're including support for multiple connections of any sort, it'd be easy to modify Wget to allow them for the same host. And multiple connections to multiple hosts will obviously be beneficial, to avoid bottlenecks and the like. Plus, the planned (plugged-in) support for Metalink, using multiple connections to different hosts to obtain the _same_ file, could be very nice for large and/or very popular downloads. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... 
http://micah.cowan.name/
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Heh. Well, some people are saying that Wget should support accelerated downloads; several connections to download a single resource, which can sometimes give a speed increase at the expense of nice-ness. So you could say we're at a happy medium between those options! :) Actually, Wget probably will get support for multiple simultaneous connections; but the number of connections to one host will be limited to a max of two. It's impossible for Wget to know how much is appropriate to back off, and in most situations I can think of, backing off isn't appropriate. In general, though, I agree that Wget's policy should be nice by default. If it was me, I'd have it default to backing off to 95% and have options for more aggressive behavior, like the multiple connections, etc. I'm surprised multiple connections would buy you anything, though. I guess I'll take a look through the archives and see what the argument is. Does one TCP connection back off on a lost packet and the other one gets to keep going? Hmmm. Josh Williams wrote: That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. At this point, since it seems to have some demand, I'll probably put it in for 1.12.x; but I may very well move it to a module when we have support for that. Thanks, yes that makes sense. Of course, Tony G indicated that he would prefer it to be conditionally compiled, for concerns that the plugin architecture will add overhead to the wget binary. 
Wget is such a lightweight app, though, I'm not thinking that the plugin architecture is going to be very significant. It would be interesting to see if we can add support for some modules to be linked in directly, rather than dynamically; however, it'd still probably have to use the same mechanisms as the normal modules in order to work. Anyway, I'm sure we'll think about those things more when the time comes. Makes sense. Or you could be proactive and start work on http://wget.addictivecode.org/FeatureSpecifications/Plugins (non-existent, but already linked to from FeatureSpecifications). :) I'll look into that. On 10/14/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Tony Godshall [EMAIL PROTECTED] writes: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. And so is the default behavior of curl, Firefox, Opera, and so on. The expected behavior of a program that receives data over a TCP stream is to consume data as fast as it arrives. What was your point exactly? All the other kids do it? Tony G
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: Well, you may have such problems but you are very much reaching in thinking that my --limit-percent has anything to do with any failing in linux. It's about dealing with unfair upstream switches, which, I'm quite sure, were not running Linux. Let's not hijack this into a linux-bash. I really don't know what you were trying to say here... You seemed to think --limit-percent was a solution for a misbehavior of linux. My experience with linux networking is that it's very effective and that upstream non-linux switches don't handle such an effective client well. When a linux box is my gateway/firewall I don't experience single-client monopolization at all. As to your linux issues, that's a topic that should probably be discussed in another forum, but I will say that I'm quite happy with the latest Linux kernels- with the low-latency patch integrated and enabled my desktop experience is quite snappy, even on this four-year-old 1.2GHz laptop. And stay away from the distro server kernels- they are optimized for throughput at the cost of latency- they do their I/O in bigger chunks. And stay away from the RT kernels- they go too far in giving I/O priority over everything else and end up churning on IRQs unless they are very carefully tuned. And no, I won't call the linux kernel GNU/Linux, if that was what you were after. The kernel is after all the one Linux thing in a GNU/Linux system. ... I use GNU/Linux. Anyone try Debian GNU/BSD yet? Or Debian/Nexenta/GNU/Solaris? -- Best Regards. Please keep in touch.
wget default behavior [was Re: working on patch to limit to percent of bandwidth]
OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Tony
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Tony IMO, this should be handled by the operating system, not the individual applications. That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. In my experience, GNU/Linux tends to consume all the resources without bias, seemingly on a first-come, first-served *until you're done* basis. This should be brought to the attention of the LKML. However, other operating systems do not seem to have this problem as much. Even Windows networks seem to prioritise packets. This is a problem I've been having major headaches with lately. It would be nice if wget had a patch for this problem, but that would not solve the problem of my web browser or sftp client consuming all the network resources.
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Heh. Well, some people are saying that Wget should support accelerated downloads; several connections to download a single resource, which can sometimes give a speed increase at the expense of nice-ness. So you could say we're at a happy medium between those options! :) Actually, Wget probably will get support for multiple simultaneous connections; but the number of connections to one host will be limited to a max of two. It's impossible for Wget to know how much is appropriate to back off, and in most situations I can think of, backing off isn't appropriate. In general, though, I agree that Wget's policy should be nice by default. Josh Williams wrote: That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. At this point, since it seems to have some demand, I'll probably put it in for 1.12.x; but I may very well move it to a module when we have support for that. Of course, Tony G indicated that he would prefer it to be conditionally compiled, for concerns that the plugin architecture will add overhead to the wget binary. Wget is such a lightweight app, though, I'm not thinking that the plugin architecture is going to be very significant. It would be interesting to see if we can add support for some modules to be linked in directly, rather than dynamically; however, it'd still probably have to use the same mechanisms as the normal modules in order to work. Anyway, I'm sure we'll think about those things more when the time comes. 
Or you could be proactive and start work on http://wget.addictivecode.org/FeatureSpecifications/Plugins (non-existent, but already linked to from FeatureSpecifications). :) -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Tony IMO, this should be handled by the operating system, not the individual applications. That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. In my experience, GNU/Linux tends to consume all the resources without bias, seemingly on a first-come, first-served *until you're... Well, you may have such problems but you are very much reaching in thinking that my --limit-percent has anything to do with any failing in linux. It's about dealing with unfair upstream switches, which, I'm quite sure, were not running Linux. Let's not hijack this into a linux-bash. -- Best Regards. Tony
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: Well, you may have such problems but you are very much reaching in thinking that my --limit-percent has anything to do with any failing in linux. It's about dealing with unfair upstream switches, which, I'm quite sure, were not running Linux. Let's not hijack this into a linux-bash. I really don't know what you were trying to say here. I use GNU/Linux.
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Heh. Well, some people are saying that Wget should support accelerated downloads; several connections to download a single resource, which can sometimes give a speed increase at the expense of nice-ness. So you could say we're at a happy medium between those options! :) Actually, Wget probably will get support for multiple simultaneous connections; but the number of connections to one host will be limited to a max of two. It's impossible for Wget to know how much is appropriate to back off, and in most situations I can think of, backing off isn't appropriate. In general, though, I agree that Wget's policy should be nice by default. Yeah, thanks, that's what I was trying to get at. Wget should be aggressive iff you tell it to be, and otherwise should be nice. In the presence of bad upstream switches I've found that even a --limit-rate of 95% is way more tolerable to others than the default 100% utilization. Josh Williams wrote: That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. At this point, since it seems to have some demand, I'll probably put it in for 1.12.x; but I may very well move it to a module when we have support for that. Of course, Tony G indicated that he would prefer it to be conditionally compiled, for concerns that the plugin architecture will add overhead to the wget binary. Wget is such a lightweight app, though, I'm not thinking that the plugin architecture is going to be very significant. 
It would be interesting to see if we can add support for some modules to be linked in directly, rather than dynamically; however, it'd still probably have to use the same mechanisms as the normal modules in order to work. Anyway, I'm sure we'll think about those things more when the time comes. Good point. I guess someone who wants an ultralightweight wget will use the one in busybox instead of the normal one. Or you could be proactive and start work on http://wget.addictivecode.org/FeatureSpecifications/Plugins (non-existent, but already linked to from FeatureSpecifications). :) Interesting. I'll take a look. -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer... http://micah.cowan.name/ -- Best Regards. Please keep in touch.