Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-17 Thread Matthias Vill
Tony Godshall wrote:
 If it were me, I'd have it back off to 95% by default and
 have options for more aggressive behavior, like multiple
 connections, etc.

I don't like a default back-off rule. I often encounter downloads
whose speed changes frequently. The idea that I might get only a quite
bad speed during the first few seconds, and wget would then cap itself
even though it could get much more out of the link, is just not
satisfying.

 I'm surprised multiple connections would buy you anything, though.  I
 guess I'll take a look through the archives and see what the argument
 is.  Does one TCP connection back off on a lost packet while the
 other one gets to keep going?  Hmmm.

I guess you get improvements if, e.g., on your side you have more free
bandwidth than on the source side. Having two connections then means
that you get almost twice the download speed, because you have two
connections competing for the free bandwidth and, ideally, every
connection made to the server is equally fast.

So in cases where you are the only one connecting, you probably gain
nothing.

Greetings

Matthias


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-17 Thread Tony Godshall
On 10/17/07, Matthias Vill [EMAIL PROTECTED] wrote:
 Tony Godshall wrote:
  If it were me, I'd have it back off to 95% by default and
  have options for more aggressive behavior, like multiple
  connections, etc.

 I don't like a default back-off rule. I often encounter downloads
 whose speed changes frequently. The idea that I might get only a
 quite bad speed during the first few seconds, and wget would then cap
 itself even though it could get much more out of the link, is just
 not satisfying.

You might be surprised, but I totally agree with you.  A default
backoff rule would only make sense if the measuring were better, e.g.
a periodic ramp-up/back-off behavior that converges on 95% of the
maximum measured rate.
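
Something along these lines, purely as a sketch (hypothetical code,
not anything from wget; the constants are arbitrary and the "link
speed" here is simulated rather than measured off a real socket):

    /* Periodically lift the cap to re-measure the link, then back off
     * to 95% of the best rate seen while probing. */
    #include <stdio.h>

    #define BACKOFF_FACTOR 0.95
    #define PROBE_INTERVAL 30   /* seconds between ramp-up probes */
    #define PROBE_LEN       3   /* seconds spent unthrottled      */

    int main(void)
    {
        double capacity[60];    /* simulated link speed, bytes/sec */
        double limit = 0;       /* current cap; 0 = unthrottled    */
        double best = 0;        /* max rate seen in current probe  */
        int t;

        for (t = 0; t < 60; t++)    /* link speed drifts over time */
            capacity[t] = 100000 + 40000 * ((t / 20) % 2);

        for (t = 0; t < 60; t++) {
            int probing = (t % PROBE_INTERVAL) < PROBE_LEN;
            double rate = capacity[t];

            if (probing) {
                if (rate > best)
                    best = rate;       /* measure during ramp-up   */
                if (t % PROBE_INTERVAL == PROBE_LEN - 1) {
                    limit = BACKOFF_FACTOR * best;  /* back off    */
                    best = 0;
                }
            } else if (limit > 0 && rate > limit) {
                rate = limit;          /* throttle between probes  */
            }
            printf("t=%2d  link=%6.0f  used=%6.0f%s\n",
                   t, capacity[t], rate, probing ? "  (probing)" : "");
        }
        return 0;
    }

With something like that the cap tracks the link as it speeds up or
slows down, instead of being locked to whatever the first few seconds
happened to deliver.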

  I'm surprised multiple connections would buy you anything, though.  I
  guess I'll take a look through the archives and see what the argument
  is.  Does one TCP connection back off on a lost packet while the
  other one gets to keep going?  Hmmm.

 I guess you get improvements if, e.g., on your side you have more
 free bandwidth than on the source side. Having two connections then
 means that you get almost twice the download speed, because you have
 two connections competing for the free bandwidth and, ideally, every
 connection made to the server is equally fast.

 So in cases where you are the only one connecting, you probably gain
 nothing.

Ah, I get it.  People want to defeat sender rate-limiting or other QoS
controls.

The opposite of nice.  We could call it --mean-mode.  Or --meanness n,
where n=2 means I want to have two threads/connections, i.e. twice as
mean as the default.

No, well, actually, I guess there can be cases where bad upstream
configurations result in a situation where more connections don't
necessarily mean one is taking more than one's fair share of
bandwidth, but I bet this option would result in more harm than
good.  Perhaps it should be one of those things that one can do
oneself if one must but is generally frowned upon (like making a
version of wget that ignores robots.txt).

TG


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-17 Thread Micah Cowan
(Accidentally sent private reply).

Tony Godshall wrote:
 On 10/17/07, Matthias Vill [EMAIL PROTECTED] wrote:
 Tony Godshall wrote:
 If it were me, I'd have it back off to 95% by default and
 have options for more aggressive behavior, like multiple
 connections, etc.
 I don't like a default back-off rule. I often encounter downloads
 whose speed changes frequently. The idea that I might get only a
 quite bad speed during the first few seconds, and wget would then cap
 itself even though it could get much more out of the link, is just
 not satisfying.

 You might be surprised, but I totally agree with you.  A default
 backoff rule would only make sense if the measuring were better, e.g.
 a periodic ramp-up/back-off behavior that converges on 95% of the
 maximum measured rate.

 I'm surprised multiple connections would buy you anything, though.  I
 guess I'll take a look through the archives and see what the argument
 is.  Does one TCP connection back off on a lost packet while the
 other one gets to keep going?  Hmmm.
 I guess you get improvements if, e.g., on your side you have more
 free bandwidth than on the source side. Having two connections then
 means that you get almost twice the download speed, because you have
 two connections competing for the free bandwidth and, ideally, every
 connection made to the server is equally fast.

 So in cases where you are the only one connecting, you probably gain
 nothing.

 Ah, I get it.  People want to defeat sender rate-limiting or other QoS
 controls.

 The opposite of nice.  We could call it --mean-mode.  Or --meanness n,
 where n=2 means I want to have two threads/connections, i.e. twice as
 mean as the default.

Oh, I think you misunderstand. I have no intention of providing such a
thing. That's what download accelerators are for, and as much as some
people may want Wget to be one, I'm against it.

However, multiple simultaneous connections to _different_ hosts could
be very beneficial, as latency for one server won't mean we sit around
waiting for it before downloading from others. And up to two
connections to the same host will also be supported, but probably only
for separate downloads (that way, we can be sending requests on one
connection while we're downloading on another). The HTTP spec (RFC
2616, section 8.1.4) says that clients _should_ have a maximum of two
connections to any one host, so we appear to be justified in doing
that. However, it will absolutely not be done by default. Among other
things, multiple connections would destroy the way we currently do
logging, which in and of itself is a good reason not to do it, apart
from niceness.
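
To illustrate how simple the accounting for that two-per-host cap is,
here's a minimal sketch: hypothetical code with made-up names like
acquire_conn, not anything from Wget's source:

    /* Minimal sketch of a per-host connection cap.  RFC 2616,
     * section 8.1.4, is the source of the "no more than 2
     * connections per server" recommendation. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_PER_HOST 2
    #define MAX_HOSTS    16

    static struct { char name[64]; int count; } hosts[MAX_HOSTS];

    /* Return 1 and bump the count if a new connection to `host` is
     * allowed, 0 if the per-host limit has been reached. */
    static int acquire_conn(const char *host)
    {
        int i, slot = -1;
        for (i = 0; i < MAX_HOSTS; i++) {
            if (hosts[i].count > 0 && strcmp(hosts[i].name, host) == 0) {
                if (hosts[i].count >= MAX_PER_HOST)
                    return 0;           /* cap reached for this host */
                hosts[i].count++;
                return 1;
            }
            if (hosts[i].count == 0 && slot < 0)
                slot = i;               /* remember a free slot */
        }
        if (slot < 0)
            return 0;                   /* table full */
        strncpy(hosts[slot].name, host, sizeof hosts[slot].name - 1);
        hosts[slot].name[sizeof hosts[slot].name - 1] = '\0';
        hosts[slot].count = 1;
        return 1;
    }

    static void release_conn(const char *host)
    {
        int i;
        for (i = 0; i < MAX_HOSTS; i++)
            if (hosts[i].count > 0 && strcmp(hosts[i].name, host) == 0) {
                hosts[i].count--;
                return;
            }
    }

    int main(void)
    {
        printf("%d\n", acquire_conn("a.example"));  /* 1: first conn OK    */
        printf("%d\n", acquire_conn("a.example"));  /* 1: second conn OK   */
        printf("%d\n", acquire_conn("a.example"));  /* 0: cap of 2 reached */
        release_conn("a.example");
        printf("%d\n", acquire_conn("a.example"));  /* 1: allowed again    */
        return 0;
    }

A real implementation would hang this off the connection pool, but the
bookkeeping itself is no more involved than this.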

 No, well, actually, I guess there can be cases where bad upstream
 configurations result in a situation where more connections don't
 necessarily mean one is taking more than one's fair share of
 bandwidth, but I bet this option would result in more harm than
 good.  Perhaps it should be one of those things that one can do
 oneself if one must but is generally frowned upon (like making a
 version of wget that ignores robots.txt).

You do know that Wget already can be configured to ignore robots.txt, right?
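
For instance (example.com standing in for a real site):

    wget -e robots=off -r http://example.com/

The -e switch executes a .wgetrc command, and "robots = off" is the
setting that turns the robots.txt check off.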

Yeah, I'm already cringing at the idea that people will alter the two
connections per host limit to higher values. Even if we limit it to
_one_ per host, though, as long as we're including support for multiple
connections of any sort, it'd be easy to modify Wget to allow them for
the same host.

And multiple connections to multiple hosts will be obviously beneficial,
to avoid bottlenecks and the like. Plus, the planned (plugged-in)
support for Metalink, using multiple connections to different hosts to
obtain the _same_ file, could be very nice for large and/or very popular
downloads.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-16 Thread Tony Godshall
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote:


  On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.
 
  Is this the right thing to do?
 
  Or is it better to back off a little after a bit?

 Heh. Well, some people are saying that Wget should support accelerated
 downloads; several connections to download a single resource, which can
 sometimes give a speed increase at the expense of nice-ness.

 So you could say we're at a happy medium between those options! :)

 Actually, Wget probably will get support for multiple simultaneous
 connections, but the number of connections to one host will be limited
 to a max of two.

 It's impossible for Wget to know how much is appropriate to back off,
 and in most situations I can think of, backing off isn't appropriate.

 In general, though, I agree that Wget's policy should be nice by default.

If it were me, I'd have it back off to 95% by default and
have options for more aggressive behavior, like multiple
connections, etc.

I'm surprised multiple connections would buy you anything, though.  I
guess I'll take a look through the archives and see what the argument
is.  Does one TCP connection back off on a lost packet while the
other one gets to keep going?  Hmmm.

 Josh Williams wrote:
  That's one of the reasons I believe this
  should be a module instead, because it's more or less a hack to patch
  what the environment should be doing for wget, not vice versa.

 At this point, since it seems to have some demand, I'll probably put it
 in for 1.12.x; but I may very well move it to a module when we have
 support for that.

Thanks, yes that makes sense.

 Of course, Tony G indicated that he would prefer it to be
 conditionally-compiled, for concerns that the plugin architecture will
 add overhead to the wget binary. Wget is such a lightweight app, though,
 I'm not thinking that the plugin architecture is going to be very
 significant. It would be interesting to see if we can add support for
 some modules to be linked in directly, rather than dynamically; however,
 it'd still probably have to use the same mechanisms as the normal
 modules in order to work. Anyway, I'm sure we'll think about those
 things more when the time comes.

Makes sense.

 Or you could be proactive and start work on
 http://wget.addictivecode.org/FeatureSpecifications/Plugins
 (non-existent, but already linked to from FeatureSpecifications). :)

I'll look into that.

 On 10/14/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
  Tony Godshall [EMAIL PROTECTED] writes:
 
   OK, so let's go back to basics for a moment.
  
   wget's default behavior is to use all available bandwidth.
 
  And so is the default behavior of curl, Firefox, Opera, and so on.
  The expected behavior of a program that receives data over a TCP
  stream is to consume data as fast as it arrives.

What was your point exactly?  All the other kids do it?

Tony G


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-14 Thread Tony Godshall
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote:
 On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  Well, you may have such problems but you are very much reaching in
  thinking that my --limit-percent has anything to do with any failing
  in linux.
 
  It's about dealing with unfair upstream switches, which, I'm quite
  sure, were not running Linux.
 
  Let's not hijack this into a linux-bash.

 I really don't know what you were trying to say here...

You seemed to think --limit-percent was a solution for a misbehavior
of Linux.

My experience with Linux networking is that it's very effective, and
that upstream non-Linux switches don't handle such an effective client
well.

When a linux box is my gateway/firewall I don't experience
single-client monopolization at all.

As to your Linux issues, that's a topic that should probably be
discussed in another forum, but I will say that I'm quite happy with
the latest Linux kernels: with the low-latency patch integrated and
enabled, my desktop experience is quite snappy, even on this
four-year-old 1.2GHz laptop.  Stay away from the distro server
kernels: they are optimized for throughput at the cost of latency, so
they do their I/O in bigger chunks.  And stay away from the RT
kernels: they go too far in giving I/O priority over everything else
and end up churning on IRQs unless they are very carefully tuned.

And no, I won't call the Linux kernel GNU/Linux, if that was what you
were after.  The kernel is, after all, the one Linux thing in a
GNU/Linux system.

 .. I use GNU/Linux.

Anyone try Debian GNU/BSD yet?  Or Debian/Nexenta/GNU/Solaris?

-- 
Best Regards.
Please keep in touch.


wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Tony Godshall
OK, so let's go back to basics for a moment.

wget's default behavior is to use all available bandwidth.

Is this the right thing to do?

Or is it better to back off a little after a bit?

Tony


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Josh Williams
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
 OK, so let's go back to basics for a moment.

 wget's default behavior is to use all available bandwidth.

 Is this the right thing to do?

 Or is it better to back off a little after a bit?

 Tony

IMO, this should be handled by the operating system, not the
individual applications. That's one of the reasons I believe this
should be a module instead, because it's more or less a hack to patch
what the environment should be doing for wget, not vice versa.

In my experience, GNU/Linux tends to consume all the resources
without bias, seemingly on a first-come, first-served *until you're
done* basis. This should be brought to the attention of the LKML.

However, other operating systems do not seem to have this problem as
much. Even Windows networks seem to prioritise packets.

This is a problem I've been having major headaches with lately. It
would be nice if wget had a patch for this problem, but that would not
solve the problem of my web browser or sftp client consuming all the
network resources.
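
For reference, the OS-level knob on Linux would be something like tc.
A rough sketch (the interface name and rates are placeholders; note
that tbf shapes what you *send*, so to throttle downloads you would
typically apply it on the gateway's LAN-facing interface):

    # cap egress on eth1 to roughly 95% of a 10 Mbit link
    tc qdisc add dev eth1 root tbf rate 9500kbit burst 32kb latency 400ms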


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Micah Cowan

 On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
 OK, so let's go back to basics for a moment.

 wget's default behavior is to use all available bandwidth.

 Is this the right thing to do?

 Or is it better to back off a little after a bit?

Heh. Well, some people are saying that Wget should support accelerated
downloads; several connections to download a single resource, which can
sometimes give a speed increase at the expense of nice-ness.

So you could say we're at a happy medium between those options! :)

Actually, Wget probably will get support for multiple simultaneous
connections, but the number of connections to one host will be limited
to a max of two.

It's impossible for Wget to know how much is appropriate to back off,
and in most situations I can think of, backing off isn't appropriate.

In general, though, I agree that Wget's policy should be nice by default.

Josh Williams wrote:
 That's one of the reasons I believe this
 should be a module instead, because it's more or less a hack to patch
 what the environment should be doing for wget, not vice versa.

At this point, since it seems to have some demand, I'll probably put it
in for 1.12.x; but I may very well move it to a module when we have
support for that.

Of course, Tony G indicated that he would prefer it to be
conditionally-compiled, for concerns that the plugin architecture will
add overhead to the wget binary. Wget is such a lightweight app, though,
I'm not thinking that the plugin architecture is going to be very
significant. It would be interesting to see if we can add support for
some modules to be linked in directly, rather than dynamically; however,
it'd still probably have to use the same mechanisms as the normal
modules in order to work. Anyway, I'm sure we'll think about those
things more when the time comes.

Or you could be proactive and start work on
http://wget.addictivecode.org/FeatureSpecifications/Plugins
(non-existent, but already linked to from FeatureSpecifications). :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Tony Godshall
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote:
 On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.
 
  Is this the right thing to do?
 
  Or is it better to back off a little after a bit?
 
  Tony

 IMO, this should be handled by the operating system, not the
 individual applications. That's one of the reasons I believe this
 should be a module instead, because it's more or less a hack to patch
 what the environment should be doing for wget, not vice versa.

 In my experience, GNU/Linux tends to consume all the resources
 without bias, seemingly on a first-come, first-served *until you're...

Well, you may have such problems but you are very much reaching in
thinking that my --limit-percent has anything to do with any failing
in linux.

It's about dealing with unfair upstream switches, which, I'm quite
sure, were not running Linux.

Let's not hijack this into a linux-bash.

-- 
Best Regards.
Tony


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Josh Williams
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
 Well, you may have such problems but you are very much reaching in
 thinking that my --limit-percent has anything to do with any failing
 in linux.

 It's about dealing with unfair upstream switches, which, I'm quite
 sure, were not running Linux.

 Let's not hijack this into a linux-bash.

I really don't know what you were trying to say here. I use GNU/Linux.


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Tony Godshall
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote:

  On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.
 
  Is this the right thing to do?
 
  Or is it better to back off a little after a bit?

 Heh. Well, some people are saying that Wget should support accelerated
 downloads; several connections to download a single resource, which can
 sometimes give a speed increase at the expense of nice-ness.

 So you could say we're at a happy medium between those options! :)

 Actually, Wget probably will get support for multiple simultaneous
 connections, but the number of connections to one host will be limited
 to a max of two.

 It's impossible for Wget to know how much is appropriate to back off,
 and in most situations I can think of, backing off isn't appropriate.

 In general, though, I agree that Wget's policy should be nice by default.

Yeah, thanks, that's what I was trying to get at.

Wget should be aggressive iff you tell it to be, and otherwise should be nice.

In the presence of bad upstream switches I've found that even a
--limit-rate set to 95% of the link speed is way more tolerable to
others than the default 100% utilization.
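
(To be concrete: --limit-rate takes an absolute value, not a
percentage, so hitting 95% means doing the arithmetic against your
known link speed yourself.  E.g., for a link that tops out around
500 KB/s, with a placeholder URL:

    wget --limit-rate=475k http://example.com/big.iso

That manual step is exactly what --limit-percent is meant to
automate.)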

 Josh Williams wrote:
  That's one of the reasons I believe this
  should be a module instead, because it's more or less a hack to patch
  what the environment should be doing for wget, not vice versa.

 At this point, since it seems to have some demand, I'll probably put it
 in for 1.12.x; but I may very well move it to a module when we have
 support for that.

 Of course, Tony G indicated that he would prefer it to be
 conditionally-compiled, for concerns that the plugin architecture will
 add overhead to the wget binary. Wget is such a lightweight app, though,
 I'm not thinking that the plugin architecture is going to be very
 significant. It would be interesting to see if we can add support for
 some modules to be linked in directly, rather than dynamically; however,
 it'd still probably have to use the same mechanisms as the normal
 modules in order to work. Anyway, I'm sure we'll think about those
 things more when the time comes.

Good point.  I guess someone who wants an ultra-lightweight wget will
use the one in BusyBox instead of the normal one.

 Or you could be proactive and start work on
 http://wget.addictivecode.org/FeatureSpecifications/Plugins
 (non-existent, but already linked to from FeatureSpecifications). :)

Interesting.  I'll take a look.

 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/




-- 
Best Regards.
Please keep in touch.