Re: Wget and Yahoo login?

2008-09-10 Thread Tony Godshall
And you'll probably have to do this again- I bet
yahoo expires the session cookies!


On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote:
 After surprisingly little struggle, I got Plan B working -- logged into
 yahoo with wget, saved the cookies, including session cookies, and then
 proceeded to fetch pages using the saved cookies. Those pages came back
 logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
 -- you all provided critical advice in solving this problem.

 /Don
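
In concrete terms, the wget side of that workflow looks roughly like the
sketch below; the login URL and form-field names are illustrative
placeholders, not Yahoo's actual ones:

  # Log in with wget, keeping session cookies in the saved cookie file
  wget --keep-session-cookies --save-cookies=cookies.txt \
       --post-data='login=USER&passwd=SECRET' \
       -O /dev/null 'https://login.example.com/config/login'

  # Later fetches reuse the saved cookies and come back "logged in"
  wget --load-cookies=cookies.txt -O page.html 'http://example.com/my/page'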

 On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote:


 On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:


 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and
  are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
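
For reference, the --header route amounts to pasting the browser's live
session cookies straight into the request; the cookie names and values here
are placeholders:

  wget --header='Cookie: Y=placeholder; T=placeholder' \
       -O page.html 'http://example.com/my/page'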
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.

 No problem.


  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.

 That didn't faze me because the pages I'm after will be processed by a
 python program, so having to gunzip would not require a manual step.
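
As a sketch of that pipeline (the URL and script name are illustrative), the
gunzip step simply sits between wget and the Python program:

  wget --load-cookies=cookies.txt --header='Accept-Encoding: gzip' \
       -O - 'http://example.com/my/page' | gunzip | python process_page.py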

 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase "You're not using Internet Explorer". :p

 And taking it one step further, I'm greatly enjoying watching Microsoft
 thrash around, trying to save themselves, which I don't think they will.
 Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
 going to produce milk too much longer. I've just installed the Chrome beta
on the Windows side of one of my machines (I grudgingly give it 10 GB on
 each machine; Linux gets the rest), and it looks very, very nice. They've
 still got work to do, but they appear to be heading in a very good
 direction. These are smart people at Google. All signs seem to be pointing
 towards more and more computing happening on the server side in the coming
 years.

 /Don


 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/






-- 
Best Regards.
Please keep in touch.
This is unedited.
P-)


Re: .1, .2 before suffix rather than after

2007-11-29 Thread Tony Godshall
...
 At the release of Wget 1.11, it is my intention to try to attract as
 much developer interest as possible. At the moment, and despite Wget's
 pervasive presence, it has virtually no user or developer community.
 Given the amount of work that needs to be done, this is not good. The
 announcement of the first new release of GNU Wget in two years seems a
 great opportunity to solicit help!
...

That's sort of the nature of older tools with a well-defined mission-
they do their
job so well there's little itch to tweak them.  If it ain't broken,
you don't fix it.
Freshmeat lists wget as mature, which basically means the same thing.

I guess wget will have to get a bit immature to get some buzz going.  Some
pretty insane goals in a wget2 roadmap would probably do the trick.  How
about announcing plans to implement DHT and make BitTorrent obsolete?  That
should make slashdot ;-)

Tony

--
The above is not to be taken seriously.


Re: wget2

2007-11-29 Thread Tony Godshall
On Nov 29, 2007 3:48 PM, Alan Thomas [EMAIL PROTECTED] wrote:
 What is wget2?   Any plans to move to Java?   (Of course, the latter
 will not be controversial.  :)

Troll ;-)


Re: wget2

2007-11-29 Thread Tony Godshall
On Nov 29, 2007 4:02 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 Alan Thomas wrote:
  What is wget2?   Any plans to move to Java?   (Of course, the latter
  will not be controversial.  :)

 Java is not likely. The most likely language is probably still C,
...

I think he's a troll because one of the top google hits for wget2 is a
short little java program he's apparently trying to draw attention to.


Re: Can't add ampersand to url I want to get

2007-11-20 Thread Tony Godshall
Single quotes will work when a URL includes a dollar sign.  Double quotes
won't.
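
A concrete illustration (the URL is made up): inside single quotes the shell
passes '$' and '&' through untouched, whereas inside double quotes "$2" would
be expanded, and an unquoted '&' would background the command:

  wget 'http://example.com/report?price=$2&format=csv'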

On Nov 5, 2007 12:07 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 Alles, Kris wrote:
  I tried wrapping the url with double quotes instead of single quotes and
  it works. Please disregard previous message.

 Both single and double quotes should work in typical Unix shells.
 Unless, of course, the quoted text contains a quote (which URIs usually
 shouldn't).

 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/





-- 
Best Regards.
Please keep in touch.


Re: Need help with wget from a password-protected URL

2007-11-10 Thread Tony Godshall
Sounds like a shell issue.  Assuming you are on a *nix, try single-quoting
the password (so the shell passes the weird chars literally).  If you are on
Windows, it's another story.
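
Concretely, something like this, with the password single-quoted so that
'$', '&' and friends reach wget untouched (credentials and URL are
placeholders):

  wget --http-user=USER --http-passwd='we$rd&ch@rs' \
       'http://example.com/protected/data.nc'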

On 11/10/07, Uma Shankar [EMAIL PROTECTED] wrote:
 Hi -
 I've been struggling to download data from a protected site. The man pages
 instruct me to use the --http-user=USER and --http-passwd=PASS options when
 issuing the wget command to the URL. I get error messages when wget
 encounters special chars in the password. Is there a way to get around this?
 I really need help downloading the data.
 Thanks,
 Uma Shankar, Research Associate
 Institute for the Environment
 Bank of America Plaza CB# 6116
 137 E. Franklin St Room 644
 Chapel Hill NC 27599-6116
 Phone: (919) 966-2102
 Fax (919) 843-3113
 Mobile: (919) 441-9202

 Where is the wisdom we have lost in knowledge? Where is the knowledge we
 have lost in information? -T. S. Eliot (1888-1965)




-- 
Best Regards.
Please keep in touch.


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread Tony Godshall
On 10/31/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  On 10/30/07, Micah Cowan [EMAIL PROTECTED] wrote:
 
  Tony Godshall wrote:
  Perhaps the little wget could be called wg.  A quick google and
  wikipedia search shows no real namespace collisions.
  To reduce confusion/upgrade problems, I would think we would want to
  ensure that the traditional/little Wget keeps the current name, and
  any snazzified version gets a new one.
 
  Please not another -ng.  How about wget2 (since we're on 1.x).  And
  the current one remains in 1.x.

 I agree that -ng would not be appropriate. But since we're really
 talking about two separate beasts, I'd prefer not to limit what we can
 do with Wget (original)'s versioning. Who's to say a 2.0 release of the
 light version will not be warranted someday?

 At any rate, the snazzy one looks to be diverging from classic Wget in
 some rather significant ways, in which case, I'd kind of prefer to part
 names a bit more severely than just wget-ng or wget2. Reget,
 perhaps: that name could be both Recursive Get (describing what's
 still its primary feature), or Revised/Re-envisioned Wget. :)

 I think, too, that names such as wget2 are more often things that
 packagers (say, Debian) do, when they want to include
 backwards-incompatible, significantly new versions of software, but
 don't want to break people's usage of older stuff. Or, when they just
 want to offer both versions. Cf apache2 in Debian.

   And then eventually everyone's gotten used to and can't live
  without the new bittorrent-like almost-multithreaded features. ;-)

 :)

Pget.

Parallel get.

Tget.

Torrent-like-get.

Bget.

Bigger get.

BBWget.

Bigger Better wget.

OK, ok sorry.


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-31 Thread Tony Godshall
On 10/30/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  Perhaps the little wget could be called wg.  A quick google and
  wikipedia search shows no real namespace collisions.

 To reduce confusion/upgrade problems, I would think we would want to
 ensure that the traditional/little Wget keeps the current name, and
 any snazzified version gets a new one.

Please not another -ng.  How about wget2 (since we're on 1.x).  And
the current one remains in 1.x.

And then eventually everyone's gotten used to and can't live
without the new bittorrent-like almost-multithreaded features. ;-)

Tony


Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-10-30 Thread Tony Godshall
On 10/26/07, Josh Williams [EMAIL PROTECTED] wrote:
 On 10/26/07, Micah Cowan [EMAIL PROTECTED] wrote:
  And, of course, when I say there would be two Wgets, what I really
  mean by that is that the more exotic-featured one would be something
  else entirely than a Wget, and would have a separate name.

 I think the idea of having two Wgets is good. I too have been
 concerned about the resources required in creating the all-out version
 2.0. The current code for Wget is a bit mangled, but I think the basic
 concepts surrounding it are very good ones. Although the code might
 suck for those trying to read it, I think it could be very great with
 a little regular maintenance.

Perhaps the little wget could be called wg.  A quick google and
wikipedia search shows no real namespace collisions.

 There still remains the question, though, of whether version 2 will
 require a complete rewrite. Considering how fundamental these changes
 are, I don't think we would have much of a choice. You mentioned that
 they could share code for recursion, but I don't see how. IIRC, the
 code for recursion in the current version is very dependent on the
 current methods of operation. It would probably have to be rewritten
 to be shared.

 As for libcurl, I see no reason why not. Also, would these be two
 separate GNU projects? Would they be packaged in the same source code,
 like finch and pidgin?

 I do believe the next question at hand is what version 2's official
  mascot will be. I propose Lenny the tortoise ;)

Oooh- confusion with Debian testing

_  ..
 Lenny -  (_\/  \_,
 'uuuu~'



-- 
Best Regards.
Please keep in touch.


Re: More portability stuff [Re: gettext configuration]

2007-10-30 Thread Tony Godshall
On 10/29/07, Dražen Kačar [EMAIL PROTECTED] wrote:
 Micah Cowan wrote:

  AFAIK, _no_ system supports POSIX 100%,

 AIX and Solaris have certified POSIX support. That's for the latest,
 IEEE Std 1003.1-2001. More systems have certified POSIX support for the
 older POSIX release.

 OTOH, all POSIX releases have optional parts which don't have to be
 implemented.

Yeah, to be POSIX-compliant you just had to document which parts you
didn't implement (comply with).

T


Re: --limit-percent N versus --limit-rate N% ?

2007-10-20 Thread Tony Godshall
On 10/19/07, Matthew Woehlke [EMAIL PROTECTED] wrote:
 Micah Cowan wrote:
  Also: does the current proposed patch deal properly with situations such
  as where the first 15 seconds haven't been taken up by part of a single
  download, but rather several very small ones? I'm not very familiar yet
  with the rate-limiting stuff, so I really have no idea.

 If the point is to limit *your* bandwidth, well it's hard to say,
 although the consensus seems to be that overly conservative is the
 right thing to do, usually. Of course, if the point (as in one suggested
 use case) is to limit the amount of the /server's/ bandwidth consumed,
 then a new percent should be calculated for every host.

 Just some thoughts...

I think it kicks in on each URL but would have to study the code more
thoroughly.

The point is to limit one's consumption of available bandwidth through
upstream defective switches (that are unfair when saturated) and wifi
(which exhibits the same effect).  I was thinking especially of those
who share one's pipe, since that's the choke point in most of my
experience (the DSL modem, the WAN connection, the T-1).

I don't think it helps servers much -- they tend to be on better-grade
switches -- so a per-domain behavior doesn't make sense to me.

TG


Re: Port range option in bind-address implemented?

2007-10-18 Thread Tony Godshall
On 10/18/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote:
  On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote:
 
  Tony Godshall wrote:
   Well, I don't have much to say about the other points, but one
   certainly does not need to keep an array for something like this- with
   the classic pseudorandom shuffle algorithm you only need to keep a
   count of the ones visited.  Shall I pull out my Knuth?
  That... only applies if you actually keep a _queue_ around, of all the
  ports that you plan to try, and shuffle it. Surely that's more waste
  (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling,
  here, we're choosing.
   No, the point was that with a relative prime or two you can walk through
   the range in a pseudorandom pattern, hitting each point only once and
   needing no array at all.
 
  Donald E. Knuth, The Art of Computer Programming, Volume 2, 3rd
  Edition, pp. 17-19

 For the record, this is not what pseudo-random shuffle means to me:
 for instance, http://en.wikipedia.org/wiki/Fisher-Yates_shuffle (aka
 Knuth shuffle), which does in fact require an in-memory set to be
 permuted.

Yeah, well, it's been 23 years since I took Data Structures, so sue me.

And the shuffle you refer to is an attempt at actual randomness, whereas
what I am talking about is explicitly a function of its *pseudo*-randomness-
it's taking advantage of a characteristic (defect?) of an earlier attempt at
real randomness.

 Yes, that appears to work quite well, as long as we seed it right;
 starting with a consistent X₀ would be just as bad as trying them
 sequentially, and choosing something that does not change several times
 a second (such as time()) still makes it likely that multiple
 invocations will choose the same first port. Probably, /dev/random as
  first choice, falling back to gettimeofday() where that's available.
  I don't know what Windows would use. We could probably use time() as a
  last resort, though I'm not crazy about it; maybe it'd be better to fail
  to support port ranges if there's not a decent seed available, and
  support only exact port specifications.

Since implementation for 2^n is relatively easy, I think people
usually write the algorithm to cover up to twice as many numbers as required
and then skip any value that falls out of range.

You know, I bet picking randomly and non-repetitively from a range is a
common enough task that it's in one of the standard libraries.  If
not, it probably should be.
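
For the curious, here is a minimal bash sketch of the array-free walk being
described: a full-period linear congruential generator over the next power of
two, skipping values that land outside the port range. The parameters and the
seeding are illustrative only, not what a wget patch would necessarily use.

  lo=2000 hi=3000
  m=1024                # smallest power of two >= the range size (1001 here)
  a=5 c=997             # full period mod 2^k needs c odd and a % 4 == 1
  x=$(( $(od -An -N2 -tu2 /dev/urandom) % m ))    # seed from /dev/urandom
  for (( i = 0; i < m; i++ )); do
    x=$(( (a * x + c) % m ))       # next point of the pseudorandom walk
    port=$(( lo + x ))
    (( port <= hi )) || continue   # out of range: skip it, as described above
    echo "would try to bind() to port $port"
  done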

 Thanks for the suggestion, Tony.

If I have a thought, I share it.  Too much sometimes ;-)  or so my wife tells me.

Tony


Re: ... --limit-rate nn%

2007-10-17 Thread Tony Godshall
On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  About the parser... I'm thinking I can hack the parser that now
  handles the K, M, etc. suffixes so it works as it did before but also
  sees a '%' suffix as valid- that would reduce the amount of code
  necessary to implement --limit-rate nn%.  Any reason not to do so?

  The current parser, and in particular the actual code that handles K,
  M, etc., is used by other options, for which percentages are not
 appropriate. Plus, whereas those options have been taking doubles,
 you'll now need some sort of struct to hold information as to whether
 there's a percentage or a direct rate specified.

Yes, that's true.

I guess I'll do a new parser, and to avoid duplicating code, I'll call
the old parser if it doesn't have a '%' suffix.

Thanks


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-17 Thread Tony Godshall
On 10/17/07, Matthias Vill [EMAIL PROTECTED] wrote:
 Tony Godshall wrote:
   If it were me, I'd have it back off to 95% by default and
   have options for more aggressive behavior, like the multiple
   connections, etc.

  I don't like a default back-off rule. I often encounter downloads whose
  speed changes frequently. The idea of being stuck with the quite bad speed
  I happened to get in the first few seconds, when I could get much more out
  of it, is just not satisfying.

You might be surprised, but I totally agree with you.  A default
backoff rule would only make sense if the measuring was better.  E.g.
a periodic ramp-up/back-off behavior to achieve 95% of the maximum
measured rate.

  I'm surprised multiple connections would buy you anything, though.  I
  guess I'll take a look through the archives and see what the argument
   is.  Does one TCP connection back off on a lost packet while the other
   one keeps going?  Hmmm.

  I guess you get improvements if, e.g., on your side you have more free
  bandwidth than on the source side. Having two connections then means
  that you get almost twice the download speed, because you have two
  connections competing for free bandwidth, and ideally every connection
  made to the server is equally fast.

  So in cases where you are the only one connecting, you probably win
 nothing.

Ah, I get it.  People want to defeat sender rate-limiting or other QOS
controls.

The opposite of nice.  We could call it --mean-mode.  Or --meanness n,
where n=2 means I want to have two threads/connections, i.e. twice as
mean as the default.

No, well, actually, I guess there can be cases where bad upstream
configurations result in a situation where more connections don't
necessarily mean one is taking more than one's fair share of
bandwidth, but I bet this option will result in more harm than
good.  Perhaps it should be one of those things that one can do
oneself if one must but is generally frowned upon (like making a
version of wget that ignores robots.txt).

TG


Ignoring robots.txt [was Re: wget default behavior...]

2007-10-17 Thread Tony Godshall
 ... Perhaps it should be one of those things that one can do
 oneself if one must but is generally frowned upon (like making a
 version of wget that ignores robots.txt).

Damn.  I was only joking about ignoring robots.txt, but now I'm
thinking[1] there may be good reasons to do so...  maybe it should be
in mainline wget.

T

[1] 
http://web.archive.org/web/20041013225557/http://www.differentstrings.info/archives/002813.html


Re: Ignoring robots.txt [was Re: wget default behavior...]

2007-10-17 Thread Tony Godshall
 Tony Godshall wrote:
  ... Perhaps it should be one of those things that one can do
  oneself if one must but is generally frowned upon (like making a
  version of wget that ignores robots.txt).
 
  Damn.  I was only joking about ignoring robots.txt, but now I'm
  thinking[1] there may be good reasons to do so...  maybe it should be
  in mainline wget.

 Actually, it is. -e robots=off. :)

 This also turns off obedience to the nofollow attribute sometimes
 found in meta and a tags.

Ah, my ignorance is showing.

I stand corrected.
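
For anyone finding this in the archive later, the usage is simply (URL
illustrative):

  # Disable robots.txt handling (and nofollow obedience) for a recursive fetch
  wget -e robots=off -r -np 'http://example.com/docs/'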


Re: Port range option in bind-address implemented?

2007-10-17 Thread Tony Godshall
On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Oleg Ace wrote:
  Greetings,
 
   Did the feature being discussed here
  http://www.mail-archive.com/wget@sunsite.dk/msg05546.html
  and here
  http://www.mail-archive.com/wget@sunsite.dk/msg05577.html
  ever get implemented?
 
  In other words, is it possible to do:
  wget --bind-address=1.2.3.4:2000-3000 http://...
 
 From trying it out and looking briefly at the code, it would appear it
   is not, but I wanted to make sure.
 
  If that is the case, does anyone still have the old patch available,
  or has a similar new one?

 Looking at the threads you indicated, it appears that people were
 generally happy to include the feature, but were unhappy with the
 specific implementation from the patch:

   * parsing of --bind-address belongs in the getopt loop
   * sscanf() should be avoided for use in the parsing.
   * the ports should be chosen from that range at random, rather than
 sequentially, to address an issue pointed out by the Sockets FAQ.

 The third point above introduces its own problems: how many bind()
 attempts should we make before throwing in the towel? Or should we
 attempt every port in that range, keeping an 8k array of bits to track
 which ports we've tried already?

Well, I don't have much to say about the other points, but one
certainly does not need to keep an array for something like this- with
the classic pseudorandom shuffle algorithm you only need to keep a
count of the ones visited.  Shall I pull out my Knuth?

 Clearly, whatever approach we take will be _vastly_ less
 efficient/intelligent than the way the OS picks a port for us, and we'll
 need to point these limitations out in the documentation.

 I'm not going to write the code for this (at least not any time soon);
 if someone is interested enough to rewrite the patch to address these
 shortcomings, though, I'll be happy to include it, seeing as how it
 apparently met with Hrvoje's and Mauro's approval (and I see how it
 could be useful as well; though of course its primary use is probably to
 get around broken environments).

 I will submit a low-pri issue for it, in the meantime, in case someone
 sees it and wants to pick it up. :)

 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/




-- 
Best Regards.
Please keep in touch.


Re: Port range option in bind-address implemented?

2007-10-17 Thread Tony Godshall
On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
   Well, I don't have much to say about the other points, but one
   certainly does not need to keep an array for something like this- with
   the classic pseudorandom shuffle algorithm you only need to keep a
   count of the ones visited.  Shall I pull out my Knuth?

 That... only applies if you actually keep a _queue_ around, of all the
 ports that you plan to try, and shuffle it. Surely that's more waste
 (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling,
 here, we're choosing.

No, the point was that with a relative prime or two you can walk through
the range in a pseudorandom pattern, hitting each point only once and
needing no array at all.


Re: Port range option in bind-address implemented?

2007-10-17 Thread Tony Godshall
On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote:
 On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote:
 
  Tony Godshall wrote:
    Well, I don't have much to say about the other points, but one
    certainly does not need to keep an array for something like this- with
    the classic pseudorandom shuffle algorithm you only need to keep a
    count of the ones visited.  Shall I pull out my Knuth?
 
  That... only applies if you actually keep a _queue_ around, of all the
  ports that you plan to try, and shuffle it. Surely that's more waste
  (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling,
  here, we're choosing.

  No, the point was that with a relative prime or two you can walk through
  the range in a pseudorandom pattern, hitting each point only once and
  needing no array at all.


Donald E. Knuth, The Art of Computer Programming, Volume 2, 3rd
Edition, pp. 17-19

-- 
Best Regards.
Please keep in touch.


Re: Port range option in bind-address implemented?

2007-10-17 Thread Tony Godshall
On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote:
 On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote:
  On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote:
  
   Tony Godshall wrote:
Well, I don't have much to say about the other points, but one
certainly does not need to keep an array for something like this- with
the classic pseudorandom shuffle algorithm you only need to keep a
count of the ones visited.  Shall I pull out my Knuth?
  
   That... only applies if you actually keep a _queue_ around, of all the
   ports that you plan to try, and shuffle it. Surely that's more waste
   (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling,
   here, we're choosing.
 
   No, the point was that with a relative prime or two you can walk through
   the range in a pseudorandom pattern, hitting each point only once and
   needing no array at all.
 

 Donald E. Knuth, The Art of Computer Programming, Volume 2, 3rd
 Edition, pp. 17-19

...and probably closer at hand...

http://en.wikipedia.org/wiki/Linear_congruential_generator

TG


Re: ... --limit-rate nn%

2007-10-16 Thread Tony Godshall
On 10/15/07, Matthias Vill [EMAIL PROTECTED] wrote:
 Micah Cowan schrieb:
  Matthias Vill wrote:
  I would appreciate having a --limit-rate N% option.
 
   So now about those broken cases. You could do some least-of-both
   policy (which would of course still need time to do the measuring and
   can only cut back afterwards).
  Or otherwise you could use a non-percent value as a minimum. This would
  be especially useful if you add it to your default options and stumble
  over some slow server only serving you 5KiB/s, where you most probably
  don't want to further lower the speed on your side.
 
  As third approach you would only use the last limiting option.
 
  Depending on how difficult the implementation is I would vote for the
  second behavior, although the first or third option might be more
  intuitive to some of the users not reading the docs.
 
  Third option should be more intuitive to the implementer, too. I vote
  for that, as I really want to avoid putting too much sophistication into
  this.

  I would expect that you need two variables for holding the percent/fixed
  values anyway, so I was just wondering whether you could do it as in my
  second suggestion.
  IMHO that should be quite easy to do with a single check (if both a fixed
  and a percent limit are set, limit = max(a, b)) and thus not even result
  in any overhead during the actual download.

 Greetings

 Matthias

 P.S.: I'm subscribed via news://sunsite.dk, you don't need to CC me.


Thanks for the input, guys.

I'll see what I can do.

About the parser... I'm thinking I can hack the parser that now
handles the K, M, etc. suffixes so it works as it did before but also
sees a '%' suffix as valid- that would reduce the amount of code
necessary to implement --limit-rate nn%.  Any reason not to do so?
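
For contrast, the two forms under discussion would look like this on the
command line; the '%' form is the proposed patch, not an existing wget
option, and the URL is illustrative:

  wget --limit-rate=50k 'http://example.com/big.iso'   # existing: absolute cap
  wget --limit-rate=75% 'http://example.com/big.iso'   # proposed: percentage cap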


-- 
Best Regards.
Please keep in touch.


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-16 Thread Tony Godshall
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote:


  On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.
 
  Is this the right thing to do?
 
  Or is it better to back off a little after a bit?

 Heh. Well, some people are saying that Wget should support accelerated
 downloads; several connections to download a single resource, which can
 sometimes give a speed increase at the expense of nice-ness.

 So you could say we're at a happy medium between those options! :)

 Actually, Wget probably will get support for multiple simultaneous
 connections; but number of connections to one host will be limited to a
 max of two.

 It's impossible for Wget to know how much is appropriate to back off,
 and in most situations I can think of, backing off isn't appropriate.

 In general, though, I agree that Wget's policy should be nice by default.

If it were me, I'd have it back off to 95% by default and
have options for more aggressive behavior, like the multiple
connections, etc.

I'm surprised multiple connections would buy you anything, though.  I
guess I'll take a look through the archives and see what the argument
is.  Does one TCP connection back off on a lost packet while the other
one keeps going?  Hmmm.

 Josh Williams wrote:
  That's one of the reasons I believe this
  should be a module instead, because it's more or less a hack to patch
  what the environment should be doing for wget, not vice versa.

 At this point, since it seems to have some demand, I'll probably put it
 in for 1.12.x; but I may very well move it to a module when we have
 support for that.

Thanks, yes that makes sense.

 Of course, Tony G indicated that he would prefer it to be
 conditionally-compiled, for concerns that the plugin architecture will
 add overhead to the wget binary. Wget is such a lightweight app, though,
 I'm not thinking that the plugin architecture is going to be very
 significant. It would be interesting to see if we can add support for
 some modules to be linked in directly, rather than dynamically; however,
 it'd still probably have to use the same mechanisms as the normal
 modules in order to work. Anyway, I'm sure we'll think about those
 things more when the time comes.

Makes sense.

 Or you could be proactive and start work on
 http://wget.addictivecode.org/FeatureSpecifications/Plugins
 (non-existent, but already linked to from FeatureSpecifications). :)

I'll look into that.

 On 10/14/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
  Tony Godshall [EMAIL PROTECTED] writes:
 
   OK, so let's go back to basics for a moment.
  
   wget's default behavior is to use all available bandwidth.
 
  And so is the default behavior of curl, Firefox, Opera, and so on.
  The expected behavior of a program that receives data over a TCP
  stream is to consume data as fast as it arrives.

What was your point exactly?  All the other kids do it?

Tony G


Re: PATCHES file removed

2007-10-15 Thread Tony Godshall
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Hrvoje Niksic wrote:
  Micah Cowan [EMAIL PROTECTED] writes:
 
  FYI, I've removed the PATCHES file. Not because I don't think it's
  useful, but because the information needed updating (now that we're
  using Mercurial rather than Subversion), I expect it to be updated
  again from time to time, and the Wgiki seems to be the right place
  to keep changing documentation
  (http://wget.addictivecode.org/PatchGuidelines).
 
  It's still obviously useful to have patch-submission information
  included as part of the Wget distribution itself;
 
  It would be nice for the distribution to contain that URL on a
  prominent place, such as in the README, or even a stub PATCHES file.

 It's in NEWS, but putting it in README can't hurt.

Hey, that's a handy link.  I'll follow it for my next rev (--limit-rate nn%)

I had read the README file but I'm not in the habit of looking for a NEWS file.

  Speaking of which, I've replaced the MAILING-LISTS file,
  regenerating it from the Mailing Lists section of the Texinfo
  manual. I suspect it had previously been generated from source, but
  it's not clear to me from what (perhaps the web page?), or what tool
  was used.
 
  It was simply hand-written.  :-)

 Oh, yeah, I don't want to do that in three places then (MAILING-LISTS,
 Wgiki, and manual)!

 It had a right-aligned -*- text -*- thing at the top, so I was
 thinking that was an indication of having been generated.

 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/




-- 
Best Regards.
Please keep in touch.


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-14 Thread Tony Godshall
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote:
 On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  Well, you may have such problems but you are very much reaching in
   thinking that my --limit-percent has anything to do with any failing
  in linux.
 
  It's about dealing with unfair upstream switches, which, I'm quite
  sure, were not running Linux.
 
  Let's not hijack this into a linux-bash.

 I really don't know what you were trying to say here...

You seemed to think --limit-percent was a solution for a misbehavior of linux.

My experience with linux networking is that it's very effective and
that upstream non-linux switches don't handle such an effective client
well.

When a linux box is my gateway/firewall I don't experience
single-client monopolization at all.

As to your linux issues, that's a topic that should probably be discussed
in another forum, but I will say that I'm quite happy with the latest
Linux kernels- with the low-latency patch integrated and enabled my
desktop experience is quite snappy, even on this four-year-old 1.2GHz
laptop.  And stay away from the distro server kernels- they are
optimized for throughput at the cost of latency- they do their I/O in
bigger chunks.  And stay away from the RT kernels- they go too far in
giving I/O priority over everything else and end up churning on IRQs
unless they are very carefully tuned.

And no, I won't call the linux kernel GNU/Linux, if that was what you
were after.  The kernel is after all the one Linux thing in a
GNU/Linux system.

 .. I use GNU/Linux.

Anyone try Debian GNU/BSD yet?  Or Debian/Nexenta/GNU/Solaris?

-- 
Best Regards.
Please keep in touch.


Re: wget default behavior

2007-10-14 Thread Tony Godshall
On 10/14/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 Tony Godshall [EMAIL PROTECTED] writes:

  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.

 And so is the default behavior of curl, Firefox, Opera, and so on.
 The expected behavior of a program that receives data over a TCP
 stream is to consume data as fast as it arrives.

Yup.


wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Tony Godshall
OK, so let's go back to basics for a moment.

wget's default behavior is to use all available bandwidth.

Is this the right thing to do?

Or is it better to back off a little after a bit?

Tony


Re: working on patch to limit to percent of bandwidth

2007-10-13 Thread Tony Godshall
On 10/12/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 Tony Godshall [EMAIL PROTECTED] writes:

  My point remains that the maximum initial rate (however you define
  initial in a protocol as unreliable as TCP/IP) can and will be
  wrong in a large number of cases, especially on shared connections.
 
   Again, would an algorithm where the rate is re-measured periodically,
   so that the initial-rate-error criticism is addressed, reduce
   your objection to the patch?

 Personally I don't see the value in attempting to find out the
 available bandwidth automatically.

You keep saying that.

 It seems too error prone, no
 matter how much heuristics you add into it.

But like someone said- the error is always in the nice direction.

  --limit-rate works
 because reading the data more slowly causes it to (eventually) also be
 sent more slowly.  --limit-percentage is impossible to define in
 precise terms, there's just too much guessing.

My patch --limit-percent does exactly the same thing except without
requiring foreknowledge.


-- 
Best Regards.
Please keep in touch.


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Tony Godshall
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote:
 On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.
 
  Is this the right thing to do?
 
  Or is it better to back off a little after a bit?
 
  Tony

 IMO, this should be handled by the operating system, not the
 individual applications. That's one of the reasons I believe this
 should be a module instead, because it's more or less a hack to patch
 what the environment should be doing for wget, not vice versa.

 In my experience, GNU/Linux tends to consume all the resources
 unbiasedly, seemingly on a first come first serve *until you're...

Well, you may have such problems but you are very much reaching in
thinking that my --limit-percent has anything to do with any failing
in linux.

It's about dealing with unfair upstream switches, which, I'm quite
sure, were not running Linux.

Let's not hijack this into a linux-bash.

-- 
Best Regards.
Tony


Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]

2007-10-13 Thread Tony Godshall
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote:


  On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote:
  OK, so let's go back to basics for a moment.
 
  wget's default behavior is to use all available bandwidth.
 
  Is this the right thing to do?
 
  Or is it better to back off a little after a bit?

 Heh. Well, some people are saying that Wget should support accelerated
 downloads; several connections to download a single resource, which can
 sometimes give a speed increase at the expense of nice-ness.

 So you could say we're at a happy medium between those options! :)

 Actually, Wget probably will get support for multiple simultaneous
 connections; but number of connections to one host will be limited to a
 max of two.

 It's impossible for Wget to know how much is appropriate to back off,
 and in most situations I can think of, backing off isn't appropriate.

 In general, though, I agree that Wget's policy should be nice by default.

Yeah, thanks, that's what I was trying to get at.

Wget should be aggressive iff you tell it to be, and otherwise should be nice.

In the presence of bad upstream switches I've found that even a
--limit-rate of 95% is way more tolerable to others than the default
100% utilization.

 Josh Williams wrote:
  That's one of the reasons I believe this
  should be a module instead, because it's more or less a hack to patch
  what the environment should be doing for wget, not vice versa.

 At this point, since it seems to have some demand, I'll probably put it
 in for 1.12.x; but I may very well move it to a module when we have
 support for that.

 Of course, Tony G indicated that he would prefer it to be
 conditionally-compiled, for concerns that the plugin architecture will
 add overhead to the wget binary. Wget is such a lightweight app, though,
 I'm not thinking that the plugin architecture is going to be very
 significant. It would be interesting to see if we can add support for
 some modules to be linked in directly, rather than dynamically; however,
 it'd still probably have to use the same mechanisms as the normal
 modules in order to work. Anyway, I'm sure we'll think about those
 things more when the time comes.

Good point.  I guess someone who wants an ultralightweight wget will
use the one in busybox instead of the normal one.

 Or you could be proactive and start work on
 http://wget.addictivecode.org/FeatureSpecifications/Plugins
 (non-existent, but already linked to from FeatureSpecifications). :)

Interesting.  I'll take a look.

 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer...
 http://micah.cowan.name/




-- 
Best Regards.
Please keep in touch.


Re: working on patch to limit to percent of bandwidth

2007-10-12 Thread Tony Godshall
On 10/12/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 Tony Godshall [EMAIL PROTECTED] writes:

    available bandwidth and adjusts to that.  The usefulness is in
   trying to be unobtrusive to other users.
 
  The problem is that Wget simply doesn't have enough information to be
  unobtrusive.  Currently available bandwidth can and does change as new
  downloads are initiated and old ones are turned off.  Measuring
  initial bandwidth is simply insufficient to decide what bandwidth is
  really appropriate for Wget; only the user can know that, and that's
  what --limit-rate does.
 
   My patch (and the doc change in my patch) doesn't claim to be totally
   unobtrusive [...] Obviously people who want the level of unobtrusiveness
   you define shouldn't be using it.

 It was never my intention to define a particular level of
 unobtrusiveness; the concept of being unobtrusive to other users was
 brought up by Jim and I was responding to that.  My point remains that
 the maximum initial rate (however you define initial in a protocol
 as unreliable as TCP/IP) can and will be wrong in a large number of
 cases, especially on shared connections.

Again, would an algorithm where the rate is re-measured periodically,
so that the initial-rate-error criticism is addressed, reduce
your objection to the patch?  Perhaps you can answer each idea I gave
separately:

a) full speed downloads (which re-measure channel capacity) followed
by long sleeps

b) speed ramps up to peak and then back down

 Not only is it impossible to
 be totally unobtrusive, but any *automated* attempts at being nice
 to other users are doomed to failure, either by taking too much (if
 the download starts when you're alone) or too little (if the download
  starts with a shared connection).

Again, I do not claim to be unobtrusive.  Merely to reduce
obtrusiveness.  I do not and cannot claim to be making wget *nice*,
just nicER.

You can't deny that dialing back is nicer than not.

-- 
Best Regards.
Please keep in touch.


Re: working on patch to limit to percent of bandwidth

2007-10-12 Thread Tony Godshall
On 10/12/07, Josh Williams [EMAIL PROTECTED] wrote:
 On 10/12/07, Tony Godshall [EMAIL PROTECTED] wrote:
  Again, I do not claim to be unobtrusive.  Merely to reduce
  obtrusiveness.  I do not and cannot claim to be making wget *nice*,
  just nicER.
 
  You can't deny that dialing back is nicer than not.

 Personally, I think this is a great idea. But I do agree that the
 documentation is a bit messy right now (as well as the code). If this
 doesn't make it into the current trunk, I think it'd make a great
 module in version 2.

Thanks for the support


Re: anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]

2007-10-12 Thread Tony Godshall
...
  I guess I'd like to see compile-time options so people could make a
  tiny version for their embedded system, with most options and all
  documentation stripped out, and a huge kitchen-sink all-the-bells
  version and complete documentation for the power user version.  I
  don't think you have to go to a totally new (plug in) architecture or
  make the hard choices.

[Jim]
 Well, we need the plugin architecture anyway. There are some planned
 features (JavaScript and MetaLink support being the main ones) that have
 no business in Wget proper, as far as I'm concerned, but are inarguably
 useful.

  I know when I put an app into an embedded system, I'd rather not even
  have the overhead of the plug-in mechanism, I want it smaller than
  that.

 You have a good point regarding customized compilation, though I think
 that most of the current features in Wget belong as core features. There
 are some small exceptions (egd sockets).

Thanks.

Well, when I'm building an embedded device, I look at the wget invocations
that are actually being made in the scripts.  Since the end
product has no interactive shell, I don't need to have all those extra
options enabled!  In fact, in wget's case, one can often dispense with
the tool entirely- the busybox version suffices.

  ... And when I'm running the gnu version of something I expect it
  to have verbose man pages and lots of double-dash options, that's what
  tools like less and grep are for.

 Well... many GNU tools actually lack verbose man pages, particularly
 since info is the preferred documentation system for GNU software.

Well, I guess I'm spoiled by Debian.  If it ain't broke, don't fix it.
 Debian makes man pages because tools should have manpages.  IIRC,
that was one of the divorce issues.

 Despite the fact that many important GNU utilities are very
 feature-packed, they also tend not to have options that are only useful
 to a relatively small number of people--particularly when equivalent
 effects are possible with preexisting options.

 As to the overhead of the plugin mechanism, you're right, and I may well
 decide to make that optionally compiled.

Well, I'd rather have rate-limiting things be optionally compiled than
plugged-in, since they'd be useful for embedded devices.

[Micah]
  It's not really about this option, it's about a class of options. I'm in
  the unenviable position of having to determine whether small patches
  that add options are sufficiently useful to justify the addition of the
  option. Adding one new option/rc command is not a problem. But when,
  over time, fifty people suggest little patches that offer options with
  small benefits, we've suddenly got fifty new options cluttering up the
  documentation and --help output.

[Jim]
  I would posit that the vast majority of wget options are used in some
  extremely small percentage of wget invocations.  Should they be removed?

[Micah]
 Such as which ones?

 I don't think we're talking about the same extremely small percentages.

OK, so so far there are three of us, I think, that find it potentially
useful.  And you have not addressed the use cases I brought up.  So I
think your extremely small percentages assumption may be faulty.

 Looking through the options listed with --help, I can find very few
 options that I've never used or would not consider vital in some
 situations I (or someone else) might encounter.

 This doesn't look to me like a vital function, one that a large number
 of users will find mildly useful, or one that a mild number of users
 will find extremely useful. This looks like one that a mild number of
 users will find mildly useful. Only slightly more useful, in fact, than
 what is already done.

You keep saying that.  You seem to think unknown upstream bandwidth is
a rare thing.  Or that wanting to be nice to other bandwidth users in
such a circumstance is a rare thing.  I wish I lived in your universe.
 Mine's a lot more sloppy.

 It's also one of those fuzzy features that addresses a scenario that
 has no right solution (JavaScript support is in that domain). These
 sorts of features tend to invite a gang of friends to help get a little
 bit closer to the unreachable target. For instance, if we include this
 option, then the same users will find another option to control the
 period of time spent full-bore just as useful. A pulse feature might
 be useful, but then you'll probably want an option to control the
 spacing between those, too. And someone else may wish to introduce an
 option that saves bandwidth information persistently, and uses this to
 make a good estimate from the beginning.

Ah, finally, some meat.  You see this as opening a door.  Especially
as I inquire as to whether anyone has feedback on my implementation,
you see it mushrooming into a plethora of options.

 And all of this would amount to a very mild improvement over what
 already exists.

Your universe crisp, mine sloppy (above) ;-)

  In my view, wget is a useful and flexible 

Re: working on patch to limit to percent of bandwidth

2007-10-11 Thread Tony Godshall
On 10/10/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  The scenario I was picturing was where you'd want to make sure some
  bandwidth was left available so that unfair routers wouldn't screw
  your net-neighbors.  I really don't see this as an attempt to be
  unobtrusive at all.  This is not an attempt to hide one's traffic,
  it's an attempt to not overwhelm in the presence of unfair switching.
  If I say --limit-pct 75% and the network is congested, yes, what I
  want is to use no more than 75% of the available bandwidth, not the
   total bandwidth.  So, yes, if the network is more congested just now,
  then let this download get a lower bitrate, that's fine.

 I'm pretty sure that's what Jim meant by being unobtrusive; it surely
 had nothing to do with traffic-hiding.

 My current impression is that this is a useful addition for some limited
 scenarios, but not particularly more useful than --limit-rate already
 is. That's part of what makes it a good candidate as a plugin.

I guess I don't see how picking a reasonable rate automatically is
less useful than having to know what the maximum upstream bandwidth is
ahead of time.  (If the argument is about the rare case where maxing
out the download even briefly is unacceptable, then the whole
technique wget uses is really not appropriate- even limit-rate does
not back off till it has retrieved enough bytes to start measuring and
then they come in as a burst- that's the nature of starting a TCP
connection.)


anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]

2007-10-11 Thread Tony Godshall
On 10/11/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  On 10/10/07, Micah Cowan [EMAIL PROTECTED] wrote:
  My current impression is that this is a useful addition for some limited
  scenarios, but not particularly more useful than --limit-rate already
  is. That's part of what makes it a good candidate as a plugin.
 
  I guess I don't see how picking a reasonable rate automatically is
   less useful than having to know what the maximum upstream bandwidth is
  ahead of time.

 I never claimed it was less useful. In fact, I said it was more useful.
 My doubt is as to whether it is _significantly_ more useful.

For me, yes.  For you, apparently not.  It's a small patch, really.
Did you even look at it?

 I'm still open, just need more convincing.

Well, I've said my piece.

Anyone want to comment on the actual code?

Anybody try it?


Re: anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]

2007-10-11 Thread Tony Godshall
...
 I have, yes. And yes, it's a very small patch. The issue isn't so much
 about the extra code or code maintenance; it's more about extra
 documentation, and avoiding too much clutter of documentation and lists
 of options/rc-commands. I'm not very picky about adding little
 improvements to Wget; I'm a little pickier about adding new options.

 It's not really about this option, it's about a class of options. I'm in
 the unenviable position of having to determine whether small patches
 that add options are sufficiently useful to justify the addition of the
 option. Adding one new option/rc command is not a problem. But when,
 over time, fifty people suggest little patches that offer options with
 small benefits, we've suddenly got fifty new options cluttering up the
 documentation and --help output.

Would it be better, then, if I made it --limit-rate nn% instead of
limit-percent nn?
And made the description briefer?

  If the benefits are such that only a
 handful of people will ever use any of them, then they may not have been
 worth the addition, and I'm probably not doing my job properly. ...

I guess I'd like to see compile-time options so people could make a
tiny version for their embedded system, with most options and all
documentation stripped out, and a huge kitchen-sink all-the-bells
version and complete documentation for the power user version.  I
don't think you have to go to a totally new (plug in) architecture or
make the hard choices.

I know when I put an app into an embedded system, I'd rather not even
have the overhead of the plug-in mechanism, I want it smaller than
that.  And when I'm running the gnu version of something I expect it
to have verbose man pages and lots of double-dash options, that's what
tools like less and grep are for.

Tony


Re: anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]

2007-10-11 Thread Tony Godshall
On 10/11/07, Tony Godshall [EMAIL PROTECTED] wrote:
 ...
  I have, yes. And yes, it's a very small patch. The issue isn't so much
  about the extra code or code maintenance; it's more about extra
  documentation, and avoiding too much clutter of documentation and lists
  of options/rc-commands. I'm not very picky about adding little
  improvements to Wget; I'm a little pickier about adding new options.
 
  It's not really about this option, it's about a class of options. I'm in
  the unenviable position of having to determine whether small patches
  that add options are sufficiently useful to justify the addition of the
  option. Adding one new option/rc command is not a problem. But when,
  over time, fifty people suggest little patches that offer options with
  small benefits, we've suddenly got fifty new options cluttering up the
  documentation and --help output.

 Would it be better, then, if I made it --limit-rate nn% instead of
 limit-percent nn?
  And made the description briefer?

Also, would it help if the behavior were changed so it pulsed
occasionally and therefore wouldn't suffer from the
initial-measurement-error case?

I'm trying to judge whether I should spend more time touching it up
into something acceptable or just let it remain a personal hack.

   If the benefits are such that only a
  handful of people will ever use any of them, then they may not have been
  worth the addition, and I'm probably not doing my job properly. ...

 I guess I'd like to see compile-time options so people could make a
 tiny version for their embedded system, with most options and all
 documentation stripped out, and a huge kitchen-sink all-the-bells
 version and complete documentation for the power user version.  I
 don't think you have to go to a totally new (plug in) architecture or
 make the hard choices.

  I know when I put an app into an embedded system, I'd rather not even
 have the overhead of the plug-in mechanism, I want it smaller than
 that.  And when I'm running the gnu version of something I expect it
 to have verbose man pages and lots of double-dash options, that's what
 tools like less and grep are for.

 Tony



-- 
Best Regards.
Please keep in touch.


Re: working on patch to limit to percent of bandwidth

2007-10-10 Thread Tony Godshall
   - --limit-rate will find your version handy, but I want to hear from
   them. :)

   I would appreciate and have use for such an option.  We often access
   instruments in remote locations (think a tiny island in the Aleutians)
   where we share bandwidth with other organizations.

  A limitation in percentage doesn't make sense if you don't know
  exactly how much bandwidth is available.  Trying to determine full
  bandwidth and backing off from there is IMHO doomed to failure because
  the initial speed Wget gets can be quite different from the actual
  link bandwidth, at least in a shared link scenario.  A --limit-percent
  implemented as proposed here would only limit the retrieval speed to
  the specified fraction of the speed Wget happened to get at the
  beginning of the download.  That is not only incorrect, but also quite
  non-deterministic.

  If there were a way to query the network for the connection speed, I
  would support the limit-percent idea.  But since that's not
  possible, I think it's better to stick with the current --limit-rate,
  where we give the user an option to simply tell Wget how much
  bandwidth to consume.

 I think there is still a case for attempting percent limiting.  I agree
 with your point that we cannot discover the full bandwidth of the
 link and adjust to that.  The approach discovers the current available
 bandwidth and adjusts to that.  The usefulness is in trying to be
 unobtrusive to other users.

Network conditions do change, and my initial all-out estimate is
certainly not ideal, but it works in many situations.
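
To put invented numbers on the kind of error Hrvoje is describing
(this is just an illustration, not code from the patch):

  /* Illustration only, with made-up numbers. */
  #include <stdio.h>

  int main (void)
  {
    long link_rate     = 1000 * 1024;  /* true link capacity, unknown to Wget    */
    long measured_rate =  400 * 1024;  /* what the initial burst happened to see */
    int  limit_percent = 50;

    /* The patch throttles to a fraction of the measured burst ... */
    long limit = measured_rate * limit_percent / 100;
    /* ... which can be far from a fraction of the real capacity. */
    printf ("throttle to %ld B/s; 50%% of the real link would be %ld B/s\n",
            limit, link_rate * limit_percent / 100);
    return 0;
  }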

Another way would be to transfer in a more bursty mode rather than
metering at small granularity; that way one could measure the rate of
each burst and track the channel capacity.  I worry that that might be
more harmful to those sharing the channel in cases like Hrvoje's than
the initial-burst measurement, and in fact I am thinking that one could
cache the initially measured value and use it for future connections.

An alternative to a bursty mode would be to start at full speed, ramp
down till we hit the desired rate, then ramp back up till we bump the
limit, and down again.  That way we could update the max rate estimate
periodically and recover from any error that might have occurred in
the initial estimate.  Any thoughts on this behavior?
Less harmful?  More?
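
A rough sketch of the bookkeeping that such periodic re-probing might
need (illustration only; none of this is in the patch, and the
constants and smoothing factor are invented):

  /* Alternate short unthrottled probe windows with longer throttled
     windows, refreshing the rate estimate each probe instead of fixing
     it after the first 15 seconds. */
  #include <stdio.h>

  #define PROBE_SEC    2.0   /* run unthrottled this long to re-measure   */
  #define THROTTLE_SEC 30.0  /* then throttle this long at pct * estimate */

  static double estimate_bps = 0.0;

  /* Called once per probe window with the bytes/seconds it observed. */
  static void update_estimate (double bytes, double secs)
  {
    double sample = bytes / secs;
    /* Smooth a little so one noisy probe doesn't swing the limit wildly. */
    estimate_bps = (estimate_bps == 0.0) ? sample
                                         : 0.7 * estimate_bps + 0.3 * sample;
  }

  static double current_limit (int limit_percent)
  {
    return estimate_bps * limit_percent / 100.0;
  }

  int main (void)
  {
    update_estimate (3.0e6, PROBE_SEC);   /* a probe that saw ~1.5 MB/s */
    printf ("throttle next %.0f s to %.0f B/s\n",
            THROTTLE_SEC, current_limit (50));
    return 0;
  }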

Tony


Re: working on patch to limit to percent of bandwidth

2007-10-10 Thread Tony Godshall
 ... I worry that that might be more harmful to those sharing the channel in cases
 like Hrvoje's ...

Sorry, Hrvoje, Jim; I meant Jim's case.

Tony


Re: working on patch to limit to percent of bandwidth

2007-10-10 Thread Tony Godshall
 Jim Wright wrote:
  I think there is still a case for attempting percent limiting.  I agree
  with your point that we cannot discover the full bandwidth of the
  link and adjust to that.  The approach discovers the current available
  bandwidth and adjusts to that.  The usefulness is in trying to be
  unobtrusive to other users.

 Does it really fit that description, though? Given that it runs
 full-bore for 15 seconds (not that that's very long)...

I guess it depends on the type of users you are sharing with and the
upstream switches and routers.

My experience is that with some routers and switches a single user
wget'ing an iso can cause web-browsing people to experience slow
response.  That kind of application is not one where 15sec will make
much difference, and in fact there's a big backoff after that first
15sec.

OTOH, if you are sharing with latency-sensitive apps (VOIP, realtime
control, etc.) and a wget bogs your app down, you had better fix your
switches and routers; you will be affected by anybody streaming YouTube
or whatever in an interactive web browser too.  This patch is not a
solution for that use case, and I agree that there really isn't one
that an app like wget can reasonably implement (without delving into
nonportable OS stuff).

Tony


Re: working on patch to limit to percent of bandwidth

2007-10-10 Thread Tony Godshall
On 10/10/07, Tony Lewis [EMAIL PROTECTED] wrote:
 Hrvoje Niksic wrote:

  Measuring initial bandwidth is simply insufficient to decide what
  bandwidth is really appropriate for Wget; only the user can know
  that, and that's what --limit-rate does.

 The user might be able to make a reasonable guess as to the download rate if
 wget reported its average rate at the end of a session. That way the user
 can collect rates over time and try to give --limit-rate a reasonable value.

 Tony [L]

It reports the rate during the download, and on my connections it's
constant enough to make a good estimate for --limit-rate.  I just
automated that, really.

Tony G


Re: working on patch to limit to percent of bandwidth

2007-10-10 Thread Tony Godshall
Indeed.
On 10/10/07, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 Jim Wright [EMAIL PROTECTED] writes:

  I think there is still a case for attempting percent limiting.  I
  agree with your point that we cannot discover the full bandwidth of
  the link and adjust to that.  The approach discovers the current
  available bandwidth and adjusts to that.  The usefulness is in
  trying to be unobtrusive to other users.

 The problem is that Wget simply doesn't have enough information to be
 unobtrusive.  Currently available bandwidth can and does change as new
 downloads are initiated and old ones are turned off.  Measuring
 initial bandwidth is simply insufficient to decide what bandwidth is
 really appropriate for Wget; only the user can know that, and that's
 what --limit-rate does.

My patch (and the doc change in my patch) doesn't claim to be totally
unobtrusive; it has a particular, documented behavior, which is to try
to be less obtrusive than your typical
get-it-for-me-right-now-as-fast-as-you-can download.

Obviously people who require the level of unobtrusiveness you define
shouldn't be using it.  Then again, people who require that level
probably need to get routers that implement a little more fairness or
QoS, ones that don't let one TCP connection lock out other users.

My patch just does automatically what I used to do manually: start
obtrusive and then scale back to less obtrusive for the rest of the
download.

Even competent non-sys-admin people often are not apprised of the
technical details of the networks they use, but they may still want to
be reasonably nice, for example, to the other people using the wifi at
the cybercafe.

It's certainly a step above the naive behavior; the naive user doesn't
even know what rate they're getting (their typical tools, like MSIE,
don't even tell them!)

Tony G


Re: working on patch to limit to percent of bandwidth

2007-10-10 Thread Tony Godshall
  I think there is still a case for attempting percent limiting.  I
  agree with your point that we cannot discover the full bandwidth of
  the link and adjust to that.  The approach discovers the current
  available bandwidth and adjusts to that.  The usefulness is in
  trying to be unobtrusive to other users.

  The problem is that Wget simply doesn't have enough information to be
  unobtrusive.  Currently available bandwidth can and does change as new
  downloads are initiated and old ones are turned off.  Measuring
  initial bandwidth is simply insufficient to decide what bandwidth is
  really appropriate for Wget; only the user can know that, and that's
  what --limit-rate does.

 So far, I'm inclined to agree.

 For instance, if one just sticks limit_percent = 25 in their wgetrc,
 then on some occasions, Wget will limit to far too _low_ a rate, when
 most of the available bandwidth is already being consumed by other things.
...

Well, if you are using 25%, then you are trying to be *really* nice,
and there's no such thing as far too low, is there?

The scenario I was picturing was one where you'd want to make sure some
bandwidth was left available so that unfair routers wouldn't screw
your net-neighbors.  I really don't see this as an attempt to be
unobtrusive at all.  This is not an attempt to hide one's traffic;
it's an attempt not to overwhelm the link in the presence of unfair
switching.  If I say --limit-percent 75 and the network is congested,
yes, what I want is to use no more than 75% of the available bandwidth,
not the total bandwidth.  So, yes, if the network is more congested
just now, then let this download get a lower bitrate; that's fine.

Tony


not dominating bandwidth caching a value [Re: ... patch to limit to percent of bandwidth]

2007-10-09 Thread Tony Godshall
 [private response to limit list clutter]

or not.  oops.

...
 Note though that my patch *does* dominate the bandwidth for about 15
 seconds to measure the available bandwidth before it falls back.  On
 my network, it seemed to take a few seconds before enough bytes were
 transferred to get a reasonable measure.

What I'd actually like to do is fold the argument into the --limit-rate
option, like so:

  --limit-rate 20K means limit rate to 20KBps
  --limit-rate 20% means limit average rate to 20% of initial measured bandwidth
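
A sketch of how such a combined value might be parsed (parse_limit here
is a made-up helper for illustration, not a function in Wget; the real
option values go through handlers like cmd_bytes):

  /* Sketch only: interpreting "20K"/"1M"-style values vs. "20%". */
  #include <stdio.h>
  #include <stdlib.h>

  /* Returns bytes/sec for absolute values, or a negative number
     (-20 for "20%") to signal "percent of measured rate". */
  static long parse_limit (const char *arg)
  {
    char *end;
    double n = strtod (arg, &end);
    if (*end == '%')
      return (long) -n;             /* caller treats negative as a percentage */
    if (*end == 'K' || *end == 'k')
      return (long) (n * 1024);
    if (*end == 'M' || *end == 'm')
      return (long) (n * 1024 * 1024);
    return (long) n;
  }

  int main (void)
  {
    printf ("%ld\n", parse_limit ("20K"));  /* 20480 */
    printf ("%ld\n", parse_limit ("20%"));  /* -20   */
    return 0;
  }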

I wonder if it would make sense to cache the measured rate so that the
second time you limit by percentage it skips the measurement step and
thus the temporary domination.  I guess then I'd also want to provide
a --measure-rate option to force re-measurement when the network
changes.

Are there any reasons not to cache like that?  Does wget currently
save any state someplace, or would I need to implement that whole bit?
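
A minimal sketch of what such a cache could look like (the path,
format, and function names are invented for illustration, not taken
from Wget):

  /* Sketch only: persist the last measured rate between runs. */
  #include <stdio.h>

  #define RATE_CACHE "/tmp/wget-measured-rate"   /* hypothetical location */

  static long load_cached_rate (void)
  {
    long rate = 0;
    FILE *f = fopen (RATE_CACHE, "r");
    if (f)
      {
        if (fscanf (f, "%ld", &rate) != 1)
          rate = 0;
        fclose (f);
      }
    return rate;                /* 0 means "no cache, measure again" */
  }

  static void save_cached_rate (long rate)
  {
    FILE *f = fopen (RATE_CACHE, "w");
    if (f)
      {
        fprintf (f, "%ld\n", rate);
        fclose (f);
      }
  }

  int main (void)
  {
    long rate = load_cached_rate ();
    if (rate == 0)
      {
        rate = 123456;          /* pretend this is what we just measured */
        save_cached_rate (rate);
      }
    printf ("using rate: %ld bytes/s\n", rate);
    return 0;
  }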

Tony


Initial draft- patch to limit bandwidth by percent of measured rate

2007-10-08 Thread Tony Godshall
Please find attached...

The quick test:

If you run wget with --limit-percent 50, you should see it run at full
blast for 15 seconds and then back off till it's downloading at 50% of
the rate it achieved in the first 15 seconds.
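
For example (the URL is just a placeholder for any large file):

  wget --limit-percent 50 http://example.com/big.iso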

This is only the initial Works For Me version of the patch.  Comments
are welcome, and anyone else who wants to run with it is welcome to do so.

Best Regards.
Tony

PS to Micah: Yes, I changed pct to percent.
diff --git a/src/init.c b/src/init.c
--- a/src/init.c
+++ b/src/init.c
@@ -179,6 +179,7 @@ static const struct {
 #endif
   { "input",            &opt.input_filename,    cmd_file },
   { "keepsessioncookies", &opt.keep_session_cookies, cmd_boolean },
+  { "limitpercent",     &opt.limit_percent,     cmd_number },
   { "limitrate",        &opt.limit_rate,        cmd_bytes },
   { "loadcookies",      &opt.cookies_input,     cmd_file },
   { "logfile",          &opt.lfilename,         cmd_file },
diff --git a/src/main.c b/src/main.c
--- a/src/main.c
+++ b/src/main.c
@@ -189,6 +189,7 @@ static struct cmdline_option option_data
     { "input-file", 'i', OPT_VALUE, "input", -1 },
     { "keep-session-cookies", 0, OPT_BOOLEAN, "keepsessioncookies", -1 },
     { "level", 'l', OPT_VALUE, "reclevel", -1 },
+    { "limit-percent", 0, OPT_VALUE, "limitpercent", -1 },
     { "limit-rate", 0, OPT_VALUE, "limitrate", -1 },
     { "load-cookies", 0, OPT_VALUE, "loadcookies", -1 },
     { "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
@@ -453,6 +454,10 @@ Download:\n"),
   -Q,  --quota=NUMBER       set retrieval quota to NUMBER.\n"),
 N_("\
       --bind-address=ADDRESS    bind to ADDRESS (hostname or IP) on local host.\n"),
+N_("\
+      --limit-percent=NUMBER    limit download rate to NUMBER percent of measured initial burst\n"),
+N_("\
+                                or rate specified by --limit-rate\n"),
 N_("\
       --limit-rate=RATE         limit download rate to RATE.\n"),
 N_("\
diff --git a/src/options.h b/src/options.h
--- a/src/options.h
+++ b/src/options.h
@@ -115,6 +115,8 @@ struct options
   double waitretry;		/* The wait period between retries. - HEH */
   bool use_robots;		/* Do we heed robots.txt? */
 
+  wgint limit_percent;		/* Limit the download rate to this percentage
+   of initial measured burst rate. */
   wgint limit_rate;		/* Limit the download rate to this
    many bps. */
   SUM_SIZE_INT quota;		/* Maximum file size to download and
diff --git a/src/retr.c b/src/retr.c
--- a/src/retr.c
+++ b/src/retr.c
@@ -86,14 +86,61 @@ limit_bandwidth (wgint bytes, struct pti
 limit_bandwidth (wgint bytes, struct ptimer *timer)
 {
   double delta_t = ptimer_read (timer) - limit_data.chunk_start;
-  double expected;
+  double expected= 0.0;
 
   limit_data.chunk_bytes += bytes;
 
+  static wgint measured_limit= 0;
+
+  wgint limit= 0;
+
+  if ( opt.limit_rate )
+  {
+    limit= opt.limit_rate;
+
+    if ( opt.limit_percent )
+    {
+      limit= limit * opt.limit_percent / 100;
+    }
+    DEBUGP (("fixed limit governs: %lld bps\n", limit));
+  }
+  else
+  if ( opt.limit_percent )
+  {
+    if ( ! measured_limit )
+    {
+      static double total_sec= 0.0;
+      static wgint total_bytes= 0;
+
+      total_sec += delta_t;
+      total_bytes += bytes;
+      const double MEASURE_SEC= 15.0;
+
+      if ( total_sec > MEASURE_SEC )
+      {
+        measured_limit = total_bytes / total_sec * opt.limit_percent / 100.0;
+        DEBUGP (("After %.3f seconds we saw %lld bytes so our measured limit is %lld bps\n",
+                 total_sec, total_bytes, measured_limit));
+      }
+    }
+
+    if ( measured_limit )
+    {
+      if ( !limit || measured_limit < limit )
+      {
+        limit= measured_limit;
+      }
+    }
+  }
+
+  if ( limit )
+  {
+
   /* Calculate the amount of time we expect downloading the chunk
-     should take.  If in reality it took less time, sleep to
-     compensate for the difference.  */
-  expected = (double) limit_data.chunk_bytes / opt.limit_rate;
+     should take at this fixed rate.  If in reality it took less time,
+     sleep to compensate for the difference.  */
+
+  expected = (double) limit_data.chunk_bytes / limit;
 
   if (expected > delta_t)
     {
@@ -127,6 +174,8 @@ limit_bandwidth (wgint bytes, struct pti
         limit_data.sleep_adjust = -0.5;
     }
 
+  }
+
   limit_data.chunk_bytes = 0;
   limit_data.chunk_start = ptimer_read (timer);
 }
@@ -229,13 +278,13 @@ fd_read_body (int fd, FILE *out, wgint t
   progress_interactive = progress_interactive_p (progress);
 }
 
-  if (opt.limit_rate)
+  if (opt.limit_rate || opt.limit_percent)
 limit_bandwidth_reset ();
 
   /* A timer is needed for tracking progress, for throttling, and for
  tracking elapsed time.  If either of these are requested, start
  the timer.  */
-  if (progress || opt.limit_rate || elapsed)
+  if (progress || opt.limit_rate || opt.limit_percent || elapsed)
 {
   timer = ptimer_new ();
   last_successful_read_tm = 0;
@@ -286,7