Re: Wget and Yahoo login?
And you'll probably have to do this again; I bet Yahoo expires the session cookies! On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote: After surprisingly little struggle, I got Plan B working -- logged into Yahoo with wget, saved the cookies, including session cookies, and then proceeded to fetch pages using the saved cookies. Those pages came back logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah -- you all provided critical advice in solving this problem. /Don On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote: On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote: Donald Allen wrote: On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote: Donald Allen wrote: I am doing the Yahoo session login with Firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to log in to the Yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using Firefox to initiate the session, it looks to me like wget never gets to see the session cookies, because I don't think Firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?). Yes, and I understood this; the thing is that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them into your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'.
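For the explicit `--header 'Cookie: ...'` route, the header value can be assembled from a Netscape-style cookies file of the kind `--save-cookies` writes: one cookie per line, seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value). The sketch below assumes exactly that layout; `cookie_header` is a hypothetical helper, and it does only a crude domain-suffix match and ignores path, secure and expiry, so it illustrates the idea rather than replacing wget's own `--load-cookies` handling.

```python
def cookie_header(cookies_txt: str, domain_suffix: str) -> str:
    """Build a Cookie header value from Netscape-style cookies.txt text.

    Keeps cookies whose domain field ends with domain_suffix (a crude
    suffix match); path, secure and expiry fields are ignored.
    """
    pairs = []
    for line in cookies_txt.splitlines():
        # Skip blank lines and '#' comment lines such as the file header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie line in the expected layout
        domain, _subdomains, _path, _secure, _expiry, name, value = fields
        if domain.lstrip(".").endswith(domain_suffix):
            pairs.append(f"{name}={value}")
    return "; ".join(pairs)


# Hypothetical file contents; real cookie names and values will differ.
sample = (
    "# HTTP cookie file.\n"
    ".yahoo.com\tTRUE\t/\tFALSE\t0\tSID\tabc\n"
    ".yahoo.com\tTRUE\t/\tFALSE\t0\tPREF\txyz\n"
    ".example.org\tTRUE\t/\tFALSE\t0\tS\tzzz\n"
)
print("Cookie: " + cookie_header(sample, "yahoo.com"))  # Cookie: SID=abc; PREF=xyz
```

The printed line is what would go inside `--header '...'`; as Tony notes, it has to be regenerated whenever the site expires or rotates its session cookies.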
Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to Yahoo with wget. Yes; and that's entirely my fault, as I didn't explicitly say that. No problem. I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others. I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content. That didn't faze me because the pages I'm after will be processed by a Python program, so having to gunzip would not require a manual step. This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised. Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase "You're not using Internet Explorer." :p And taking it one step further, I'm greatly enjoying watching Microsoft thrash around, trying to save themselves, which I don't think they will. Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not going to produce milk much longer. I've just installed the Chrome beta on the Windows side of one of my machines (I grudgingly give it 10 GB on each machine; Linux gets the rest), and it looks very, very nice. They've still got work to do, but they appear to be heading in a very good direction. These are smart people at Google. All signs seem to be pointing towards more and more computing happening on the server side in the coming years. /Don -- Micah J. Cowan Programmer, musician, typesetting enthusiast, gamer.
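Don's point that gunzipping costs nothing once a program handles the pages can be shown with Python's standard `gzip` module. A minimal sketch, assuming the saved file is either raw gzip data (as wget stores it when the server honors `Accept-Encoding: gzip`) or plain HTML; it checks the two-byte gzip magic number before decompressing. `read_page` and the file name are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def read_page(raw: bytes, encoding: str = "utf-8") -> str:
    """Return page text, gunzipping first if the bytes are gzip data.

    The encoding default is an assumption; a real program should use the
    charset the server declares.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding, errors="replace")


# Usage with a page saved by wget (hypothetical file name):
# with open("my_yahoo.html", "rb") as f:
#     text = read_page(f.read())
```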
GNU Maintainer: wget, screen, teseq http://micah.cowan.name/ -- Best Regards. Please keep in touch. This is unedited. P-)
Re: .1, .2 before suffix rather than after
... At the release of Wget 1.11, it is my intention to try to attract as much developer interest as possible. At the moment, and despite Wget's pervasive presence, it has virtually no user or developer community. Given the amount of work that needs to be done, this is not good. The announcement of the first new release of GNU Wget in two years seems a great opportunity to solicit help! ... That's sort of the nature of older tools with a well-defined mission: they do their job so well there's little itch to tweak them. If it ain't broken, you don't fix it. Freshmeat lists wget as mature, which basically means the same thing. I guess wget will have to get a bit immature to get some buzz going. Some pretty insane goals in a wget2 roadmap would probably do the trick. How about announcing plans to implement DHT and make BitTorrent obsolete? That should make Slashdot ;-) Tony -- The above is not to be taken seriously.
Re: wget2
On Nov 29, 2007 3:48 PM, Alan Thomas [EMAIL PROTECTED] wrote: What is wget2? Any plans to move to Java? (Of course, the latter will not be controversial. :) Troll ;-)
Re: wget2
On Nov 29, 2007 4:02 PM, Micah Cowan [EMAIL PROTECTED] wrote: Alan Thomas wrote: What is wget2? Any plans to move to Java? (Of course, the latter will not be controversial. :) Java is not likely. The most likely language is probably still C, ... I think he's a troll because one of the top Google hits for wget2 is a short little Java program he's apparently trying to draw attention to.
Re: Can't add ampersand to url I want to get
Single quotes will work when a URL includes a dollar sign. Double quotes won't, because the shell still performs $ expansion inside double quotes. On Nov 5, 2007 12:07 PM, Micah Cowan [EMAIL PROTECTED] wrote: Alles, Kris wrote: I tried wrapping the URL with double quotes instead of single quotes and it works. Please disregard previous message. Both single and double quotes should work in typical Unix shells. Unless, of course, the quoted text contains a quote (which URIs usually shouldn't).
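When a program rather than a person builds the wget command line, Python's `shlex.quote` produces the single-quoted form that keeps `$` and `&` literal (and escapes any embedded single quote). A small sketch; the URL is made up.

```python
import shlex

url = "http://example.com/cgi?user=$USER&page=2"

quoted = shlex.quote(url)
print(quoted)  # 'http://example.com/cgi?user=$USER&page=2'

# Parsing the quoted form the way a POSIX shell would returns the URL unchanged;
# inside double quotes, by contrast, the shell would substitute $USER.
print(shlex.split(quoted) == [url])  # True
```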
Re: Need help with wget from a password-protected URL
Sounds like a shell issue. Assuming you are on a *nix, try 'pass' (in single quotes, so the shell passes the weird characters literally). If you are on Windows, it's another story. On 11/10/07, Uma Shankar [EMAIL PROTECTED] wrote: Hi - I've been struggling to download data from a protected site. The man pages instruct me to use the --http-user=USER and --http-passwd=PASS options when issuing the wget command to the URL. I get error messages when wget encounters special chars in the password. Is there a way to get around this? I really need help downloading the data. Thanks, Uma Shankar, Research Associate Institute for the Environment Bank of America Plaza CB# 6116 137 E. Franklin St Room 644 Chapel Hill NC 27599-6116 Phone: (919) 966-2102 Fax (919) 843-3113 Mobile: (919) 441-9202 Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? -T. S. Eliot (1888-1965)
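Another way around shell quoting is not to involve a shell at all: when wget is launched from a program with an argument vector, each argument reaches wget byte-for-byte. A sketch, assuming a wget that accepts `--http-user`/`--http-password` (the spelling in current wget manuals for the `--http-passwd` option quoted above); `wget_argv`, the URL and the credentials are placeholders, and the command is only built here, not run.

```python
import shlex


def wget_argv(url: str, user: str, password: str) -> list[str]:
    # Each list element becomes exactly one argv entry for wget,
    # so characters such as $ ! & " ' need no escaping at all.
    return ["wget", f"--http-user={user}", f"--http-password={password}", url]


argv = wget_argv("http://example.com/data/file.nc", "uma", "p@$$w'rd!")

# shlex.join shows the equivalent, correctly quoted shell command line.
print(shlex.join(argv))

# To actually run it without a shell:
#   import subprocess; subprocess.run(argv, check=True)
```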
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/31/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: On 10/30/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: Perhaps the little wget could be called wg. A quick Google and Wikipedia search shows no real namespace collisions. To reduce confusion/upgrade problems, I would think we would want to ensure that the traditional/little Wget keeps the current name, and any snazzified version gets a new one. Please not another -ng. How about wget2 (since we're on 1.x)? And the current one remains in 1.x. I agree that -ng would not be appropriate. But since we're really talking about two separate beasts, I'd prefer not to limit what we can do with Wget (original)'s versioning. Who's to say a 2.0 release of the light version will not be warranted someday? At any rate, the snazzy one looks to be diverging from classic Wget in some rather significant ways, in which case, I'd kind of prefer to part names a bit more severely than just wget-ng or wget2. Reget, perhaps: that name could be both Recursive Get (describing what's still its primary feature), or Revised/Re-envisioned Wget. :) I think, too, that names such as wget2 are more often things that packagers (say, Debian) do, when they want to include backwards-incompatible, significantly new versions of software, but don't want to break people's usage of older stuff. Or, when they just want to offer both versions. Cf. apache2 in Debian. And then eventually everyone's gotten used to and can't live without the new BitTorrent-like almost-multithreaded features. ;-) :) Pget. Parallel get. Tget. Torrent-like-get. Bget. Bigger get. BBWget. Bigger Better wget. OK, ok sorry.
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/30/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: Perhaps the little wget could be called wg. A quick Google and Wikipedia search shows no real namespace collisions. To reduce confusion/upgrade problems, I would think we would want to ensure that the traditional/little Wget keeps the current name, and any snazzified version gets a new one. Please not another -ng. How about wget2 (since we're on 1.x)? And the current one remains in 1.x. And then eventually everyone's gotten used to and can't live without the new BitTorrent-like almost-multithreaded features. ;-) Tony
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/26/07, Josh Williams [EMAIL PROTECTED] wrote: On 10/26/07, Micah Cowan [EMAIL PROTECTED] wrote: And, of course, when I say there would be two Wgets, what I really mean by that is that the more exotic-featured one would be something else entirely than a Wget, and would have a separate name. I think the idea of having two Wgets is good. I too have been concerned about the resources required in creating the all-out version 2.0. The current code for Wget is a bit mangled, but I think the basic concepts surrounding it are very good ones. Although the code might suck for those trying to read it, I think it could be great with a little regular maintenance. Perhaps the little wget could be called wg. A quick Google and Wikipedia search shows no real namespace collisions. There still remains the question, though, of whether version 2 will require a complete rewrite. Considering how fundamental these changes are, I don't think we would have much of a choice. You mentioned that they could share code for recursion, but I don't see how. IIRC, the code for recursion in the current version is very dependent on the current methods of operation. It would probably have to be rewritten to be shared. As for libcurl, I see no reason why not. Also, would these be two separate GNU projects? Would they be packaged in the same source code, like finch and pidgin? I do believe the next question at hand is what version 2's official mascot will be. I propose Lenny the tortoise ;) Oooh- confusion with Debian testing _ .. Lenny - (_\/ \_, 'uuuu~'
Re: More portability stuff [Re: gettext configuration]
On 10/29/07, Dražen Kačar [EMAIL PROTECTED] wrote: Micah Cowan wrote: AFAIK, _no_ system supports POSIX 100%, AIX and Solaris have certified POSIX support. That's for the latest, IEEE Std 1003.1-2001. More systems have certified POSIX support for the older POSIX release. OTOH, all POSIX releases have optional parts which don't have to be implemented. Yeah, to be POSIX-compliant you just had to document which parts you didn't implement (comply with). T
Re: --limit-percent N versus --limit-rate N% ?
On 10/19/07, Matthew Woehlke [EMAIL PROTECTED] wrote: Micah Cowan wrote: Also: does the current proposed patch deal properly with situations such as where the first 15 seconds haven't been taken up by part of a single download, but rather several very small ones? I'm not very familiar yet with the rate-limiting stuff, so I really have no idea. If the point is to limit *your* bandwidth, well, it's hard to say, although the consensus seems to be that overly conservative is the right thing to do, usually. Of course, if the point (as in one suggested use case) is to limit the amount of the /server's/ bandwidth consumed, then a new percent should be calculated for every host. Just some thoughts... I think it kicks in on each URL but would have to study the code more thoroughly. The point is to limit one's consumption of available bandwidth through upstream defective switches (that are unfair when saturated) and wifi (which exhibits the same effect). I was thinking especially of those who share one's pipe, since that's the choke point in most of my experience (the DSL modem, the WAN connection, the T-1). I don't think it helps servers much -- they tend to be on better-grade switches -- so a per-domain behavior doesn't make sense to me. TG
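On Micah's question about several small downloads filling the first 15 seconds: one way to make that case behave is to keep the measurement in an object shared across downloads, so the window fills with transfer time from however many files it takes. A sketch under that assumption only; `RateMeter`, `record` and `limit` are hypothetical names and are not taken from the proposed patch.

```python
class RateMeter:
    """Accumulates transfer time and bytes across downloads until a
    measurement window is full, then reports a cap equal to a given
    percentage of the measured rate."""

    def __init__(self, percent: float, window_seconds: float = 15.0):
        self.percent = percent
        self.window = window_seconds
        self.elapsed = 0.0  # seconds spent actually transferring
        self.nbytes = 0     # bytes received during that time

    def record(self, nbytes: int, seconds: float) -> None:
        # Called after each chunk, whichever download it belongs to;
        # once the window is full the measurement is frozen.
        if self.elapsed < self.window:
            self.nbytes += nbytes
            self.elapsed += seconds

    def limit(self):
        """Bytes/second cap, or None while still measuring (no throttling)."""
        if self.elapsed < self.window:
            return None
        return self.nbytes / self.elapsed * self.percent / 100.0


meter = RateMeter(percent=80)
for _ in range(3):              # three small files, 5 s of transfer each
    meter.record(500_000, 5.0)
print(meter.limit())            # 80000.0 (80% of 100,000 bytes/s)
```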
Re: Port range option in bind-address implemented?
On 10/18/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote: On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: Well, I don't have much to say about the other points, but one certainly does not need to keep an array for something like this: with the classic pseudorandom shuffle algorithm you only need to keep a count of the ones visited. Shall I pull out my Knuth? That... only applies if you actually keep a _queue_ around, of all the ports that you plan to try, and shuffle it. Surely that's more waste (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling, here, we're choosing. No, the point was that with a relative prime or two you can walk in a pseudorandom pattern, though, hitting each point only once, needing no array at all. Donald E. Knuth, The Art of Computer Programming, Volume 2, 3rd Edition, pp. 17-19 For the record, this is not what pseudo-random shuffle means to me: for instance, http://en.wikipedia.org/wiki/Fisher-Yates_shuffle (aka Knuth shuffle), which does in fact require an in-memory set to be permuted. Yeah, well, it's been 23 years since I took Data Structures, so sue me. And the shuffle you refer to is an attempt at actual randomness, whereas what I am talking about is explicitly a function of its *pseudo*-randomness: it's taking advantage of a characteristic (defect?) of an earlier attempt at real randomness. Yes, that appears to work quite well, as long as we seed it right; starting with a consistent X₀ would be just as bad as trying them sequentially, and choosing something that does not change several times a second (such as time()) still makes it likely that multiple invocations will choose the same first port. Probably, /dev/random as first choice, falling back to gettimeofday() where that's available. I don't know what Windows would use.
We could probably use time() as a last resort, though I'm not crazy about it; maybe it'd be better to fail to support port ranges if there's not a decent seed available, and support only exact port specifications. Since implementation for 2^n is relatively easy, I think people usually write the algorithm to cover up to twice as many numbers as required (the next power of two) and then skip values that are out of range. You know, I bet picking randomly and nonrepetitively from a range is a common enough task that it's in one of the standard libraries. If not, it probably should be. Thanks for the suggestion, Tony. If I have a thought, I share. Too much sometimes ;-) or so my wife tells me. Tony
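The approach Tony describes, a full-period linear congruential generator over the next power of two that skips values outside the range, can be sketched as follows. By the Hull-Dobell theorem, x → (a·x + c) mod 2^k visits every residue exactly once per period when c is odd and a ≡ 1 (mod 4). The seed comes from the OS randomness source with a time-based fallback, loosely following Micah's suggested order; the way the multiplier is derived from the seed and the name `port_order` are illustrative assumptions, not anything Wget implements.

```python
import os
import time


def port_order(low: int, high: int, seed=None):
    """Yield each port in [low, high] exactly once, in a scrambled order,
    keeping O(1) state instead of a table of ports already tried."""
    n = high - low + 1
    m = 1
    while m < n:                 # smallest power of two >= n (less than 2n)
        m *= 2
    if seed is None:
        try:
            seed = int.from_bytes(os.urandom(8), "little")
        except NotImplementedError:  # no OS randomness source
            seed = time.time_ns()    # weaker: nearby runs get similar seeds
    # Hull-Dobell for m = 2**k: c odd and a % 4 == 1 give period exactly m.
    a = 4 * ((seed >> 32) % m) + 1
    c = (seed | 1) % m           # odd whenever m >= 2; 0 when m == 1
    x = seed % m
    for _ in range(m):
        if x < n:                # skip values beyond the real range
            yield low + x
        x = (a * x + c) % m
```

A caller would try `bind()` on each port from `port_order(2000, 3000)` in turn and stop at the first success; because at most half of the generated values are skipped, the extra work over a perfect permutation is bounded by a factor of two.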
Re: ... --limit-rate nn%
On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: About the parser... I'm thinking I can hack the parser that now handles the K, M, etc. suffixes so it works as it did before but also sees a '%' suffix as valid; that would reduce the amount of code necessary to implement --limit-rate nn%. Any reason not to do so? The current parser, and in particular, the actual code that handles K, M, etc., is used by other options, for which percentages are not appropriate. Plus, whereas those options have been taking doubles, you'll now need some sort of struct to hold information as to whether there's a percentage or a direct rate specified. Yes, that's true. I guess I'll do a new parser, and to avoid duplicating code, I'll call the old parser if it doesn't have a '%' suffix. Thanks
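The plan agreed on here (a new parser that handles the `%` suffix itself, hands anything else to the existing K/M parser, and returns a small struct recording which kind of value was given) might look like this. `RateLimit`, `parse_rate` and `parse_bytes` are hypothetical names, and the 1024-based K/M/G multipliers in `parse_bytes` are an assumption standing in for wget's existing suffix handling.

```python
from dataclasses import dataclass


@dataclass
class RateLimit:
    value: float       # bytes per second, or a percentage
    is_percent: bool


_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_bytes(text: str) -> float:
    """Stand-in for the old parser: a number with optional K/M/G suffix."""
    text = text.strip()
    mult = _MULTIPLIERS.get(text[-1:].lower(), 1)
    if mult != 1:
        text = text[:-1]
    return float(text) * mult  # raises ValueError on junk


def parse_rate(text: str) -> RateLimit:
    """Parse a --limit-rate argument such as '20k' or '80%'."""
    text = text.strip()
    if text.endswith("%"):
        pct = float(text[:-1])
        if not 0 < pct <= 100:
            raise ValueError(f"percentage out of range: {text!r}")
        return RateLimit(pct, True)
    # No '%' suffix: defer to the old parser, so behavior is unchanged.
    return RateLimit(parse_bytes(text), False)
```

Keeping the percentage check out of `parse_bytes` means the other options that share the old parser never start accepting `%`, which was Micah's concern.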
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/17/07, Matthias Vill [EMAIL PROTECTED] wrote: Tony Godshall wrote: If it was me, I'd have it default to backing off to 95% and have options for more aggressive behavior, like the multiple connections, etc. I don't like a default back-off rule. I often encounter downloads with frequently changing download speeds. The idea that for the first few seconds I only get quite a bad speed and could get much more out of it is just not satisfying. You might be surprised, but I totally agree with you. A default backoff rule would only make sense if the measuring was better, e.g. a periodic ramp-up/back-off behavior to achieve 95% of the maximum measured rate. I'm surprised multiple connections would buy you anything, though. I guess I'll take a look through the archives and see what the argument is. Does one TCP connection back off on a lost packet and the other one gets to keep going? Hmm. I guess you get improvements if, e.g., on your side you have more free bandwidth than on the source side. Having two connections then means that you get almost twice the download speed, because you have two connections competing for free bandwidth, and ideally every connection made to the server is equally fast. So in cases where you are the only one connecting, you probably win nothing. Ah, I get it. People want to defeat sender rate-limiting or other QoS controls. The opposite of nice. We could call it --mean-mode. Or --meanness n, where n=2 means I want to have two threads/connections, i.e. twice as mean as the default. No, well, actually, I guess there can be cases where bad upstream configurations result in a situation where more connections don't necessarily mean one is taking more than one's fair share of bandwidth, but I bet this option will result in more harm than good. Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt). TG
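Tony's condition for a default back-off, a periodic ramp-up/back-off that targets 95% of the maximum measured rate, could be sketched like this. The class name, the probe interval, and the choice to lift the cap entirely during each probe interval are illustrative assumptions, not behavior of any Wget release.

```python
class BackoffController:
    """Caps the rate at a fraction of the best rate seen, but periodically
    lifts the cap for one interval so a higher maximum can be discovered."""

    def __init__(self, fraction: float = 0.95, probe_every: int = 10):
        self.fraction = fraction
        self.probe_every = probe_every  # intervals between ramp-up probes
        self.max_rate = 0.0
        self.interval = 0

    def next_cap(self, measured_rate: float):
        """Feed the rate measured in the last interval; get the cap for the
        next one (None means: run unthrottled for that interval)."""
        self.max_rate = max(self.max_rate, measured_rate)
        self.interval += 1
        if self.interval % self.probe_every == 0:
            return None                  # ramp-up probe
        return self.fraction * self.max_rate
```

Because the cap tracks the best rate ever observed rather than the first few seconds, a slow start (Matthias's objection) only throttles until the next probe reveals the higher rate.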
Ignoring robots.txt [was Re: wget default behavior...]
... Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt). Damn. I was only joking about ignoring robots.txt, but now I'm thinking[1] there may be good reasons to do so... maybe it should be in mainline wget. T [1] http://web.archive.org/web/20041013225557/http://www.differentstrings.info/archives/002813.html
Re: Ignoring robots.txt [was Re: wget default behavior...]
Tony Godshall wrote: ... Perhaps it should be one of those things that one can do oneself if one must but is generally frowned upon (like making a version of wget that ignores robots.txt). Damn. I was only joking about ignoring robots.txt, but now I'm thinking[1] there may be good reasons to do so... maybe it should be in mainline wget. Actually, it is. -e robots=off. :) This also turns off obedience to the nofollow attribute sometimes found in meta and a tags. Ah, my ignorance is showing. I stand corrected.
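For reference, the robots.txt rules that `-e robots=off` switches off are prefix rules grouped by user agent. Python's standard `urllib.robotparser` shows what obeying them means; the rules and URLs below are made-up examples, and this illustrates the convention only, not Wget's own robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every agent is asked to stay out of /private/.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for url in ("http://example.com/index.html",
            "http://example.com/private/data.txt"):
    verdict = "allowed" if rp.can_fetch("Wget", url) else "disallowed"
    print(url, verdict)
```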
Re: Port range option in bind-address implemented?
On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote: Oleg Ace wrote: Greetings, Did the feature being discussed here http://www.mail-archive.com/wget@sunsite.dk/msg05546.html and here http://www.mail-archive.com/wget@sunsite.dk/msg05577.html ever get implemented? In other words, is it possible to do: wget --bind-address=1.2.3.4:2000-3000 http://... From trying it out and looking briefly at the code, it would appear it is not, but I wanted to make sure. If that is the case, does anyone still have the old patch available, or have a similar new one? Looking at the threads you indicated, it appears that people were generally happy to include the feature, but were unhappy with the specific implementation from the patch:
* parsing of --bind-address belongs in the getopt loop
* sscanf() should be avoided for use in the parsing.
* the ports should be chosen from that range at random, rather than sequentially, to address an issue pointed out by the Sockets FAQ.
The third point above introduces its own problems: how many bind() attempts should we make before throwing in the towel? Or should we attempt every port in that range, keeping an 8k array of bits to track which ports we've tried already? Well, I don't have much to say about the other points, but one certainly does not need to keep an array for something like this: with the classic pseudorandom shuffle algorithm you only need to keep a count of the ones visited. Shall I pull out my Knuth? Clearly, whatever approach we take will be _vastly_ less efficient/intelligent than the way the OS picks a port for us, and we'll need to point these limitations out in the documentation.
I'm not going to write the code for this (at least not any time soon); if someone is interested enough to rewrite the patch to address these shortcomings, though, I'll be happy to include it, seeing as how it apparently met with Hrvoje's and Mauro's approval (and I see how it could be useful as well; though of course its primary use is probably to get around broken environments). I will submit a low-pri issue for it, in the meantime, in case someone sees it and wants to pick it up. :)
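Micah's first two review points (parse the port range in the option handling, without `sscanf()`) and his 8 KiB bit array of tried ports can both be illustrated briefly. A sketch in Python rather than Wget's C; `parse_bind_spec` and `PortSet` are hypothetical names, the syntax `ADDR[:LOW[-HIGH]]` is assumed from the example command in the question, and IPv6 literals are not handled.

```python
def parse_bind_spec(spec: str):
    """Split 'ADDR[:LOW[-HIGH]]' into (addr, low, high); ports are None if absent."""
    addr, sep, ports = spec.partition(":")
    if not sep:
        return addr, None, None
    low_s, dash, high_s = ports.partition("-")
    low = int(low_s)                     # raises ValueError on junk
    high = int(high_s) if dash else low
    if not 1 <= low <= high <= 65535:
        raise ValueError(f"bad port range in {spec!r}")
    return addr, low, high


class PortSet:
    """One bit per possible port: 65536 bits = 8192 bytes."""

    def __init__(self):
        self.bits = bytearray(65536 // 8)

    def add(self, port: int) -> None:
        self.bits[port >> 3] |= 1 << (port & 7)

    def __contains__(self, port: int) -> bool:
        return bool(self.bits[port >> 3] & (1 << (port & 7)))
```

With `PortSet`, a loop can pick ports at random and skip those already tried; the counter-only walks Tony describes avoid even these 8 KiB.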
Re: Port range option in bind-address implemented?
On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: Well, I don't have much to say about the other points, but one certainly does not need to keep an array for something like this: with the classic pseudorandom shuffle algorithm you only need to keep a count of the ones visited. Shall I pull out my Knuth? That... only applies if you actually keep a _queue_ around, of all the ports that you plan to try, and shuffle it. Surely that's more waste (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling, here, we're choosing. No, the point was that with a relative prime or two you can walk in a pseudorandom pattern, though, hitting each point only once, needing no array at all.
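The simplest form of the "relative prime" walk needs only a counter, a start offset and a stride coprime to the range size n: i → (start + i·stride) mod n hits every value in 0..n-1 exactly once in n steps. It looks much less random than an LCG, but it does show why no array is needed. A sketch; `coprime_walk` is an illustrative name and `random` stands in for whatever seeding Wget would use.

```python
import math
import random


def coprime_walk(low: int, high: int, rng=random):
    """Yield each port in [low, high] once, keeping only a counter,
    a start offset and a stride (no table of visited ports)."""
    n = high - low + 1
    start = rng.randrange(n)
    stride = rng.randrange(1, n) if n > 1 else 1
    while math.gcd(stride, n) != 1:  # stride must be coprime to n
        stride = rng.randrange(1, n)
    for i in range(n):
        yield low + (start + i * stride) % n
```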
Re: Port range option in bind-address implemented?
On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote: On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: Well, I don't have much to say about the other points, but one certainly does not need to keep an array for something like this: with the classic pseudorandom shuffle algorithm you only need to keep a count of the ones visited. Shall I pull out my Knuth? That... only applies if you actually keep a _queue_ around, of all the ports that you plan to try, and shuffle it. Surely that's more waste (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling, here, we're choosing. No, the point was that with a relative prime or two you can walk in a pseudorandom pattern, though, hitting each point only once, needing no array at all. Donald E. Knuth, The Art of Computer Programming, Volume 2, 3rd Edition, pp. 17-19
Re: Port range option in bind-address implemented?
On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote: On 10/17/07, Tony Godshall [EMAIL PROTECTED] wrote: On 10/17/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: Well, I don't have much to say about the other points, but one certainly does not need to keep an array for something like this: with the classic pseudorandom shuffle algorithm you only need to keep a count of the ones visited. Shall I pull out my Knuth? That... only applies if you actually keep a _queue_ around, of all the ports that you plan to try, and shuffle it. Surely that's more waste (65,535 shorts, versus 65,535 _bits_), not less? ...We're not shuffling, here, we're choosing. No, the point was that with a relative prime or two you can walk in a pseudorandom pattern, though, hitting each point only once, needing no array at all. Donald E. Knuth, The Art of Computer Programming, Volume 2, 3rd Edition, pp. 17-19 ...and probably closer at hand... http://en.wikipedia.org/wiki/Linear_congruential_generator TG
Re: ... --limit-rate nn%
On 10/15/07, Matthias Vill [EMAIL PROTECTED] wrote: Micah Cowan wrote: Matthias Vill wrote: I would appreciate having a --limit-rate N% option. So now about those broken cases. You could do some least-of-both policy (which would of course still need the time to do measuring and can cut only afterwards). Or otherwise you could use a non-percent value as a minimum. This would be especially useful if you add it to your default options and stumble over some slow server only serving you 5KiB/s, where you most probably don't want to further lower the speed on your side. As a third approach you would only use the last limiting option. Depending on how difficult the implementation is, I would vote for the second behavior, although the first or third option might be more intuitive to some of the users not reading the docs. Third option should be more intuitive to the implementer, too. I vote for that, as I really want to avoid putting too much sophistication into this. I would expect that you need two variables for holding percent/fixed values anyway, so I was just wondering whether you could do it as I suggested secondly. IMHO that should be quite easy to do by a single if (fixed && percent) limit = max(a, b) and thus not even result in any overhead during actual download. Greetings Matthias P.S.: I'm subscribed via news://sunsite.dk, you don't need to CC me. Thanks for the input, guys. I'll see what I can do. About the parser... I'm thinking I can hack the parser that now handles the K, M, etc. suffixes so it works as it did before but also sees a '%' suffix as valid; that would reduce the amount of code necessary to implement --limit-rate nn%. Any reason not to do so?
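Matthias's second option (treat a fixed `--limit-rate` value as a floor under the percentage limit, which needs only the two variables and a single `max`) amounts to the following. A sketch only; `effective_limit` is a hypothetical name, and applying the percentage to a separately measured rate is an assumption about how the percentage would be resolved.

```python
def effective_limit(measured_rate, percent=None, fixed=None):
    """Combine a percentage limit with a fixed rate (both in bytes/s).

    With both given, the fixed value acts as a minimum, so a slow server
    is not throttled further. Returns None when no limit applies.
    """
    pct_limit = measured_rate * percent / 100.0 if percent is not None else None
    if pct_limit is not None and fixed is not None:
        return max(pct_limit, fixed)
    return pct_limit if pct_limit is not None else fixed
```

For the 5 KiB/s server in the example, `effective_limit(5120, 80, 20480)` yields 20480, which is above what the server delivers anyway, so the download is effectively not throttled.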
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Heh. Well, some people are saying that Wget should support accelerated downloads; several connections to download a single resource, which can sometimes give a speed increase at the expense of nice-ness. So you could say we're at a happy medium between those options! :) Actually, Wget probably will get support for multiple simultaneous connections; but the number of connections to one host will be limited to a max of two. It's impossible for Wget to know how much is appropriate to back off, and in most situations I can think of, backing off isn't appropriate. In general, though, I agree that Wget's policy should be nice by default. If it was me, I'd have it default to backing off to 95% and have options for more aggressive behavior, like the multiple connections, etc. I'm surprised multiple connections would buy you anything, though. I guess I'll take a look through the archives and see what the argument is. Does one TCP connection back off on a lost packet and the other one gets to keep going? Hmm. Josh Williams wrote: That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. At this point, since it seems to have some demand, I'll probably put it in for 1.12.x; but I may very well move it to a module when we have support for that. Thanks, yes that makes sense. Of course, Tony G indicated that he would prefer it to be conditionally compiled, for concerns that the plugin architecture will add overhead to the wget binary.
Wget is such a lightweight app, though, I'm not thinking that the plugin architecture is going to be very significant. It would be interesting to see if we can add support for some modules to be linked in directly, rather than dynamically; however, it'd still probably have to use the same mechanisms as the normal modules in order to work. Anyway, I'm sure we'll think about those things more when the time comes. Makes sense. Or you could be proactive and start work on http://wget.addictivecode.org/FeatureSpecifications/Plugins (non-existent, but already linked to from FeatureSpecifications). :) I'll look into that. On 10/14/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Tony Godshall [EMAIL PROTECTED] writes: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. And so is the default behavior of curl, Firefox, Opera, and so on. The expected behavior of a program that receives data over a TCP stream is to consume data as fast as it arrives. What was your point exactly? All the other kids do it? Tony G
Re: PATCHES file removed
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote: Hrvoje Niksic wrote: Micah Cowan [EMAIL PROTECTED] writes: FYI, I've removed the PATCHES file. Not because I don't think it's useful, but because the information needed updating (now that we're using Mercurial rather than Subversion), I expect it to be updated again from time to time, and the Wgiki seems to be the right place to keep changing documentation (http://wget.addictivecode.org/PatchGuidelines). It's still obviously useful to have patch-submission information included as part of the Wget distribution itself; It would be nice for the distribution to contain that URL in a prominent place, such as in the README, or even a stub PATCHES file. It's in NEWS, but putting it in README can't hurt. Hey, that's a handy link. I'll follow it for my next rev (--limit-rate nn%). I had read the README file but I'm not in the habit of looking for a NEWS file. Speaking of which, I've replaced the MAILING-LISTS file, regenerating it from the Mailing Lists section of the Texinfo manual. I suspect it had previously been generated from source, but it's not clear to me from what (perhaps the web page?), or what tool was used. It was simply hand-written. :-) Oh, yeah, I don't want to do that in three places then (MAILING-LISTS, Wgiki, and manual)! It had a right-aligned -*- text -*- thing at the top, so I was thinking that was an indication of having been generated.
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: Well, you may have such problems but you are very much reaching in thinking that my --limit-percent has anything to do with any failing in linux. It's about dealing with unfair upstream switches, which, I'm quite sure, were not running Linux. Let's not hijack this into a linux-bash. I really don't know what you were trying to say here... You seemed to think --limit-percent was a solution for a misbehavior of linux. My experience with linux networking is that it's very effective and that upstream non-linux switches don't handle such an effective client well. When a linux box is my gateway/firewall I don't experience single-client monopolization at all. As to your linux issues, that's a topic that should probably be discussed in another forum, but I will say that I'm quite happy with the latest Linux kernels: with the low-latency patch integrated and enabled, my desktop experience is quite snappy, even on this four-year-old 1.2GHz laptop. And stay away from the distro server kernels; they are optimized for throughput at the cost of latency, and they do their I/O in bigger chunks. And stay away from the RT kernels; they go too far in giving I/O priority over everything else and end up churning on IRQs unless they are very carefully tuned. And no, I won't call the linux kernel GNU/Linux, if that was what you were after. The kernel is after all the one Linux thing in a GNU/Linux system. .. I use GNU/Linux. Anyone try Debian GNU/BSD yet? Or Debian/Nexenta/GNU/Solaris?
Re: wget default behavior
On 10/14/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Tony Godshall [EMAIL PROTECTED] writes: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. And so is the default behavior of curl, Firefox, Opera, and so on. The expected behavior of a program that receives data over a TCP stream is to consume data as fast as it arrives. Yup.
wget default behavior [was Re: working on patch to limit to percent of bandwidth]
OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Tony
Re: working on patch to limit to percent of bandwidth
On 10/12/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Tony Godshall [EMAIL PROTECTED] writes: My point remains that the maximum initial rate (however you define initial in a protocol as unreliable as TCP/IP) can and will be wrong in a large number of cases, especially on shared connections. Again, would an algorithm where the rate is re-measured periodically and the initial-rate-error criticism were therefore addressed reduce your objection to the patch? Personally I don't see the value in attempting to find out the available bandwidth automatically. You keep saying that. It seems too error prone, no matter how much heuristics you add into it. But like someone said- the error is always in the nice direction. --limit-rate works because reading the data more slowly causes it to (eventually) also be sent more slowly. --limit-percentage is impossible to define in precise terms, there's just too much guessing. My patch --limit-percent does exactly the same thing except without requiring foreknowledge.
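The mechanism Hrvoje describes for --limit-rate can be sketched in C: after each chunk, compare the time the bytes should have taken at the target rate with the time they actually took, and sleep off any surplus. This is a simplified illustration, not wget's actual limit_bandwidth(); the function name throttle_sleep_seconds() is hypothetical, and the real code also carries a sleep-adjustment term (limit_data.sleep_adjust, visible in the patch below) to correct for timer inaccuracy.

```c
/* Hypothetical helper: how long to sleep (in seconds) after reading
   chunk_bytes in elapsed_sec, so that the average rate does not
   exceed limit_bps (bytes per second). Returns 0 if no sleep is
   needed or no limit is configured. */
double throttle_sleep_seconds(long long chunk_bytes, double elapsed_sec,
                              double limit_bps)
{
    if (limit_bps <= 0.0)
        return 0.0;                         /* no limit configured */

    /* Time the chunk should have taken at the target rate. */
    double expected = (double) chunk_bytes / limit_bps;

    /* If the data arrived faster than allowed, sleep off the surplus. */
    return expected > elapsed_sec ? expected - elapsed_sec : 0.0;
}
```

For example, 20000 bytes read in 0.5 seconds under a 20000 bytes/second limit yields a 0.5 second sleep. The sender slows down only indirectly: while the reader sleeps, its TCP receive window fills and flow control throttles the sending side, which is why the effect is "eventual".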
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Josh Williams [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Tony IMO, this should be handled by the operating system, not the individual applications. That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. In my experience, GNU/Linux tends to consume all the resources unbiasedly, seemingly on a first come first serve *until you're... Well, you may have such problems but you are very much reaching in thinking that my --limit-percent has anything to do with any failing in linux. It's about dealing with unfair upstream switches, which, I'm quite sure, were not running Linux. Let's not hijack this into a linux-bash. -- Best Regards. Tony
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote: On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Heh. Well, some people are saying that Wget should support accelerated downloads; several connections to download a single resource, which can sometimes give a speed increase at the expense of nice-ness. So you could say we're at a happy medium between those options! :) Actually, Wget probably will get support for multiple simultaneous connections; but the number of connections to one host will be limited to a max of two. It's impossible for Wget to know how much is appropriate to back off, and in most situations I can think of, backing off isn't appropriate. In general, though, I agree that Wget's policy should be nice by default. Yeah, thanks, that's what I was trying to get at. Wget should be aggressive iff you tell it to be, and otherwise should be nice. In the presence of bad upstream switches I've found that even a --limit-rate of 95% is way more tolerable to others than the default 100% utilization. Josh Williams wrote: That's one of the reasons I believe this should be a module instead, because it's more or less a hack to patch what the environment should be doing for wget, not vice versa. At this point, since it seems to have some demand, I'll probably put it in for 1.12.x; but I may very well move it to a module when we have support for that. Of course, Tony G indicated that he would prefer it to be conditionally-compiled, for concerns that the plugin architecture will add overhead to the wget binary. Wget is such a lightweight app, though, I'm not thinking that the plugin architecture is going to be very significant.
It would be interesting to see if we can add support for some modules to be linked in directly, rather than dynamically; however, it'd still probably have to use the same mechanisms as the normal modules in order to work. Anyway, I'm sure we'll think about those things more when the time comes. Good point. I guess someone who wants an ultralightweight wget will use the one in busybox instead of the normal one. Or you could be proactive and start work on http://wget.addictivecode.org/FeatureSpecifications/Plugins (non-existent, but already linked to from FeatureSpecifications). :) Interesting. I'll take a look.
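Micah's point that a directly linked module would still go through the same mechanism as a dynamically loaded one can be illustrated with a descriptor of function pointers that the core looks up by name. Everything here (struct wget_module, find_module(), the "nice" module and its fixed 0.25 s pause) is hypothetical; wget had no plugin interface when this was written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical module interface: the core calls modules only through
   this descriptor, so it does not care how the module was linked. */
struct wget_module {
    const char *name;
    /* Called once per received chunk; returns a delay in seconds. */
    double (*on_chunk)(long long bytes, double elapsed_sec);
};

/* A module compiled directly into the binary. */
static double nice_on_chunk(long long bytes, double elapsed_sec)
{
    (void) bytes;
    (void) elapsed_sec;
    return 0.25;  /* placeholder policy: always pause briefly */
}

static const struct wget_module nice_module = { "nice", nice_on_chunk };

/* Static registry, terminated by NULL. */
static const struct wget_module *const modules[] = { &nice_module, NULL };

/* Look a module up by name; returns NULL if it is not built in. */
const struct wget_module *find_module(const char *name)
{
    for (size_t i = 0; modules[i] != NULL; i++)
        if (strcmp(modules[i]->name, name) == 0)
            return modules[i];
    return NULL;
}
```

A dynamically loaded module would only need to hand the core a pointer to the same struct type (for example one obtained through dlopen()/dlsym()), so the calling code would not change; a build aimed at a small binary could ship only the static table.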
Re: working on patch to limit to percent of bandwidth
On 10/12/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Tony Godshall [EMAIL PROTECTED] writes: available bandwidth and adjusts to that. The usefulness is in trying to be unobtrusive to other users. The problem is that Wget simply doesn't have enough information to be unobtrusive. Currently available bandwidth can and does change as new downloads are initiated and old ones are turned off. Measuring initial bandwidth is simply insufficient to decide what bandwidth is really appropriate for Wget; only the user can know that, and that's what --limit-rate does. My patch (and the doc change in my patch) don't claim to be totally unobtrusive [...] Obviously people who require the level of unobtrusiveness you define shouldn't be using it. It was never my intention to define a particular level of unobtrusiveness; the concept of being unobtrusive to other users was brought up by Jim and I was responding to that. My point remains that the maximum initial rate (however you define initial in a protocol as unreliable as TCP/IP) can and will be wrong in a large number of cases, especially on shared connections. Again, would an algorithm where the rate is re-measured periodically and the initial-rate-error criticism were therefore addressed reduce your objection to the patch? Perhaps you can answer each idea I gave separately: a) full speed downloads (which re-measure channel capacity) followed by long sleeps; b) speed ramps up to peak and then back down. Not only is it impossible to be totally unobtrusive, but any *automated* attempts at being nice to other users are doomed to failure, either by taking too much (if the download starts when you're alone) or too little (if the download starts with shared connection). Again, I do not claim to be unobtrusive. Merely to reduce obtrusiveness. I do not and cannot claim to be making wget *nice*, just nicER. You can't deny that dialing back is nicer than not.
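Idea (a), full-speed bursts followed by sleeps, reduces to a duty-cycle calculation. If a burst of B bytes takes t seconds, the measured rate is R = B / t; to average p percent of R over the whole cycle, the cycle must last B / (R * p / 100) seconds, so the sleep is that minus t. A minimal sketch with a hypothetical function name; because every burst yields a fresh R, this variant re-measures capacity on each cycle instead of trusting only the first measurement.

```c
/* Hypothetical helper for the "pulse" strategy: given a burst of
   burst_bytes that took burst_sec at full speed, return how long to
   sleep afterwards so that the average over burst + sleep is
   percent% of the rate measured during the burst. */
double pulse_sleep_seconds(long long burst_bytes, double burst_sec,
                           double percent)
{
    if (burst_sec <= 0.0 || percent <= 0.0 || percent >= 100.0)
        return 0.0;                          /* nothing to throttle */

    double measured_bps = (double) burst_bytes / burst_sec;
    double target_bps = measured_bps * percent / 100.0;

    /* Total cycle time needed for burst_bytes to average target_bps. */
    double cycle_sec = (double) burst_bytes / target_bps;
    return cycle_sec - burst_sec;
}
```

With 1,000,000 bytes in 2 seconds at 50 percent, the measured rate is 500,000 bytes/s, the cycle must last 4 seconds, and the sleep is 2 seconds. Each burst still runs at full speed, so the trade-off is repeated short bursts in exchange for a current measurement.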
Re: working on patch to limit to percent of bandwidth
On 10/12/07, Josh Williams [EMAIL PROTECTED] wrote: On 10/12/07, Tony Godshall [EMAIL PROTECTED] wrote: Again, I do not claim to be unobtrusive. Merely to reduce obtrusiveness. I do not and cannot claim to be making wget *nice*, just nicER. You can't deny that dialing back is nicer than not. Personally, I think this is a great idea. But I do agree that the documentation is a bit messy right now (as well as the code). If this doesn't make it into the current trunk, I think it'd make a great module in version 2. Thanks for the support
Re: anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]
... I guess I'd like to see compile-time options so people could make a tiny version for their embedded system, with most options and all documentation stripped out, and a huge kitchen-sink all-the-bells version and complete documentation for the power user version. I don't think you have to go to a totally new (plug in) architecture or make the hard choices. [Jim] Well, we need the plugin architecture anyway. There are some planned features (JavaScript and MetaLink support being the main ones) that have no business in Wget proper, as far as I'm concerned, but are inarguably useful. I know when I put an app into an embedded app, I'd rather not even have the overhead of the plug-in mechanism, I want it smaller than that. You have a good point regarding customized compilation, though I think that most of the current features in Wget belong as core features. There are some small exceptions (egd sockets). Thanks. Well, when I'm building an embedded device, I look at the wget invocations that are actually used in the scripts. Since the end product has no interactive shell, I don't need to have all those extra options enabled! In fact, in wget's case, one can often dispense with the tool entirely; the busybox version suffices. ... And when I'm running the gnu version of something I expect it to have verbose man pages and lots of double-dash options, that's what tools like less and grep are for. Well... many GNU tools actually lack verbose man pages, particularly since info is the preferred documentation system for GNU software. Well, I guess I'm spoiled by Debian. If it ain't broke, don't fix it. Debian makes man pages because tools should have manpages. IIRC, that was one of the divorce issues. Despite the fact that many important GNU utilities are very feature-packed, they also tend not to have options that are only useful to a relatively small number of people--particularly when equivalent effects are possible with preexisting options.
As to the overhead of the plugin mechanism, you're right, and I may well decide to make that optionally compiled. Well, I'd rather have rate-limiting things be optionally compiled than plugged-in, since they'd be useful for embedded devices. [Micah] It's not really about this option, it's about a class of options. I'm in the unenviable position of having to determine whether small patches that add options are sufficiently useful to justify the addition of the option. Adding one new option/rc command is not a problem. But when, over time, fifty people suggest little patches that offer options with small benefits, we've suddenly got fifty new options cluttering up the documentation and --help output. [Jim] I would posit that the vast majority of wget options are used in some extremely small percentage of wget invocations. Should they be removed? [Micah] Such as which ones? I don't think we're talking about the same extremely small percentages. OK, so so far there are three of us, I think, that find it potentially useful. And you have not addressed the use cases I brought up. So I think your extremely small percentages assumption may be faulty. Looking through the options listed with --help, I can find very few options that I've never used or would not consider vital in some situations I (or someone else) might encounter. This doesn't look to me like a vital function, one that a large number of users will find mildly useful, or one that a mild number of users will find extremely useful. This looks like one that a mild number of users will find mildly useful. Only slightly more useful, in fact, than what is already done. You keep saying that. You seem to think unknown upstream bandwidth is a rare thing. Or that wanting to be nice to other bandwidth users in such a circumstance is a rare thing. I wish I lived in your universe. Mine's a lot more sloppy. 
It's also one of those fuzzy features that addresses a scenario that has no right solution (JavaScript support is in that domain). These sorts of features tend to invite a gang of friends to help get a little bit closer to the unreachable target. For instance, if we include this option, then the same users will find another option to control the period of time spent full-bore just as useful. A pulse feature might be useful, but then you'll probably want an option to control the spacing between those, too. And someone else may wish to introduce an option that saves bandwidth information persistently, and uses this to make a good estimate from the beginning. Ah, finally, some meat. You see this as opening a door. Especially as I inquire as to whether anyone has feedback on my implementation, you see it mushrooming into a plethora of options. And all of this would amount to a very mild improvement over what already exists. Your universe crisp, mine sloppy (above) ;-) In my view, wget is a useful and flexible
Re: working on patch to limit to percent of bandwidth
On 10/10/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: The scenario I was picturing was where you'd want to make sure some bandwidth was left available so that unfair routers wouldn't screw your net-neighbors. I really don't see this as an attempt to be unobtrusive at all. This is not an attempt to hide one's traffic, it's an attempt to not overwhelm in the presence of unfair switching. If I say --limit-pct 75% and the network is congested, yes, what I want is to use no more than 75% of the available bandwidth, not the total bandwidth. So, yes, if the network is more congested just now, then let this download get a lower bitrate, that's fine. I'm pretty sure that's what Jim meant by being unobtrusive; it surely had nothing to do with traffic-hiding. My current impression is that this is a useful addition for some limited scenarios, but not particularly more useful than --limit-rate already is. That's part of what makes it a good candidate as a plugin. I guess I don't see how picking a reasonable rate automatically is less useful than having to know what the maximum upstream bandwidth is ahead of time. (If the argument is about the rare case where maxing out the download even briefly is unacceptable, then the whole technique wget uses is really not appropriate: even limit-rate does not back off till it has retrieved enough bytes to start measuring, and then they come in as a burst; that's the nature of starting a TCP connection.)
anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]
On 10/11/07, Micah Cowan [EMAIL PROTECTED] wrote: Tony Godshall wrote: On 10/10/07, Micah Cowan [EMAIL PROTECTED] wrote: My current impression is that this is a useful addition for some limited scenarios, but not particularly more useful than --limit-rate already is. That's part of what makes it a good candidate as a plugin. I guess I don't see how picking a reasonable rate automatically is less useful than having to know what the maximum upstream bandwidth is ahead of time. I never claimed it was less useful. In fact, I said it was more useful. My doubt is as to whether it is _significantly_ more useful. For me, yes. For you, apparently not. It's a small patch, really. Did you even look at it? I'm still open, just need more convincing. Well, I've said my piece. Anyone want to comment on the actual code? Anybody try it?
Re: anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]
... I have, yes. And yes, it's a very small patch. The issue isn't so much about the extra code or code maintenance; it's more about extra documentation, and avoiding too much clutter of documentation and lists of options/rc-commands. I'm not very picky about adding little improvements to Wget; I'm a little pickier about adding new options. It's not really about this option, it's about a class of options. I'm in the unenviable position of having to determine whether small patches that add options are sufficiently useful to justify the addition of the option. Adding one new option/rc command is not a problem. But when, over time, fifty people suggest little patches that offer options with small benefits, we've suddenly got fifty new options cluttering up the documentation and --help output. Would it be better, then, if I made it --limit-rate nn% instead of limit-percent nn? And made the descrip briefer? If the benefits are such that only a handful of people will ever use any of them, then they may not have been worth the addition, and I'm probably not doing my job properly. ... I guess I'd like to see compile-time options so people could make a tiny version for their embedded system, with most options and all documentation stripped out, and a huge kitchen-sink all-the-bells version and complete documentation for the power user version. I don't think you have to go to a totally new (plug in) architecture or make the hard choices. I know when I put an app into an embedded app, I'd rather not even have the overhead of the plug-in mechanism, I want it smaller than that. And when I'm running the gnu version of something I expect it to have verbose man pages and lots of double-dash options, that's what tools like less and grep are for. Tony
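The compile-time approach Tony suggests could look like an option table whose entries are gated by preprocessor switches, so that a tiny embedded build drops both the option and its help text. The ENABLE_LIMIT_PERCENT macro, struct option_entry, and option_available() are hypothetical and much simpler than wget's real option handling in main.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch: a tiny embedded build could compile with
   -DENABLE_LIMIT_PERCENT=0 to drop the option and its help text. */
#ifndef ENABLE_LIMIT_PERCENT
#define ENABLE_LIMIT_PERCENT 1
#endif

struct option_entry {
    const char *long_name;
    const char *help;
};

static const struct option_entry option_table[] = {
    { "limit-rate", "limit download rate to RATE." },
#if ENABLE_LIMIT_PERCENT
    { "limit-percent",
      "limit download rate to NUMBER percent of measured initial burst" },
#endif
    { NULL, NULL }                 /* terminator */
};

/* Returns 1 if the option was compiled into this build, else 0. */
int option_available(const char *name)
{
    for (size_t i = 0; option_table[i].long_name != NULL; i++)
        if (strcmp(option_table[i].long_name, name) == 0)
            return 1;
    return 0;
}
```

Building with -DENABLE_LIMIT_PERCENT=0 removes the entry and its string from the binary; the default here keeps it.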
Re: anyone look at the actual patch? anyone try it? [Re: working on patch to limit to percent of bandwidth]
On 10/11/07, Tony Godshall [EMAIL PROTECTED] wrote: ... I have, yes. And yes, it's a very small patch. The issue isn't so much about the extra code or code maintenance; it's more about extra documentation, and avoiding too much clutter of documentation and lists of options/rc-commands. I'm not very picky about adding little improvements to Wget; I'm a little pickier about adding new options. It's not really about this option, it's about a class of options. I'm in the unenviable position of having to determine whether small patches that add options are sufficiently useful to justify the addition of the option. Adding one new option/rc command is not a problem. But when, over time, fifty people suggest little patches that offer options with small benefits, we've suddenly got fifty new options cluttering up the documentation and --help output. Would it be better, then, if I made it --limit-rate nn% instead of limit-percent nn? And made the descrip briefer? Also would it help if the behavior was changed so it pulsed occasionally and therefore wouldn't suffer from the initial-measurement-error case. I'm trying to judge whether I should spend more time touching it up into something acceptable or just let it remain a personal hack. If the benefits are such that only a handful of people will ever use any of them, then they may not have been worth the addition, and I'm probably not doing my job properly. ... I guess I'd like to see compile-time options so people could make a tiny version for their embedded system, with most options and all documentation stripped out, and a huge kitchen-sink all-the-bells version and complete documentation for the power user version. I don't think you have to go to a totally new (plug in) architecture or make the hard choices. I know when I put an app into an embedded app, I'd rather not even have the overhead of the plug-in mechanism, I want it smaller than that. 
And when I'm running the gnu version of something I expect it to have verbose man pages and lots of double-dash options, that's what tools like less and grep are for. Tony
Re: working on patch to limit to percent of bandwidth
--limit-rate will find your version handy, but I want to hear from them. :) I would appreciate and have use for such an option. We often access instruments in remote locations (think a tiny island in the Aleutians) where we share bandwidth with other organizations. A limitation in percentage doesn't make sense if you don't know exactly how much bandwidth is available. Trying to determine full bandwidth and backing off from there is IMHO doomed to failure because the initial speed Wget gets can be quite different from the actual link bandwidth, at least in a shared link scenario. A --limit-percent implemented as proposed here would only limit the retrieval speed to the specified fraction of the speed Wget happened to get at the beginning of the download. That is not only incorrect, but also quite non-deterministic. If there were a way to query the network for the connection speed, I would support the limit-percent idea. But since that's not possible, I think it's better to stick with the current --limit-rate, where we give the user an option to simply tell Wget how much bandwidth to consume. I think there is still a case for attempting percent limiting. I agree with your point that we can not discover the full bandwidth of the link and adjust to that. The approach discovers the current available bandwidth and adjusts to that. The usefulness is in trying to be unobtrusive to other users. Network conditions do change and my initial all-out estimate is certainly not ideal, but it works in many situations. Another way would be to transfer in a more bursty mode rather than meter at small granularity; that way one could measure the rate of each burst and take it as the channel capacity. I worry that that might be more harmful to those sharing the channel in cases like Hrvoje's than the initial-burst measurement, and in fact am thinking that one could cache the initial-measure value and use it for future connections.
An alternative to a bursty mode would be to start at full speed and then ramp down till we hit the desired rate and then ramp back up till we bump the limit and down again. That way we could update the max rate estimate periodically and recover from any error that might have occurred in the initial estimate. Any thoughts on this behavior? Less harmful? More? Tony
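The ramp behavior described above can be read as a small state machine: lower the limit step by step from full speed to the target percentage of the estimated capacity, then raise it step by step until the link fails to deliver what was asked, take that observed rate as the new capacity estimate, and head back down. A minimal sketch under those assumptions; struct ramp_state, ramp_init(), ramp_tick(), the per-tick step size, and the 0.9 tolerance used to decide that the limit has been "bumped" are all hypothetical choices.

```c
/* Hypothetical state for the ramp strategy. Rates are bytes/second. */
struct ramp_state {
    double max_est;   /* current estimate of available bandwidth */
    double limit;     /* rate limit to apply during the next interval */
    double percent;   /* desired fraction of max_est, e.g. 50.0 */
    double step;      /* fraction of max_est to move the limit per tick */
    int ramping_up;   /* 0 = heading down to target, 1 = probing up */
};

void ramp_init(struct ramp_state *s, double initial_bps,
               double percent, double step)
{
    s->max_est = initial_bps;
    s->limit = initial_bps;      /* start at full speed */
    s->percent = percent;
    s->step = step;
    s->ramping_up = 0;
}

/* Called once per interval with the rate actually observed during
   that interval; updates the limit for the next interval. */
void ramp_tick(struct ramp_state *s, double observed_bps)
{
    double target = s->max_est * s->percent / 100.0;
    double delta = s->max_est * s->step;

    if (s->ramping_up) {
        /* 0.9 is an arbitrary tolerance for measurement jitter. */
        if (observed_bps < 0.9 * s->limit) {
            /* The link delivered well under what we allowed, so we
               have bumped the real capacity: re-estimate, head down. */
            s->max_est = observed_bps;
            s->ramping_up = 0;
        } else {
            s->limit += delta;
        }
    } else {
        s->limit -= delta;
        if (s->limit <= target) {
            s->limit = target;
            s->ramping_up = 1;
        }
    }
}
```

Starting from a 1000 bytes/s estimate with a 50 percent target and a 10 percent step, the limit falls 900, 800, ..., 500 and then climbs again; if a later interval delivers well under the requested limit, that rate becomes the new estimate. Note that the upward probe deliberately pushes past the target, so the download is periodically more obtrusive by design.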
Re: working on patch to limit to percent of bandwidth
... I worry that that might be more harmful to those sharing channel in cases like Hvroje's ... Sorry, Hrvoje, Jim, I meant Jim's case. Tony
Re: working on patch to limit to percent of bandwidth
Jim Wright wrote: I think there is still a case for attempting percent limiting. I agree with your point that we can not discover the full bandwidth of the link and adjust to that. The approach discovers the current available bandwidth and adjusts to that. The usefulness is in trying to be unobtrusive to other users. Does it really fit that description, though? Given that it runs full-bore for 15 seconds (not that that's very long)... I guess it depends on the type of users you are sharing with and the upstream switches and routers. My experience is that with some routers and switches a single user wget'ing an iso can cause web-browsing people to experience slow response. That kind of application is not one where 15sec will make much difference, and in fact there's a big backoff after that first 15sec. OTOH if you are sharing with latency-sensitive apps (VOIP, realtime control, etc.) and a wget bogs your app down, you'd better fix your switches and routers; you will be affected by anybody in an interactive web browser streaming youtube or whatever too. This patch is not a solution for that use case, and I agree that there really isn't one that an app like wget can reasonably implement (without delving into nonportable OS stuff). Tony
Re: working on patch to limit to percent of bandwidth
On 10/10/07, Tony Lewis [EMAIL PROTECTED] wrote: Hrvoje Niksic wrote: Measuring initial bandwidth is simply insufficient to decide what bandwidth is really appropriate for Wget; only the user can know that, and that's what --limit-rate does. The user might be able to make a reasonable guess as to the download rate if wget reported its average rate at the end of a session. That way the user can collect rates over time and try to give --limit-rate a reasonable value. Tony [L] It reports rate during, and on my connections it's constant enough to make a good estimate for --limit-rate. I just automated that, really. Tony G
Re: working on patch to limit to percent of bandwidth
Indeed. On 10/10/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Jim Wright [EMAIL PROTECTED] writes: I think there is still a case for attempting percent limiting. I agree with your point that we can not discover the full bandwidth of the link and adjust to that. The approach discovers the current available bandwidth and adjusts to that. The usefulness is in trying to be unobtrusive to other users. The problem is that Wget simply doesn't have enough information to be unobtrusive. Currently available bandwidth can and does change as new downloads are initiated and old ones are turned off. Measuring initial bandwidth is simply insufficient to decide what bandwidth is really appropriate for Wget; only the user can know that, and that's what --limit-rate does. My patch (and the doc change in my patch) don't claim to be totally unobtrusive; it has a particular behavior that's documented, which is to try to be less obtrusive than your typical get-it-for-me-right-now-as-fast-as-you-can download. Obviously people who require the level of unobtrusiveness you define shouldn't be using it. Then again, people who require that level probably need routers that implement a little more fairness or QoS, ones that don't let one TCP connection lock out other users. My patch just does automatically what I used to do manually: start obtrusive and then scale back to less obtrusive for the rest of the download. Even competent non-sys-admin people often are not apprised of the technical details of the networks they use, but they may still want to be reasonably nice, for example, to the other people using the wifi at the cybercafe. It's certainly a step above the naive behavior: the naive user doesn't even know any of this (their typical tools, like MSIE, don't even tell them!) Tony G
Re: working on patch to limit to percent of bandwidth
I think there is still a case for attempting percent limiting. I agree with your point that we can not discover the full bandwidth of the link and adjust to that. The approach discovers the current available bandwidth and adjusts to that. The usefulness is in trying to be unobtrusive to other users. The problem is that Wget simply doesn't have enough information to be unobtrusive. Currently available bandwidth can and does change as new downloads are initiated and old ones are turned off. Measuring initial bandwidth is simply insufficient to decide what bandwidth is really appropriate for Wget; only the user can know that, and that's what --limit-rate does. So far, I'm inclined to agree. For instance, if one just sticks limit_percent = 25 in their wgetrc, then on some occasions, Wget will limit to far too _low_ a rate, when most of the available bandwidth is already being consumed by other things. ... Well, if you are using 25% then you are trying to be *really* nice and there's no such thing as far too low, is there? The scenario I was picturing was where you'd want to make sure some bandwidth was left available so that unfair routers wouldn't screw your net-neighbors. I really don't see this as an attempt to be unobtrusive at all. This is not an attempt to hide one's traffic, it's an attempt to not overwhelm in the presence of unfair switching. If I say --limit-pct 75% and the network is congested, yes, what I want is to use no more than 75% of the available bandwidth, not the total bandwidth. So, yes, if the network is more congested just now, then let this download get a lower bitrate, that's fine. Tony
not dominating bandwidth caching a value [Re: ... patch to limit to percent of bandwidth]
[private response to limit list clutter] or not. oops. ... Note though that my patch *does* dominate the bandwidth for about 15 seconds to measure the available bandwidth before it falls back. On my network, it seemed to take a few seconds before enough bytes were transferred to get a reasonable measure. What I'd actually like to do is fold the argument into the limit-rate option, like so... --limit-rate 20K means limit rate to 20KBps --limit-rate 20% means limit average rate to 20% of initial measured bandwidth I wonder if it would make sense to cache the measured rate so that the second time you limit by percentage it skips the measurement step and thus the temporary domination. I guess then I'd also want to provide a --measure-rate option to force re-measurement when network changes. Are there any reasons not to cache like that? Does wget currently save any state someplace or would I need to implement that whole bit? Tony
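Folding both forms into --limit-rate, as proposed above, mostly comes down to inspecting the suffix after the number. A minimal parsing sketch; struct limit_spec and parse_limit_spec() are hypothetical, the k/m suffixes mirror the ones --limit-rate already accepts, and the 1024 multiplier is an assumption of this sketch rather than a statement about wget's own parser.

```c
#include <stdlib.h>

/* Hypothetical result of parsing a --limit-rate argument that may be
   either an absolute rate ("20K") or a percentage ("20%"). */
struct limit_spec {
    int is_percent;   /* 1 if value is a percentage */
    double value;     /* bytes/second, or percent if is_percent */
};

/* Returns 0 on success, -1 on a malformed argument. */
int parse_limit_spec(const char *arg, struct limit_spec *out)
{
    char *end;
    double v = strtod(arg, &end);
    if (end == arg || v < 0.0)
        return -1;                         /* no number, or negative */

    out->is_percent = 0;
    switch (*end) {
    case '\0':
        break;                             /* plain bytes/second */
    case '%':
        if (v > 100.0)
            return -1;
        out->is_percent = 1;
        break;
    case 'k': case 'K':
        v *= 1024.0;                       /* assumed binary kilo */
        break;
    case 'm': case 'M':
        v *= 1024.0 * 1024.0;
        break;
    default:
        return -1;
    }
    /* Anything after the single suffix character is an error. */
    if (*end != '\0' && end[1] != '\0')
        return -1;

    out->value = v;
    return 0;
}
```

With this sketch, "20K" yields 20480 bytes/second, "20%" yields a 20 percent limit, and "150%" or "20%x" are rejected.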
Initial draft- patch to limit bandwidth by percent of measured rate
Please find attached... The quick test: If you run wget with --limit-percent 50, you should see it run at full blast for 15 seconds and then back off till it's downloading at 50% the rate it achieved in the first 15 seconds. This is only the initial Works For Me version of the patch. Comments welcome, and anyone else who wants to run with it is welcome to do so. Best Regards. Tony PS to Micah: Yes, I changed pct to percent. diff --git a/src/init.c b/src/init.c --- a/src/init.c +++ b/src/init.c @@ -179,6 +179,7 @@ static const struct { #endif { input,opt.input_filename,cmd_file }, { keepsessioncookies, opt.keep_session_cookies, cmd_boolean }, + { limitpercent, opt.limit_percent, cmd_number }, { limitrate,opt.limit_rate,cmd_bytes }, { loadcookies, opt.cookies_input, cmd_file }, { logfile, opt.lfilename, cmd_file }, diff --git a/src/main.c b/src/main.c --- a/src/main.c +++ b/src/main.c @@ -189,6 +189,7 @@ static struct cmdline_option option_data { input-file, 'i', OPT_VALUE, input, -1 }, { keep-session-cookies, 0, OPT_BOOLEAN, keepsessioncookies, -1 }, { level, 'l', OPT_VALUE, reclevel, -1 }, +{ limit-percent, 0, OPT_VALUE, limitpercent, -1 }, { limit-rate, 0, OPT_VALUE, limitrate, -1 }, { load-cookies, 0, OPT_VALUE, loadcookies, -1 }, { max-redirect, 0, OPT_VALUE, maxredirect, -1 }, @@ -453,6 +454,10 @@ Download:\n), -Q, --quota=NUMBERset retrieval quota to NUMBER.\n), N_(\ --bind-address=ADDRESSbind to ADDRESS (hostname or IP) on local host.\n), +N_(\ + --limit-percent=NUMBERlimit download rate to NUMBER percent of measured initial burst\n), +N_(\ + or rate specified by --limit-rate\n), N_(\ --limit-rate=RATE limit download rate to RATE.\n), N_(\ diff --git a/src/options.h b/src/options.h --- a/src/options.h +++ b/src/options.h @@ -115,6 +115,8 @@ struct options double waitretry; /* The wait period between retries. - HEH */ bool use_robots; /* Do we heed robots.txt?
*/ + wgint limit_percent; /* Limit the download rate to this percentage + of initial measured burst rate. */ wgint limit_rate; /* Limit the download rate to this many bps. */ SUM_SIZE_INT quota; /* Maximum file size to download and diff --git a/src/retr.c b/src/retr.c --- a/src/retr.c +++ b/src/retr.c @@ -86,14 +86,61 @@ limit_bandwidth (wgint bytes, struct pti limit_bandwidth (wgint bytes, struct ptimer *timer) { double delta_t = ptimer_read (timer) - limit_data.chunk_start; - double expected; + double expected= 0.0; limit_data.chunk_bytes += bytes; + static wgint measured_limit= 0; + + wgint limit= 0; + + if ( opt.limit_rate ) + { +limit= opt.limit_rate; + +if ( opt.limit_percent ) +{ + limit= limit * opt.limit_percent / 100; +} +DEBUGP((fixed limit governs: %lld bps\n, limit)); + } + else + if ( opt.limit_percent ) + { +if ( ! measured_limit ) +{ + static double total_sec= 0.0; + static wgint total_bytes= 0; + + total_sec += delta_t; + total_bytes += bytes; + const double MEASURE_SEC= 15.0; + + if ( total_sec MEASURE_SEC ) + { +DEBUGP((After %.3f seconds we saw %lld bytes so our measured limit is %lld bps\n, +total_sec / 1000, total_bytes, measured_limit )); +measured_limit = total_bytes / total_sec * opt.limit_percent / 100.0; + } +} + +if ( measured_limit ) +{ + if ( !limit || measured_limit limit ) + { +limit= measured_limit; + } +} + } + + if ( limit ) + { + /* Calculate the amount of time we expect downloading the chunk - should take. If in reality it took less time, sleep to - compensate for the difference. */ - expected = (double) limit_data.chunk_bytes / opt.limit_rate; + should take at this fixed rate. If in reality it took less time, + sleep to compensate for the difference. 
*/ + + expected = (double) limit_data.chunk_bytes / limit; if (expected delta_t) { @@ -127,6 +174,8 @@ limit_bandwidth (wgint bytes, struct pti limit_data.sleep_adjust = -0.5; } + } + limit_data.chunk_bytes = 0; limit_data.chunk_start = ptimer_read (timer); } @@ -229,13 +278,13 @@ fd_read_body (int fd, FILE *out, wgint t progress_interactive = progress_interactive_p (progress); } - if (opt.limit_rate) + if (opt.limit_rate || opt.limit_percent) limit_bandwidth_reset (); /* A timer is needed for tracking progress, for throttling, and for tracking elapsed time. If either of these are requested, start the timer. */ - if (progress || opt.limit_rate || elapsed) + if (progress || opt.limit_rate || opt.limit_percent || elapsed) { timer = ptimer_new (); last_successful_read_tm = 0; @@ -286,7
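The measure-then-throttle idea in the patch reduces to a small piece of arithmetic, sketched here in isolation so it can be checked without a network. The `pct_limiter` struct and `pct_limiter_update` function are hypothetical and not part of wget. The sketch accumulates bytes and time until a measurement window (15 seconds in the draft) has passed, derives the cap as a percentage of the observed rate, and from then on returns how long to sleep after each chunk using the same expected-minus-elapsed rule as `limit_bandwidth()`. It deliberately leaves out the actual sleeping and wget's sleep-adjustment logic.

```c
/* Hypothetical state for a percentage-based limiter modelled on the
   draft patch.  All times are in seconds, all rates in bytes/second. */
struct pct_limiter
{
  double percent;       /* e.g. 50.0 for --limit-percent 50 */
  double measure_sec;   /* length of the unthrottled measurement window */
  double total_sec;     /* time accumulated while measuring */
  double total_bytes;   /* bytes accumulated while measuring */
  double limit;         /* derived cap; 0 until the window has passed */
};

/* Record a chunk of BYTES that took ELAPSED seconds to arrive and
   return how many seconds the caller should sleep afterwards. */
static double
pct_limiter_update (struct pct_limiter *l, double bytes, double elapsed)
{
  if (l->limit == 0.0)
    {
      l->total_sec += elapsed;
      l->total_bytes += bytes;
      if (l->total_sec <= l->measure_sec)
        return 0.0;     /* still measuring: run at full speed */

      /* Compute the cap first; anything that reports it comes after. */
      l->limit = l->total_bytes / l->total_sec * l->percent / 100.0;
      if (l->limit <= 0.0)
        return 0.0;     /* nothing observed yet; keep measuring */
    }

  /* Time this chunk should have taken at the capped rate, minus the
     time it actually took; sleep only if we ran ahead. */
  double expected = bytes / l->limit;
  return expected > elapsed ? expected - elapsed : 0.0;
}
```

For example, with `percent` 50 and a 15-second window, sixteen 100000-byte chunks arriving one second apart produce a cap of 50000 bytes per second, after which each further 100000-byte chunk earns a one-second sleep. Seeding `limit` with a previously saved value, as in the caching question earlier in the thread, would skip the measurement window entirely.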