Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/31/07, Micah Cowan [EMAIL PROTECTED] wrote:
> Tony Godshall wrote:
> > On 10/30/07, Micah Cowan [EMAIL PROTECTED] wrote:
> > > Tony Godshall wrote:
> > > > Perhaps the little wget could be called wg. A quick google and
> > > > wikipedia search shows no real namespace collisions.
> > >
> > > To reduce confusion/upgrade problems, I would think we would want to
> > > ensure that the traditional/little Wget keeps the current name, and
> > > any snazzified version gets a new one.
> >
> > Please not another -ng. How about wget2 (since we're on 1.x)? And the
> > current one remains in 1.x.
>
> I agree that -ng would not be appropriate. But since we're really talking
> about two separate beasts, I'd prefer not to limit what we can do with
> Wget (original)'s versioning. Who's to say a 2.0 release of the light
> version will not be warranted someday? At any rate, the snazzy one looks
> to be diverging from classic Wget in some rather significant ways, in
> which case I'd kind of prefer to part names a bit more severely than just
> wget-ng or wget2. Reget, perhaps: that name could be both Recursive Get
> (describing what's still its primary feature) or Revised/Re-envisioned
> Wget. :)
>
> I think, too, that names such as wget2 are more often things that
> packagers (say, Debian) do when they want to include
> backwards-incompatible, significantly new versions of software, but don't
> want to break people's usage of older stuff. Or when they just want to
> offer both versions. Cf. apache2 in Debian.

And then eventually everyone's gotten used to and can't live without the
new bittorrent-like almost-multithreaded features. ;-) :)

Pget. Parallel get. Tget. Torrent-like get. Bget. Bigger get. BBWget.
Bigger Better wget. OK, ok, sorry.
Re: RFE: run-time change of limit-rate & multi-stream download
L Walsh wrote:
> Don't know if anyone else has this problem, but sometimes when I'm
> downloading large files (500MB; most recently 4.2GB), I want to change
> the limit-rate without terminating the download and restarting. For
> example, during the day when I'm on my computer, I might not want it to
> use more than ~1/3rd of my bandwidth (~100KB/s), but if I'm going out to
> lunch or going to bed for the night, I might want to give it the full
> bandwidth. Even then, say I can't sleep (a too-frequent occurrence), and
> I want to get back on and read some website or another -- I'd like to be
> able to reduce its bandwidth.

Heh, I can relate about the can't sleep. Being a caffeine addict probably
plays a role in that.

An interesting idea that Tony Lewis came up with was the ability to send
Wget an interrupt (Ctrl-C), bringing up an interactive mode that allows one
to modify configuration on the fly. He told me about it in response to an
issue report I filed suggesting that Wget allow users the option to skip
the current file on interrupt, rather than quit completely; or to exit
gracefully (for instance, by completing any outstanding -k conversions).
That core functionality is expected to go in at some point for 1.12, but
probably not the config-altering; that might be easier to put in for the
snazzier re-envisioning of Wget, though.

> Say one runs the first wget. Let's say it is a simple 1-DVD download.
> Then you start a 2nd download of another DVD. Instead of 2 copies of
> wget running and competing with each other, what if the 2nd copy told
> the 1st copy about the 2nd download, and the 2nd download was 'enqueued'
> in a 'line' behind the 1st?

Yes, I've thought of this same thing: some Wget that works down a queue,
while other Wget invocations add to the queue. Or, in your first scenario,
modifies certain configuration parameters of the first Wget.

At this point in time, I think that this would make a good candidate for a
separate plugin module (though probably one that ships with Wget). Of
course, even as a plugin, the core bits of Wget (and other plugin modules)
would still need to be written with the consideration that configuration
settings might not remain constant. However, there are ways in which we
could make this relatively easy to accomplish. The configuration stuff is
already going to change pretty considerably with the new Wget: mainly in
the idea of permitting configuration settings that apply only to specific
URIs. The abstraction that would be necessary for such a thing could
probably also accommodate on-the-fly configuration changes.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
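[To make the interrupt idea above concrete: here is a minimal C sketch of
the flag-setting approach, not Wget's actual code. The SIGINT handler only
sets a flag (keeping the handler async-signal-safe); the transfer loop
notices the flag and drops into a small prompt. The limit_rate variable and
the one-second loop body are stand-ins for the real transfer internals.]

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t interrupted = 0;

static void on_sigint(int sig)
{
    (void) sig;
    interrupted = 1;    /* do no real work inside the handler */
}

int main(void)
{
    long limit_rate = 100 * 1024;   /* bytes/sec; hypothetical default */
    struct sigaction sa;

    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART: let Ctrl-C interrupt */
    sigaction(SIGINT, &sa, NULL);

    for (;;) {                      /* stands in for the real transfer loop */
        if (interrupted) {
            char buf[64];
            interrupted = 0;
            printf("\nNew limit-rate in bytes/sec (0 = unlimited, q = quit): ");
            fflush(stdout);
            if (fgets(buf, sizeof buf, stdin) == NULL || buf[0] == 'q')
                break;
            limit_rate = strtol(buf, NULL, 10);
            printf("limit-rate is now %ld\n", limit_rate);
        }
        /* ... read a chunk here, sleeping as needed to honor limit_rate ... */
        sleep(1);
    }
    return 0;
}

[A real implementation would also have to distinguish a second Ctrl-C
meaning "really quit", and finish gracefully, e.g. the outstanding -k
conversions mentioned above.]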
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
L Walsh wrote:
> Honest -- I hadn't read all the threads before my post... Great ideas
> Micah! :-)
>
> On the idea of 2 wgets -- there is a clever way to get by with 1. Put the
> optional functionality into separate run-time loadable files. SGI's Unix
> (and MS Windows) do this. The small wget then checks to see which
> libraries are accessible -- those that aren't simply mean the features
> for those libs are disabled. In a way, it's like how 'vim' can optionally
> load perl-lib or python-lib at runtime (at least under Windows) if they
> are present. If they are not present, those features are disabled. Too
> bad linux didn't take this route with its libraries (I have asked; it is
> possible, but there's no framework for it, and that might need work as
> well).

I'm not sure what you mean about the linux thing; there are many instances
of runtime-loadable modules on Linux. dlopen() and friends are the standard
way of doing this on any Unix kernel flavor.

Keeping a single Wget and using runtime libraries (which we were terming
plugins) was actually the original concept (there's mention of this in the
first post of this thread, actually); the issue is that there are core bits
of functionality (such as the multi-stream support) that are too intrinsic
to separate into loadable modules, and that, to be done properly (and with
a minimum of maintenance commitment), would also depend on other libraries.
That is, doing asynchronous I/O wouldn't technically require the use of
other libraries, but it can be a lot of work to do efficiently and portably
across OSes, and there are already Free libraries to do that for us.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
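[For what it's worth, the dlopen() pattern Micah mentions looks roughly
like this in C. A minimal sketch, assuming a made-up optional library
(libsnazzy.so) and entry point (snazzy_init); neither is a real Wget
interface. On glibc systems, link with -ldl.]

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Try to load the optional library at runtime. */
    void *handle = dlopen("libsnazzy.so", RTLD_NOW);
    if (handle == NULL) {
        /* Library absent: carry on with the feature disabled, vim-style. */
        fprintf(stderr, "snazzy features unavailable: %s\n", dlerror());
        return 0;
    }

    /* Look up the feature's entry point by name and call it if found. */
    int (*snazzy_init)(void) = (int (*)(void)) dlsym(handle, "snazzy_init");
    if (snazzy_init != NULL)
        snazzy_init();

    dlclose(handle);
    return 0;
}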
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
Honest -- I hadn't read all the threads before my post... Great ideas
Micah! :-)

On the idea of 2 wgets -- there is a clever way to get by with 1. Put the
optional functionality into separate run-time loadable files. SGI's Unix
(and MS Windows) do this. The small wget then checks to see which libraries
are accessible -- those that aren't simply mean the features for those libs
are disabled. In a way, it's like how 'vim' can optionally load perl-lib or
python-lib at runtime (at least under Windows) if they are present. If they
are not present, those features are disabled. Too bad linux didn't take
this route with its libraries (I have asked; it is possible, but there's no
framework for it, and that might need work as well).

My 2 cents,
Linda
RFE: run-time change of limit-rate & multi-stream download
Don't know if anyone else has this problem, but sometimes when I'm
downloading large files (500MB; most recently 4.2GB), I want to change the
limit-rate without terminating the download and restarting. For example,
during the day when I'm on my computer, I might not want it to use more
than ~1/3rd of my bandwidth (~100KB/s), but if I'm going out to lunch or
going to bed for the night, I might want to give it the full bandwidth.
Even then, say I can't sleep (a too-frequent occurrence), and I want to get
back on and read some website or another -- I'd like to be able to reduce
its bandwidth.

I'm not sure what the best way is to relay the desired speed to the
currently running program -- I can think of more than one method, but none
of them 'grab' me as elegant... Uh... well, there is one, but it's pretty
ambitious, perhaps dovetailing into my other RFE: multi-threaded
downloading.

Suppose wget was multi-threaded. I say multi-threaded out of some
ignorance, as it might be more efficient to simply make it multi-streamed
and use poll(3) to wait on the multiple streams, processing them as they
become available. Either way, I believe the idea of multiple downloads
going at the same time is conceptually the same.

Say one runs the first wget. Let's say it is a simple 1-DVD download. Then
you start a 2nd download of another DVD. Instead of 2 copies of wget
running and competing with each other, what if the 2nd copy told the 1st
copy about the 2nd download, and the 2nd download was 'enqueued' in a
'line' behind the 1st? If both downloads are from the same site, there
isn't much to do, and the most efficient download would be (I think?) to
finish the first, then do the 2nd. Alternatively, if the size of both
downloads was known in advance (as is true for http, though not for ftp),
it _could_ download whichever file had less data left to download. If the
downloads are from different sites, then it depends on the download speed
of each file -- if a specific site is slow (say your max rate is 300 and
your download is running at an average speed of 150), then it probably
would be safe to do another download (from a different site) and
intersperse the writes... There are so many possible priority or
scheduling algorithms, I can't possibly think of them all.

To complete the idea: if wget has multiple files it needs to download
(after parsing an HTML file, looking for page requisites or deeper
recursion, if enabled), it could also use the multi-threaded download
feature to download those in parallel. A recursive download can quickly
'blossom' as each level of a directory tree is downloaded and expanded.
Many of these files may be short files. A significant delay or wait time is
incurred when a download has to 'pause' and wait for the server to process
the request for another file and start sending it. Being able to run more
than one thread means your network can still be running at full throttle,
downloading on an alternate stream, while one thread (or stream) is waiting
for the server to respond.

Anyway... that's about it. Don't know how difficult it would be to add.
Seems the download-speed change could be 'simply' implemented via some sort
of message-passing mechanism, using a 2nd invocation of wget to send the
message to the first. But that's still somewhat vague (pass message how?).

Comments? Good ideas? Bad ideas? etc...

Linda
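[To illustrate the multi-streamed variant Linda describes: a minimal
poll(3) sketch in C, not Wget code. Two pipes stand in for two server
connections; a real downloader would use connected sockets, honor the rate
limit, and write each stream to its own file, but the shape of the loop is
the same: wait on all descriptors at once, then service whichever has data
ready.]

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Service already-connected descriptors until all reach EOF, copying
 * whatever arrives to the matching output descriptor. */
static void service_streams(struct pollfd *pfds, int out_fds[], int n)
{
    int open_streams = n;
    while (open_streams > 0) {
        if (poll(pfds, n, -1) < 0)
            break;
        for (int i = 0; i < n; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP)) {
                char buf[4096];
                ssize_t got = read(pfds[i].fd, buf, sizeof buf);
                if (got <= 0) {
                    pfds[i].fd = -1;    /* negative fd: poll skips it */
                    open_streams--;
                } else {
                    write(out_fds[i], buf, (size_t) got);
                }
            }
        }
    }
}

/* Demo: two pipes stand in for two server connections. */
int main(void)
{
    int a[2], b[2];
    pipe(a); pipe(b);
    write(a[1], "stream one\n", 11); close(a[1]);
    write(b[1], "stream two\n", 11); close(b[1]);

    struct pollfd pfds[2] = {
        { .fd = a[0], .events = POLLIN },
        { .fd = b[0], .events = POLLIN },
    };
    int out_fds[2] = { STDOUT_FILENO, STDOUT_FILENO };
    service_streams(pfds, out_fds, 2);
    return 0;
}

[The "enqueue behind the first" idea then becomes a matter of adding
descriptors to the array while the loop runs.]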
Re: RFE: run-time change of limit-rate & multi-stream download
From: L Walsh
> Say one runs the first wget. Let's say it is a simple 1-DVD download.
> Then you start a 2nd download of another DVD. Instead of 2 copies of
> wget running and competing with each other, what if the 2nd copy told
> the 1st copy about the 2nd download, and the 2nd download was 'enqueued'
> in a 'line' behind the 1st?

Perhaps you need an operating system. On VMS, one could create a
wget-specific batch queue, set its job limit to one, and submit all the
non-competing wget jobs to it. The queue manager would run the submitted
jobs one at a time, first-come, first-served, with the terminal output
logged to a file (of your choice). If you ask (SUBMIT /NOTIFY), you can get
a message broadcast to your terminal(s) when a job ends.

http://h71000.www7.hp.com/index.html

(Where would you like to put the axle on that new wheel?)

Steven M. Schweda                     [EMAIL PROTECTED]
382 South Warwick Street              (+1) 651-699-9818
Saint Paul MN 55105-2547