Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread Tony Godshall
On 10/31/07, Micah Cowan [EMAIL PROTECTED] wrote:

 Tony Godshall wrote:
  On 10/30/07, Micah Cowan [EMAIL PROTECTED] wrote:
 
  Tony Godshall wrote:
  Perhaps the little wget could be called "wg".  A quick Google and
  Wikipedia search shows no real namespace collisions.
  To reduce confusion/upgrade problems, I would think we would want to
  ensure that the traditional/little Wget keeps the current name, and
  any snazzified version gets a new one.
 
  Please not another -ng.  How about wget2 (since we're on 1.x)?  And
  the current one remains in 1.x.

 I agree that -ng would not be appropriate. But since we're really
 talking about two separate beasts, I'd prefer not to limit what we can
 do with Wget (original)'s versioning. Who's to say a 2.0 release of the
 light version will not be warranted someday?

 At any rate, the snazzy one looks to be diverging from classic Wget in
 some rather significant ways, in which case, I'd kind of prefer to part
 names a bit more severely than just "wget-ng" or "wget2". "Reget",
 perhaps: that name could be both "Recursive Get" (describing what's
 still its primary feature), or "Revised/Re-envisioned Wget". :)

 I think, too, that names such as wget2 are more often things that
 packagers (say, Debian) do, when they want to include
 backwards-incompatible, significantly new versions of software, but
 don't want to break people's usage of older stuff. Or, when they just
 want to offer both versions. Cf. apache2 in Debian.

  And then eventually everyone's gotten used to and can't live
  without the new bittorrent-like almost-multithreaded features. ;-)

 :)

Pget.

Parallel get.

Tget.

Torrent-like-get.

Bget.

Bigger get.

BBWget.

Bigger Better wget.

OK, OK, sorry.


Re: RFE: run-time change of limit-rate & multi-stream download

2007-11-01 Thread Micah Cowan

L Walsh wrote:
 Don't know if anyone else has this problem, but sometimes when I'm
 downloading large files (>500MB; most recently 4.2GB), I want to change the
 limit-rate without terminating the download and restarting.
 
 For example, during the day when I'm on my computer, I might not want it
 to use more than ~1/3rd my bandwidth (~100KB/s), but if I'm going out to
 lunch or going to bed for the night, I might want to give it the full
 bandwidth.
 
 Even then, say I can't sleep (a too frequent occurrence), and I want
 to get back on and read some website or another -- I'd like to be
 able to reduce its bandwidth.

Heh, I can relate about the "can't sleep". Being a caffeine addict
probably plays a role in that.

An interesting idea that Tony Lewis came up with was the ability to send
Wget an interrupt (Ctrl-C), bringing up an interactive mode that allows
one to modify configuration on-the-fly. He told me about it in response
to an issue report I filed suggesting that Wget allow users the option
to skip the current file on interrupt, rather than quit completely; or
to exit gracefully (for instance, by completing any outstanding -k
conversions). That core functionality is expected to go in at some point
for 1.12, but probably not the config-altering; that might be easier to
put in for the snazzier re-envisioning of Wget, though.
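
To make that concrete, here's a rough sketch in C of how an
interrupt-driven interactive mode might hang off the main download
loop. All the names are invented for illustration (nothing here is
from the Wget sources), and the handler only sets a flag, since real
work shouldn't happen inside a signal handler:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t interrupted = 0;

    static void sigint_handler(int sig)
    {
        (void) sig;
        interrupted = 1;        /* just set a flag; no I/O in the handler */
    }

    /* Checked between chunks of the main download loop. */
    static void maybe_enter_interactive_mode(long *limit_rate)
    {
        char line[128];

        if (!interrupted)
            return;
        interrupted = 0;

        printf("\n--- interactive mode (empty line resumes) ---\n");
        while (fgets(line, sizeof line, stdin) != NULL) {
            if (line[0] == '\n')
                break;                          /* resume the download */
            if (sscanf(line, "limit-rate %ld", limit_rate) == 1)
                printf("limit-rate is now %ld bytes/s\n", *limit_rate);
            else
                printf("known commands: limit-rate N\n");
        }
    }

    int main(void)
    {
        struct sigaction sa;
        long limit_rate = 102400;       /* ~100 KB/s, as in the example */

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = sigint_handler;
        sigaction(SIGINT, &sa, NULL);

        for (;;) {                      /* stand-in for the download loop */
            maybe_enter_interactive_mode(&limit_rate);
            sleep(1);                   /* pretend to transfer a chunk */
        }
    }

The skip-current-file behavior would just be another command in the
same loop; the point is that the handler does nothing but set a flag
which the transfer loop polls at a safe point.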

 Say one runs the first wget.  Let's say it is a simple 1-DVD download.
 Then you start a 2nd download of another DVD.  Instead of 2 copies
 of wget running and competing with each other, what if the 2nd copy
 told the 1st copy about the 2nd download, and the 2nd download
 was 'enqueued' in a 'line' behind the 1st.

Yes, I've thought of this same thing. Some Wget that works down a queue,
and other Wget invocations add to the queue. Or, in your first scenario,
modifies certain configuration parameters of the first Wget.

At this point in time, I think that this would make a good candidate for
a separate plugin module (though, probably one that ships with Wget).
Though, of course, even as a plugin, the core bits of Wget (and other
plugin modules) would still need to be written with the consideration
that configuration settings might not remain constant. However, there
are ways in which we could make this relatively easy to accomplish.

The configuration stuff is already going to change with the new Wget
pretty considerably: mainly in the idea for permitting configuration
settings that apply only to specific URIs. The abstraction that would be
necessary for such a thing could probably also accommodate on-the-fly
configuration changes.
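
Roughly, the sort of abstraction I mean might look like this in C --
hypothetical types and names, nothing from the actual source. The key
point is that every transfer asks for its *effective* settings through
one function, so a per-URI override (or a runtime change) has a single
place to hook in:

    #include <stdio.h>
    #include <string.h>

    struct settings {
        long limit_rate;        /* bytes/s; 0 = unlimited */
        int  tries;
    };

    struct uri_override {
        const char *prefix;     /* applies to URIs starting with this */
        struct settings values;
    };

    /* Longest matching prefix wins; fall back to global defaults. */
    static struct settings
    effective_settings(const char *uri, const struct settings *global,
                       const struct uri_override *ov, size_t n)
    {
        struct settings result = *global;
        size_t best = 0;

        for (size_t i = 0; i < n; i++) {
            size_t len = strlen(ov[i].prefix);
            if (len > best && strncmp(uri, ov[i].prefix, len) == 0) {
                best = len;
                result = ov[i].values;
            }
        }
        return result;
    }

    int main(void)
    {
        struct settings global = { 102400, 3 };
        struct uri_override overrides[] = {
            { "http://example.com/iso/", { 0, 5 } },  /* uncapped ISOs */
        };
        struct settings s = effective_settings(
            "http://example.com/iso/dvd1.iso", &global,
            overrides, sizeof overrides / sizeof overrides[0]);

        printf("limit_rate=%ld tries=%d\n", s.limit_rate, s.tries);
        return 0;
    }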

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread Micah Cowan

L Walsh wrote:
 Honest -- I hadn't read all the threads before my post...
 
 Great ideas Micah! :-)
 
 On the idea of 2 wgets -- there is a clever way to get
 by with 1.  Put the optional functionality into separate
 run-time loadable files.  SGI's Unix (and MS Windows) does this.
 The small wget then checks to see which libraries are
 accessible -- those that aren't simply mean the features
 for those libs are disabled.  In a way, it's like how
 'vim' can optionally load perllib or python-lib at runtime
 (at least under Windows) if they are present.  If they are
 not present, those features are disabled.  Too bad Linux
 didn't take this route with its libraries (I have asked;
 it is possible, but there's no framework for it, and
 that might need work as well).

I'm not sure what you mean about the Linux thing; there are many
instances of runtime loadable modules on Linux. dlopen() and friends are
the standard way of doing this on any Unix kernel flavor.
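
For reference, the usual shape of that on Unix -- an entirely made-up
plugin name and entry point, just to show the mechanism (link with
-ldl on Linux):

    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*plugin_init_fn)(void);

    int main(void)
    {
        /* Try to load an optional feature library; if it's absent,
           run with that feature disabled, as Linda describes. */
        void *handle = dlopen("libwget-torrent.so", RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "torrent support disabled: %s\n", dlerror());
            return 0;               /* keep going without the feature */
        }

        plugin_init_fn init = (plugin_init_fn) dlsym(handle, "plugin_init");
        if (init != NULL && init() == 0)
            puts("torrent support enabled");

        dlclose(handle);
        return 0;
    }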

Keeping a single Wget and using runtime libraries (which we were terming
plugins) was actually the original concept (there's mention of this in
the first post of this thread); the issue is that there are
core bits of functionality (such as the multi-stream support) that are
too intrinsic to separate into loadable modules, and that, to be done
properly (and with a minimum of maintenance commitment), would also
depend on other libraries (that is, doing asynchronous I/O wouldn't
technically require the use of other libraries, but it can be a lot of
work to do efficiently and portably across OSes, and there are already
Free libraries to do that for us).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)

2007-11-01 Thread L Walsh

Honest -- I hadn't read all the threads before my post...

Great ideas Micah! :-)

On the idea of 2 wgets -- there is a clever way to get
by with 1.  Put the optional functionality into separate
run-time loadable files.  SGI's Unix (and MS Windows) does this.
The small wget then checks to see which libraries are
accessible -- those that aren't simply mean the features
for those libs are disabled.  In a way, it's like how
'vim' can optionally load perllib or python-lib at runtime
(at least under Windows) if they are present.  If they are
not present, those features are disabled.  Too bad Linux
didn't take this route with its libraries (I have asked;
it is possible, but there's no framework for it, and
that might need work as well).

My 2 cents,
Linda



RFE: run-time change of limit-rate & multi-stream download

2007-11-01 Thread L Walsh

Don't know if anyone else has this problem, but sometimes when I'm
downloading large files (>500MB; most recently 4.2GB), I want to change the
limit-rate without terminating the download and restarting.

For example, during the day when I'm on my computer, I might not want it
to use more than ~1/3rd my bandwidth (~100KB/s), but if I'm going out to
lunch or going to bed for the night, I might want to give it the full
bandwidth.

Even then, say I can't sleep (a too frequent occurrence), and I want
to get back on and read some website or another -- I'd like to be
able to reduce its bandwidth.

I'm not sure what the best way is to relay the desired speed to the
currently running program -- I can think of more than one method, but
none of them 'grab' me as elegant...

Uh...well there is one, but it's pretty ambitious, perhaps dovetailing
into my other RFE - multi-threaded downloading.

Suppose wget was multi-threaded.  I'm saying multithreaded out
of some ignorance -- as it might be more efficient to simply make
it multi-streamed, and use poll(3) to wait on the multiple
streams and process them as they become available.  But whichever,
I believe the idea of multiple downloads going at the same time is
conceptually the same.
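
Something with roughly this shape, say.  Here two pipes stand in for
the server connections so the sketch runs by itself; real code would
be polling HTTP or FTP sockets instead:

    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define NSTREAMS 2

    int main(void)
    {
        int pipes[NSTREAMS][2];
        struct pollfd fds[NSTREAMS];
        char buf[4096];
        int open_streams = NSTREAMS;

        /* Fake "servers": each pipe gets some data, then closes. */
        for (int i = 0; i < NSTREAMS; i++) {
            char msg[64];
            int len;

            if (pipe(pipes[i]) != 0)
                return 1;
            len = snprintf(msg, sizeof msg, "data from stream %d\n", i);
            (void) write(pipes[i][1], msg, (size_t) len);
            close(pipes[i][1]);

            fds[i].fd = pipes[i][0];
            fds[i].events = POLLIN;
        }

        /* One thread, many streams: handle whichever is ready. */
        while (open_streams > 0) {
            if (poll(fds, NSTREAMS, -1) < 0)
                break;
            for (int i = 0; i < NSTREAMS; i++) {
                if (!(fds[i].revents & (POLLIN | POLLHUP)))
                    continue;
                ssize_t n = read(fds[i].fd, buf, sizeof buf);
                if (n <= 0) {           /* EOF: this download finished */
                    close(fds[i].fd);
                    fds[i].fd = -1;     /* poll() ignores negative fds */
                    open_streams--;
                } else {
                    /* real code: append to the right output file */
                    fwrite(buf, 1, (size_t) n, stdout);
                }
            }
        }
        return 0;
    }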

Say one runs the first wget.  Let's say it is a simple 1-DVD download.
Then you start a 2nd download of another DVD.  Instead of 2 copies
of wget running and competing with each other, what if the 2nd copy
told the 1st copy about the 2nd download, and the 2nd download
was 'enqueued' in a 'line' behind the 1st.

If both downloads are from the same site, there isn't much to do, and
the most efficient approach would be (I think?) to finish the first,
then do the 2nd.  Alternatively, if the size of both downloads was known
in advance (as is true for http, though not for ftp), it _could_
download whichever file had less data yet to download.

If the downloads are from different sites, it depends on the
download speed of each file -- if a specific site is slow (say
your max rate is 300, and your download is running at an average
speed of 150), then it probably would be safe to do another
download (from a different site) and intersperse the writes...

There are so many possible priority or scheduling algorithms,
I can't possibly think of them all.

To complete the idea -- if wget has multiple files it needs
to download (after parsing an HTML file, looking for page-requisites
or deeper recursion, if enabled), it could also use the
multi-threaded download feature to download those in parallel.

A recursive download can quickly 'blossom' as each level of
a directory tree is downloaded and expanded.  Many of these
files may be short files.  A significant delay or wait time
is incurred when a download has to 'pause' and wait for
the server to process the request for another file and start
sending it.  Being able to run more than one thread means
your network can still be running at full throttle, downloading
on an alternate stream, while one thread (or stream) is
waiting for the server to respond.

Anyway...that's about it.  Don't know how difficult it would
be to add.  Seems the 'download' speed change could be
'simply' implemented via some sort of message-passing
mechanism, using a 2nd invocation of wget to send the message
to the first.  But that's still somewhat vague (pass message
how?).
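
One answer to "pass message how?" on Unix would be a named pipe: the
first wget creates a FIFO and checks it between chunks, and a second
invocation just writes a command line into it.  A rough sketch -- the
path and the command syntax are made up, and a real version would
want locking and cleanup:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define CONTROL_FIFO "/tmp/wget-control"    /* invented path */

    /* Second invocation: send one command and exit. */
    static int send_command(const char *cmd)
    {
        int fd = open(CONTROL_FIFO, O_WRONLY | O_NONBLOCK);
        if (fd < 0) {
            perror("open (is a first instance running?)");
            return 1;
        }
        (void) write(fd, cmd, strlen(cmd));
        (void) write(fd, "\n", 1);
        close(fd);
        return 0;
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)                   /* e.g. ./ctl "limit-rate 0" */
            return send_command(argv[1]);

        /* First instance: create the FIFO, poll it between chunks. */
        mkfifo(CONTROL_FIFO, 0600);
        int fd = open(CONTROL_FIFO, O_RDONLY | O_NONBLOCK);

        for (;;) {                      /* stand-in for the download loop */
            char buf[256];
            ssize_t n = read(fd, buf, sizeof buf - 1);
            if (n > 0) {
                buf[n] = '\0';
                printf("got command: %s", buf);  /* parse and apply here */
            }
            sleep(1);                   /* pretend to transfer a chunk */
        }
    }

An "enqueue URL" command through the same channel would cover the
DVD-queueing case above.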

Comments?  Good ideas? Bad ideas?  etc...
Linda


Re: RFE: run-time change of limit-rate & multi-stream download

2007-11-01 Thread Steven M. Schweda
From: L Walsh

 Say one runs the first wget.  Let's say it is a simple 1-DVD download.
 Then you start a 2nd download of another DVD.  Instead of 2 copies
 of wget running and competing with each other, what if the 2nd copy
 told the 1st copy about the 2nd download, and the 2nd download
 was 'enqueued' in a 'line' behind the 1st.

   Perhaps you need an operating system.  On VMS, one could create a
wget-specific batch queue, set its job limit to one, and submit all the
non-compete wget jobs to it.  The queue manager would run the submitted
jobs one at a time, first-come-first-served, with the terminal output
logged to a file (of your choice).  If you ask (SUBMIT /NOTIFY), you can
get a message broadcast to your terminal(s) when a job ends.

  http://h71000.www7.hp.com/index.html

(Where would you like to put the axle on that new wheel?)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547