Micah Cowan wrote:
> * A "getter" command is mentioned more than once in the above. Note that
> this is not mutually exclusive with the concept of letting a single
> process govern connection persistence, which would handle the real work;
> the "getter" would probably be a tool for communicating with the main driver.


> - Using existing tools to implement protocols Wget doesn't understand
> (want scp support? Just register it as an scp:// scheme handler), and
> instantly add support to Wget for the latest, greatest protocols without
> hacking Wget or waiting until we get around to implementing it.

Of course, one drawback is that it then becomes difficult to sanely
handle a feature for multiple simultaneous connections, or even
persistent connections, when outside programs come into play. Using a
getter we have control over, that can communicate with a
connection-managing program, would allow this to work, but that won't
work with outside programs that aren't "in the know", such as the scp
command, or other "getter" programs. You can fork multiple scps for
"multiple connections", but what will keep the number of simultaneous
connections to a reasonable limit?
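
For getters that the driver forks itself, one possible answer is a counting semaphore around each child process. Here is a minimal Python sketch, not anything that exists in Wget: the HANDLERS table, the "{url}" placeholder, and MAX_CHILDREN are all assumptions made up for illustration, and a stand-in command that just echoes the URL takes the place of a real scp invocation so the sketch runs anywhere.

```python
import subprocess
import sys
import threading
from urllib.parse import urlsplit

# Hypothetical scheme -> command table; "{url}" is replaced by the URL.
# A real registry might map "scp" to an scp command line; this stand-in
# only echoes the URL so the sketch is runnable without a network.
HANDLERS = {
    "scp": [sys.executable, "-c", "import sys; print('got', sys.argv[1])", "{url}"],
}

MAX_CHILDREN = 2  # assumed limit on simultaneous external getters
_slots = threading.BoundedSemaphore(MAX_CHILDREN)


def fetch(url):
    """Run the external getter registered for url's scheme, holding a slot."""
    scheme = urlsplit(url).scheme
    argv = [arg.replace("{url}", url) for arg in HANDLERS[scheme]]
    with _slots:  # blocks while MAX_CHILDREN getters are already running
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def fetch_all(urls):
    """Fetch every URL in its own thread; the semaphore bounds the children."""
    results = {}

    def worker(u):
        results[u] = fetch(u)

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that this only caps the number of child processes the driver launches. It does nothing for connection reuse, since each outside getter still opens and closes its own connection.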

Plus, even the idea of our "own" getter program communicating via a Unix
socket or some such to a "connections manager" program, irks me: it
obliterates the independence that makes pipelines useful. I guess, to be
useful, a pipelined Wget would need to have wholly independent tools;
but the loss of persistent connections would be too high a price to
bear, I think (not that Wget handles them particularly well now:
HTTP/1.1 should significantly improve it, though).

Still, there were already plans to allow arbitrary "content handler"
commands, and "URL filters"; we can certainly continue to move in that
direction. We could still split off the HTML and CSS parsers as
completely autonomous (and interchangeable with alternatives) programs.
But it seems to me that content-_fetching_ (protocol support) will need
to continue to be fully integrated in Wget's core. Decisions on whether
URLs are followed or not could also be outsourced.
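
To make the "URL filter" idea concrete, here is a rough Python sketch of outsourcing the follow/don't-follow decision: candidate URLs go to an external command on stdin, one per line, and whatever it prints back is followed. The one-URL-per-line protocol, the filter_urls name, and the stand-in filter (which keeps only URLs containing "example.org") are assumptions for illustration, not an existing Wget interface.

```python
import subprocess
import sys

# Stand-in filter (an assumption for illustration): any program that reads
# URLs on stdin and prints the ones to follow would do. This one keeps
# only URLs that contain "example.org".
FILTER_CMD = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'example.org' in line:\n"
    "        sys.stdout.write(line)\n",
]


def filter_urls(urls, filter_cmd=FILTER_CMD):
    """Pipe candidate URLs through an external filter; return the survivors."""
    done = subprocess.run(
        filter_cmd,
        input="".join(u + "\n" for u in urls),
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.splitlines()
```

Any program that speaks the same line-based protocol (grep, a shell script, and so on) could be swapped in for FILTER_CMD without touching the caller.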

Previously, I said that we might lose Windows support by making Wget
more pipeline-y; but that's not necessarily true. It's just harder to
implement in Windows, but can be done. Hell, if need be, we could have
Wget write input to a file, then have the parser read it and spit out
another file. That's obviously lame, but OTOH it's how Wget already
parses HTML (except that no additional programs are used). I
suspect, though, that such a program would see a Unix-oriented release
some time before the Windows port would appear; unless there were
ongoing collaboration on a Windows port simultaneous with the Unix-ish one.
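
The file-based hand-off described above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the file names, the handoff and parse_file functions, and the one-link-per-line output format are all hypothetical, and the "parser" is an in-process function standing in for what would be a separate program in a real split.

```python
import os
import tempfile
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_file(html_path, links_path):
    """Read HTML from one file and write one link per line to another."""
    collector = LinkCollector()
    with open(html_path, encoding="utf-8") as src:
        collector.feed(src.read())
    with open(links_path, "w", encoding="utf-8") as dst:
        for link in collector.links:
            dst.write(link + "\n")


def handoff(html_text):
    """Write html_text to a temp file, run the parser on it, read links back."""
    with tempfile.TemporaryDirectory() as tmp:
        html_path = os.path.join(tmp, "page.html")
        links_path = os.path.join(tmp, "links.txt")
        with open(html_path, "w", encoding="utf-8") as f:
            f.write(html_text)
        parse_file(html_path, links_path)  # stand-in for an external parser
        with open(links_path, encoding="utf-8") as f:
            return f.read().splitlines()
```

Since nothing here depends on pipes or fork, the same shape should carry over to Windows, at the cost of the extra disk round trip.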

If in fact everything except for connections could be handled as an
external command, then there might be little advantage to be gained by
library-izing Wget, and it might make more sense to leave Wget as a
program, and letting connection handlers be plugins (which are expected
to use Wget's connection management system, rather than direct connections).
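
As a very rough picture of what "plugins that use Wget's connection management" could look like, here is a small Python sketch. Everything in it is hypothetical: ConnectionManager, the handler decorator, the "demo" scheme, and FakeConnection are names invented for illustration, and the pool is the simplest thing that could work (one connection per host/port pair, never expired).

```python
class ConnectionManager:
    """Hypothetical core-owned pool: one reusable connection per (host, port)."""

    def __init__(self, connect):
        self._connect = connect  # callable (host, port) -> connection object
        self._pool = {}
        self.opened = 0          # how many real connections were made

    def get(self, host, port):
        key = (host, port)
        if key not in self._pool:
            self._pool[key] = self._connect(host, port)
            self.opened += 1
        return self._pool[key]


HANDLERS = {}  # scheme -> plugin function


def handler(scheme):
    """Decorator that registers a plugin for a URL scheme."""
    def register(func):
        HANDLERS[scheme] = func
        return func
    return register


@handler("demo")
def demo_fetch(manager, host, port, path):
    # The plugin never opens a socket itself; it asks the manager.
    conn = manager.get(host, port)
    return conn.request(path)


class FakeConnection:
    """Stand-in connection so the sketch runs without a network."""

    def __init__(self, host, port):
        self.host, self.port = host, port

    def request(self, path):
        return f"{self.host}:{self.port}{path}"
```

The point of the design is that persistence and connection limits live in one place, the manager, while protocol knowledge lives in the plugins.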

Such a project should still probably get a new name (I was going to say
"be a fork", but it'd probably be a rearchitecture anyway, with little
in common with current Wget); Wget proper should continue to be a project
that appeals to folks that need a tool that's sufficiently lightweight
to install as a core system component, without a lot of fluff (or at
least, not too much more fluff than it already has).

BTW, I added a couple new name concepts to
http://wget.addictivecode.org/Wget2Names: "xget" (x being the letter
after w), and "niwt" (which I like best so far: Nifty Integrated Web Tools).

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq