Okay, so there's been a lot of thought in the past regarding better extensibility features for Wget: things like hooks for adding support for traversal of new Content-Types besides text/html, some form of JavaScript support, or support for MetaLink. Also, support for filtering results both before and after Wget processes them: for example, filtering the HTML to change how Wget sees it before it parses for links, without affecting the actual downloaded version; or filtering the extracted links themselves to alter what Wget fetches.
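To give a more concrete flavor of the link-filtering idea, here's a purely hypothetical sketch in Python (nothing here corresponds to any interface Wget actually has today; the names are made up). The notion is that Wget would hand such a hook every link it extracted from a page, and queue only the ones the hook hands back:

    #!/usr/bin/env python
    # Hypothetical link-filter hook -- not a real Wget API, just an
    # illustration of the sort of extension point described above.
    # Wget (or whatever replaces it) would pass in the links extracted
    # from a page, and only the links returned would be fetched.

    import re

    def filter_links(page_url, links):
        """Return the subset of 'links' that should actually be fetched.

        page_url -- URL of the page the links came from
        links    -- list of absolute URLs found in that page
        """
        keep = []
        for link in links:
            # Example policy: drop obvious session/tracking noise.
            if re.search(r'[?&](PHPSESSID|utm_[a-z]+)=', link):
                continue
            keep.append(link)
        return keep

    if __name__ == '__main__':
        sample = [
            'http://example.com/page2.html',
            'http://example.com/page2.html?PHPSESSID=deadbeef',
        ]
        print(filter_links('http://example.com/', sample))

A pre-parse HTML filter could have the same shape: page text in, modified text out, with the copy that lands on disk left untouched.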
The original concept, before I came onboard, was plugin modules. After some thought, I decided I didn't like this overly much, and have mainly been leaning toward the idea of a next-gen Wget-as-a-library thing, probably wrapping libcurl (and with a command-line client version, like curl). This obviously wouldn't have been Wget any more, so it would have been a separate project, with a different name.

However, another thing that's been vaguely itching at me lately is the fact that Wget's design is not particularly unix-y. Instead of doing one thing and doing it well, it does a lot of things, some well, some not. So for the last couple of days I've been thinking that maybe "wget-ng" should be a suite of interoperating shell utilities, rather than a library or a single app. This could have some really huge advantages: users could choose which HTML parser to use, they could plug in parsers for whatever filetypes they desire, and people who want to implement exotic features could do that...

Of course, at this point we're talking about something that's fundamentally different from "Wget", just as we were when we were considering a next-gen library version. It'd be a completely separate project. And I'm still not going to start it right away (though I think some preliminary requirements and design discussions would be a good idea). Wget's not going to die, nor is everyone going to want to switch to some new-fangled re-envisioning of it.

But the thing everyone loves about Unix and GNU (and certainly the thing that drew me to them) is the bunch-of-tools-on-a-crazy-pipeline paradigm, which is what enables you to mix and match different tools to cover the different areas of functionality. Wget doesn't fit very well into that scheme, and I think it could become much more powerful than it already is by being broken into smaller, more discrete projects; or, to be more precise, by offering an alternative that does the equivalent.

So far, the following principles have struck me as advisable for a project such as this:

- The tools themselves, as much as possible, should be written in an easily hackable scripting language. Python makes a good candidate. Where we want efficiency, we can implement modules in C to do the work.

- While efficiency won't be the highest priority (else we'd just stick with the monolith), it's still important. Spawning off a separate process to fetch each page, initiating a new connection each time, would be a lousy idea. So the architectural model should center around a "URL-getter" driver that manages connections and such, reusing persistent ones as much as possible (a rough sketch of what I mean is at the end of this message). Of course, there might be distinct commands to handle separate types of URLs (or alternative methods for handling them, such as MetaLink), and perhaps not all of these would be able to do persistence: a dead-simple way to add support for scp, etc., might be to simply call the command-line program.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
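P.S. Here's that rough sketch of the "URL-getter" driver idea, in Python. Everything in it is invented for illustration (the scheme registry, the scp:// form, the handler names); the only point it's meant to make is that one long-lived driver owns the connections and reuses them where it can, while oddball schemes can be handled by just calling an existing program:

    #!/usr/bin/env python
    # Rough, hypothetical sketch of a "URL-getter" driver -- not a design,
    # just an illustration.  One driver process owns the connections;
    # per-scheme handlers are looked up in a registry, and HTTP connections
    # are kept open and reused across requests to the same host.  A scheme
    # like scp is handled by simply calling the command-line program.
    # Real code would also need to cope with dropped connections, HTTPS,
    # redirects, and so on.

    import subprocess
    import urllib.parse
    import http.client

    class Driver:
        def __init__(self):
            self._http_conns = {}          # (host, port) -> open connection
            self._handlers = {
                'http': self._get_http,
                'scp':  self._get_scp,     # hypothetical scp://user@host/path
            }

        def get(self, url):
            scheme = urllib.parse.urlsplit(url).scheme
            try:
                handler = self._handlers[scheme]
            except KeyError:
                raise ValueError('no handler registered for %r' % scheme)
            return handler(url)

        def _get_http(self, url):
            parts = urllib.parse.urlsplit(url)
            key = (parts.hostname, parts.port or 80)
            conn = self._http_conns.get(key)
            if conn is None:
                conn = http.client.HTTPConnection(*key)
                self._http_conns[key] = conn
            path = parts.path or '/'
            if parts.query:
                path += '?' + parts.query
            conn.request('GET', path)
            return conn.getresponse().read()

        def _get_scp(self, url):
            # The "dead-simple" external-program fallback: let scp do the work.
            parts = urllib.parse.urlsplit(url)
            subprocess.check_call(['scp', '%s:%s' % (parts.netloc, parts.path), '.'])
            return None

    if __name__ == '__main__':
        driver = Driver()
        print(len(driver.get('http://example.com/')))

The individual tools in the suite would then talk to a driver like this instead of opening their own connections, so spawning a process per page wouldn't have to mean a fresh connection per page.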