Micah,
Yes, I was thinking of a library, not realizing how difficult that
would be, as I have never looked at the wget source code. Also, I am new to
Java. I know that there is a lot of built-in support for HTTP etc., but
I've only used a few things. After looking at a couple of HTML and XHTML
files, I think my needs might be met if I download them and make a few
substitutions (hrefs, img src's, etc.) for absolute or local file-based
references.
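
A rough sketch of that substitution step in Java, assuming a regular expression is good enough for simple, quoted href/src attributes (a real HTML parser would be more robust, and this pattern would also touch attributes such as data-href). The class name LinkRewriter, the method absolutize, and the example.com URL are made up for illustration; only java.net.URI and java.util.regex from the standard library are used:

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRewriter {
    // Assumption: attributes look like href="..." or src='...' (quoted, on one line).
    private static final Pattern LINK_ATTR = Pattern.compile(
            "(?i)\\b(href|src)\\s*=\\s*([\"'])(.*?)\\2");

    // Rewrites every href/src value in html to an absolute URL relative to baseUrl.
    public static String absolutize(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Matcher m = LINK_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved;
            try {
                resolved = base.resolve(m.group(3).trim()).toString();
            } catch (IllegalArgumentException e) {
                resolved = m.group(3); // leave references that do not parse untouched
            }
            String replacement = m.group(1) + "=" + m.group(2) + resolved + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<a href=\"docs/index.html\">Docs</a> <img src='/img/logo.png'>";
        System.out.println(absolutize(page, "http://example.com/site/page.html"));
    }
}
```

The page itself would still have to be fetched first; the base URL passed in is simply the address the page was downloaded from.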
I did not want to be limited to single downloads, which is why the "-O"
option will not suffice.
Regardless, wget has a lot of nice features and plans for good
improvements. While it won't yet meet this one need, I will certainly
continue to use it for other purposes.
Thanks, Alan
----- Original Message -----
From: "Micah Cowan" <[EMAIL PROTECTED]>
To: "Alan Thomas" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: Thursday, October 04, 2007 12:55 PM
Subject: Re: Software interface to wget
>
> Alan Thomas wrote:
> > Idea for future wget versions: It would be nice if I could
> > invoke wget programmatically and have options like returning data in
> > buffers versus files (so data can be searched and/or manipulated in
> > memory),
>
> This can already be done by using wget's -O switch, which directs the
> output to a specified file (including standard output). A wrapper
> program could simply read wget's stdout directly into a buffer. However,
> -O is only really useful for single downloads, as there is no
> delineation between separate files. And, I'll admit that I'm not clear
> how easy this is to do with "100% Pure Java"; it's quite straightforward
> on Unix systems in most languages.
>
> > Then it could be more easily and seamlessly
> > integrated into other software that needs this capability. I would
> > especially like to be able to invoke wget from Java code.
>
> It sounds to me like you're asking for a library version of Wget. There
> aren't specific plans to support this at the moment, and I'm not sure
> how much it'd really buy you: high-level programming languages such as
> Java, Python, Perl, etc., tend to ship with good HTTP and HTML-parsing
> libraries, in which case rigging your own code to do a good chunk of
> what Wget does is probably less work than trying to adapt Wget into
> library form. I'm not saying I'm ruling it out, but I'd need to hear
> some good cases for it, in contrast to using what's already available on
> those platforms.
>
> However, some changes are in the works (early early planning stages) for
> Wget to sport a plugin architecture, and if a bit of glue to call out to
> higher-level languages is added, plugins written in languages such as
> Java wouldn't be a big stretch. It may well be that restructuring Wget as
> a library, instead of as a standalone app that runs plugins, is a
> better solution; it bears discussion.
>
> Also planned is a more flexible output system, allowing for arbitrary
> formatting of downloaded resources (such as .mht's, or tarballs, or
> whatever), making delineation in a single output stream possible; also,
> a metadata system for preserving information about what files have been
> completely downloaded and which were interrupted, what their original
> URLs were, etc.
>
> All of this, however, is a long way from even really being started,
> especially given our current developer resources.
>
> --
> HTH,
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer...
> http://micah.cowan.name/
>