asm c wrote:
> I've recently been using wget and have it working for the most part, but
> one issue has really been bugging me. One of the parameters I use is
> '-R *action=*,*oldid=*' (platform note: ZSH on NetBSD on the SDF public
> access Unix system, although I've seen the same result on Windows). The
> purpose of this parameter is that, when wget crawls a mid-sized wiki I'd
> like a local copy of, it skips all the history pages, edit pages, and so
> forth. Not downloading these would save me an enormous amount of time.
> Unfortunately, the parameter isn't applied until after the PHP page has
> been downloaded: wget fetches each matching page and only then deletes
> it, so the option doesn't save any time at all.
> Does anyone know how I can stop wget from downloading matching pages in
> the first place?

Well, you don't mention it, but I'll assume that those patterns occur in
the query string portion of the URL: that is, after the question mark (?)
somewhere in the URL.

Unfortunately, the -R and -A options apply only to the filename portion
of the URL: that is, whatever lies between the first question mark and
the slash (/) immediately preceding it. Confusingly, they are also
applied _after_ files are downloaded, to decide whether those files
should be deleted after the fact. So Wget probably downloads the files
you really wish it wouldn't, and then deletes them anyway.

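To make the two checks concrete, here is a small Python sketch of the behavior described above. It is not Wget's code: the helper names, the example URL on wiki.example.org, and the assumption that the saved file name is the filename portion followed by "?" and the query string (which would explain why the pages get deleted once downloaded) are all illustrative.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# The reject list from the question: -R '*action=*,*oldid=*'
REJECT = ("*action=*", "*oldid=*")


def filename_portion(url: str) -> str:
    """Text between the last '/' of the path and the first '?' (query excluded)."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def saved_name(url: str) -> str:
    """Hypothetical local file name.

    Assumption: the filename portion plus '?' and the query string, which
    is consistent with the pages being deleted after download.
    """
    query = urlsplit(url).query
    name = filename_portion(url)
    return f"{name}?{query}" if query else name


def matches_reject(name: str) -> bool:
    """True if any reject pattern matches the given name."""
    return any(fnmatchcase(name, pattern) for pattern in REJECT)


url = "http://wiki.example.org/index.php?title=Main_Page&action=history"

# Before download: only "index.php" is tested, nothing matches, so it is fetched.
print(filename_portion(url), matches_reject(filename_portion(url)))  # index.php False
# After download: the saved name carries the query string, matches, and is deleted.
print(saved_name(url), matches_reject(saved_name(url)))
```

The first check is the one that decides whether a request is sent, which is why the reject list saves no time here.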
Worse, there's currently no way around this. It's one of a suite of
related problems slated to be addressed soon; the one most pertinent to
yours is the need for a way to match against query strings. I very much
hope to get to it before the next major Wget release, version 1.12. It's
being tracked here:
https://savannah.gnu.org/bugs/index.php?22089

If you add yourself to the Cc list, you'll be able to follow its
progress.

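Matching against the query string is what would let such pages be skipped before any request is made. The Python sketch below shows the decision a query-aware reject filter might make for the patterns in the question; `should_fetch` and the example URLs are hypothetical, and the sketch makes no claim about how the feature will actually be implemented in Wget.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Same patterns as the question's -R '*action=*,*oldid=*'
REJECT = ("*action=*", "*oldid=*")


def should_fetch(url: str, reject=REJECT) -> bool:
    """Hypothetical query-aware filter, decided *before* any request is sent.

    Unlike -R as described above, it tests the query string (the text after
    the first '?'), so the wiki's edit and history links are skipped up front.
    """
    query = urlsplit(url).query
    return not any(fnmatchcase(query, pattern) for pattern in reject)


for url in (
    "http://wiki.example.org/index.php?title=Main_Page",
    "http://wiki.example.org/index.php?title=Main_Page&action=edit",
    "http://wiki.example.org/index.php?title=Main_Page&oldid=1234",
):
    print(should_fetch(url), url)  # True, then False, then False
```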
--
Cheers!
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/