-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

asm c wrote:
> I've recently been using wget, and got it working for the most part, but
> there's one issue that's really been bugging me. One of the parameters I
> use is '-R "*action=*,*oldid=*"' (side note on the platform: ZSH on
> NetBSD on the SDF public access unix system, although I've also used it
> on windows with the same result). The purpose of this parameter is so
> that, when wget crawls a mid-sized wiki I'd like to have a local copy
> of, it doesn't bother with all the history pages, edit pages, and so
> forth. Not downloading these would save me an enormous amount of time.
> Unfortunately, the parameter is ignored until after the php page is
> downloaded. So, because it waits until it's downloaded to delete it,
> using the param doesn't really help at all.
>
> Does anyone know how I can stop wget from even downloading matching pages?

Well, you don't mention it, but I'll assume that those patterns occur in
the "query string" portion of the URL: that is, they follow a question
mark (?) that appears at some point.

Unfortunately, the -R and -A options only apply to the "filename"
portion of the URL: that is, whatever falls between the first question
mark and the last slash (/) preceding it. Confusingly, the filter is
then also applied _after_ files are downloaded, to determine whether
they should be deleted after the fact: so Wget probably downloads those
files you really wish it wouldn't, and then deletes them afterwards
anyway.

Worse, there's currently no way around this. It is part of a suite of
problems that are slated to be addressed soon. The most pertinent to
your problem, though, is the need for a way to match against query
strings. I'm very much hoping to get around to this before the next
major Wget release, version 1.12. It's being tracked here:

  https://savannah.gnu.org/bugs/index.php?22089

If you add yourself to the Cc list, you'll be able to follow along on
its progress.

- --
Cheers!
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIr55d7M8hyUobTrERAu4KAJsHmDTZ46ioEGOTprdE/aTGrj853QCfet84
+c+npJnPwC/86/rLpn5rB8s=
=abdv
-----END PGP SIGNATURE-----
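
To illustrate the filename-versus-query-string distinction described in the reply above, here is a rough Python sketch. It is not Wget's actual code: the helper names and the example wiki URL are made up for illustration, and shell-style matching via Python's fnmatch module is only assumed to approximate how the -R patterns behave.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical helpers; they only mimic the behaviour described in the
# reply and are not taken from Wget's source.

def filename_part(url):
    """Return the text between the last slash of the path and the '?'."""
    path = urlsplit(url).path
    return path.rsplit("/", 1)[-1]

def rejected_by_filename(url, patterns):
    """Match the reject patterns against the filename portion only."""
    name = filename_part(url)
    return any(fnmatchcase(name, p) for p in patterns)

def rejected_by_query(url, patterns):
    """Match the reject patterns against the query string instead."""
    query = urlsplit(url).query
    return any(fnmatchcase(query, p) for p in patterns)

# Patterns from the original question; the URL is an invented example.
patterns = ["*action=*", "*oldid=*"]
url = "http://wiki.example.org/index.php?title=Main_Page&action=edit"

print(filename_part(url))                    # index.php
print(rejected_by_filename(url, patterns))   # False
print(rejected_by_query(url, patterns))      # True
```

Under these assumptions the filename portion is just "index.php", so neither pattern can match it, while the same patterns would match if they were tested against the query string.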