It appears that some people (including myself) are confused by the fact that wget will download files that match a rejection pattern (or fail to match an accept pattern), if the file type is text/html.
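
To make the behaviour concrete, here is a rough Python sketch. It is a toy model of the decision as described above, not wget's actual code: the function name `would_download`, its parameters, and the use of `fnmatchcase` as the wildcard matcher are all assumptions made for illustration, and accept lists are left out for brevity.

```python
from fnmatch import fnmatchcase


def would_download(name, content_type, reject=(), html_exempt=True):
    """Toy model: return True if the file would be fetched.

    name         -- file name as seen by the matcher (hypothetical input)
    content_type -- MIME type reported for the file
    reject       -- shell-style wildcard patterns, as one would pass to -R
    html_exempt  -- True models the behaviour described above; False models
                    rejecting every match regardless of type
    """
    matches_reject = any(fnmatchcase(name, pattern) for pattern in reject)
    if not matches_reject:
        return True
    # A name on the reject list is still fetched when it is HTML and the
    # exemption is switched on.
    return html_exempt and content_type == "text/html"


if __name__ == "__main__":
    reject = ["*cgi-bin*"]  # illustrative pattern
    print(would_download("cgi-bin/view.html", "text/html", reject))
    print(would_download("cgi-bin/logo.png", "image/png", reject))
    print(would_download("cgi-bin/view.html", "text/html", reject,
                         html_exempt=False))
```

With the exemption on, the HTML file is fetched despite matching the reject pattern while the PNG is skipped; with `html_exempt=False` both are skipped.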

The manual says:

  "Note that these two options do not affect the downloading of HTML files; Wget must load all the HTMLs to know where to go at all--recursive retrieval would make no sense otherwise."

That might potentially apply to brain-dead uses such as -Rhtml, but what about -R '*cgi-bin*' and the like?

One user (Frank Lui, Cc'd) recently submitted a bug report/complaint that, despite a reject list of "*\?rev*,*\?sortcol*,*\?raw*,*\?skin*,*\?template*", wget was downloading tons of files whose URIs differed only in their use of such parameters. I believe Frank was just trying to mirror a wiki, with just the current versions of the wiki pages (so, not downloading other revisions, sorting variants, different actions, etc). This is a very reasonable and common sort of expectation, and AFAICT there is no way to accomplish this with current wget.

This seems like a problem. Is there any real reason that we can't just always reject files if they match the reject list? Or would it be worth adding an extra option to allow even HTML files to be skipped?

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/