Hello, I am trying to mirror my website using wget. The site is driven by Wiki software and therefore contains a couple of publicly accessible interactive pages, which I want to exclude from the download, e.g.:
http://www.daemon.de/podwiki.pl?page=ShorthandSandbox&state=checkout&revision=1.11

This link checks out an old version of an (unprotected) page and overwrites the current revision of that page. For the site itself, Apache mod_rewrite rules are installed so that the content can be accessed via normal URLs, e.g.:

http://www.daemon.de/ShorthandSandbox

Now I want wget to ignore every interactive page. I tried:

wget -vm --exclude-directories='*.pl*' http://www.daemon.de

But it still fetches interactive stuff:

--23:37:55--  http://www.daemon.de/podwiki.pl?page=PodWikiIndex&entry=AutoLoadPrint
           => `www.daemon.de/podwiki.pl?page=PodWikiIndex&entry=AutoLoadPrint'
Reusing connection to www.daemon.de:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

I dug through the code a little and found that somewhere it calls the function fnmatch() to decide whether a given URL matches an exclusion glob. What really happens there I don't understand, but I tried the mentioned function myself and it works as I would expect: '*.pl*' matches 'podwiki.pl?page=PodWikiIndex'.

IMHO this is a bug; it seems that wget ignores the fnmatch() result anyway for some reason.

kind regards,
Tom

-- 
Thomas Linden (http://www.daemon.de/)
tom at co dot daemon dot de
$_=`perl -v`;s;^.*ll;;s;$^=unpack"u", "'8V]D;')E<```";s;\W;;gs;$/=7* ($^=~s;.;;g);%^=map{$_=>1}split//,lc;$_=join$\, (sort keys(%^))[map{ ord($_)-$/}split//,'[EMAIL PROTECTED]:7C1A7C=1:35<7C'];s"0(.)" \U$1"g;print;
