I am running a PC version of wget.
===============================================

C:\> wget --version
GNU Wget 1.9

Copyright (C) 2003 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Originally written by Hrvoje Niksic <[EMAIL PROTECTED]>.

===============================================
I'm on Win2K.
I have observed two apparent failures.
1) A failure to spider a site whose link URLs have no file extensions, e.g.

http://www.domain.com/filename_no_extension

Only the homepage is retrieved.
2) A failure to spider past the second page fetched on a completely static HTML site whose homepage links to a sitemap, which in turn links to everything else. The fetch stops at the sitemap and goes no further.
My own spider (useless for building mirrors) spiders both sites just fine, and so does HTTrack (PC version).
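For what it's worth, links without file extensions are trivial to pull out of ordinary HTML, so the markup itself shouldn't be the obstacle. A minimal sketch of such a link extractor (purely illustrative, not my actual spider; the page content and hostname below are made up):

```python
# Minimal illustration: extract every <a href> from a page, including
# links whose URLs carry no file extension. The sample page and the
# www.domain.com base URL are made up for this sketch.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href of each <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

page = """
<html><body>
<a href="/filename_no_extension">no extension</a>
<a href="/sitemap.html">sitemap</a>
</body></html>
"""

parser = LinkExtractor("http://www.domain.com/")
parser.feed(page)
print(parser.links)
# Both links come out, extension or not:
# ['http://www.domain.com/filename_no_extension',
#  'http://www.domain.com/sitemap.html']
```

Both link styles are recovered the same way, which is why I would expect the recursion, not the parsing, to be where wget gives up.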
I have tried a fairly wide variety of invocations, including:
wget -q --mirror -p --html-extension --base=./ -k -P ./ http://www.MyDomain.com
wget  --mirror -p --html-extension --base=./ --convert-links --directory-prefix ./ http://MyDomain.com --verbose

wget  -r -N -l 100 -nr -p --html-extension --base=./ --convert-links --directory-prefix ./ http://MyDomain.com --verbose   


C:\temp>wget  -r -N -l 100 -nr -p --html-extension --base=./ --convert-links --directory-prefix ./ http://MyDomain.com --verbose -erobots=off -m -U "Mozilla/5.0 (compatible; Konqueror/3.2; Linux)"

The results are consistent across every invocation.
An associate of mine, in fact the person who supplied me with the exe, has gotten the same results on the same sites.
Please advise.
Thanks

Joe.