HTTP does not provide a directory-listing command, so wget parses HTML to find the other files it should download. Note: HTML, not XML. I suspect that is the problem: filelist.xml is XML, so wget does not look inside it for the HRef entries.
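
If you want the files named in filelist.xml anyway, one possible workaround is to pull the HRef values out yourself and hand the resulting URLs to wget. The following is only a rough sketch, not something wget does for you: the function name filelist_urls and the trailing-slash base URL are my own assumptions, and the XML is copied from your message. It will also miss anything the XML does not list (image001.png, for instance).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Content of saj_elemei/filelist.xml as quoted in the original message.
FILELIST_XML = """<xml xmlns:o="urn:schemas-microsoft-com:office:office">
 <o:MainFile HRef="../saj.htm"/>
 <o:File HRef="image001.jpg"/>
 <o:File HRef="image002.gif"/>
 <o:File HRef="image003.gif"/>
 <o:File HRef="image004.gif"/>
 <o:File HRef="filelist.xml"/>
</xml>"""


def filelist_urls(xml_text, base_url):
    """Return absolute URLs for every HRef attribute in the XML.

    Hypothetical helper: base_url is assumed to be the URL of the
    directory that contains filelist.xml, ending with a slash.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        href = element.get("HRef")  # the attribute itself carries no namespace
        if href is not None:
            urls.append(urljoin(base_url, href))
    return urls


if __name__ == "__main__":
    # Assumed base URL: the directory from the original message.
    for url in filelist_urls(FILELIST_XML, "http://www.jeannette.hu/saj_elemei/"):
        print(url)
```

Redirect the output to a file, say urls.txt, and fetch the list with "wget -i urls.txt".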

Max.

Funk Gabor wrote:
> I recently found that during a (wget) "mirror", not all the files are
> downloaded. (wget v1.8.2 / debian) For example:
>
> wget --mirror http://www.jeannette.hu
> downloads some files, but for example ./saj_elemei will
> only contain filelist.xml (with the following content).
>
> <xml xmlns:o="urn:schemas-microsoft-com:office:office">
>  <o:MainFile HRef="../saj.htm"/>
>  <o:File HRef="image001.jpg"/>
>  <o:File HRef="image002.gif"/>
>  <o:File HRef="image003.gif"/>
>  <o:File HRef="image004.gif"/>
>  <o:File HRef="filelist.xml"/>
> </xml>
>
> However, if I issue a
> wget [--no-parent] --mirror http://www.jeannette.hu/saj_elemei
> then the following will also get downloaded.
>
> -rw-r--r-- 1 root root   257 Oct 29  2001 filelist.xml
> -rw-r--r-- 1 root root  2506 Oct 29  2001 image001.jpg
> -rw-r--r-- 1 root root 23343 Oct 29  2001 image001.png
> -rw-r--r-- 1 root root  4959 Oct 29  2001 image002.gif
> -rw-r--r-- 1 root root  1053 Oct 29  2001 image003.gif
> -rw-r--r-- 1 root root  4246 Oct 29  2001 image004.gif
> -rw-r--r-- 1 root root 27068 Oct 29  2001 image004.wmz
> -rw-r--r-- 1 root root 17627 Oct 29  2001 image006.gif
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?D=A
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?D=D
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?M=A
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?M=D
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?N=A
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?N=D
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?S=A
> -rw-r--r-- 1 root root  1447 Aug 15 16:33 index.html?S=D
>
> My goal is to have the most files (e.g. full retrieval) of a site
> (possibly using one command only...). I tried several other
> ftp mirroring programs but they're racing for the "crappiest program
> on earth" title against each other. Is it wget's fault, or am I the
> dumb one and missed something somewhere?
>
> Thanks, Gabor
