HTTP has no directory-listing command, so wget finds the other files it should
download by parsing the HTML pages it fetches. Note: HTML, not XML. I suspect
that is the problem: the links inside filelist.xml are never followed.
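A rough model of the difference, sketched in Python (this is not wget's code).
The tag table HTML_LINK_ATTRS is an assumption standing in for wget's own,
larger HTML tag list. An HTML-only link extractor finds nothing in
filelist.xml, because <o:File HRef=...> is not an HTML link tag. The second
function, filelist_urls (a hypothetical helper), is one possible workaround:
it reads the HRef entries itself and resolves them into absolute URLs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a simplified table of HTML tags and the attribute that holds
# a link. wget's real table is larger, but it is still HTML-only.
HTML_LINK_ATTRS = {"a": "href", "link": "href", "img": "src",
                   "frame": "src", "script": "src"}


class HtmlLinkExtractor(HTMLParser):
    """Collects links the way an HTML-only crawler would (simplified)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # such as <o:File .../> are also routed here by default.
        wanted = HTML_LINK_ATTRS.get(tag)
        for name, value in attrs:
            if wanted is not None and name == wanted and value:
                self.links.append(value)


def html_links(text):
    parser = HtmlLinkExtractor()
    parser.feed(text)
    parser.close()
    return parser.links


OFFICE_NS = "urn:schemas-microsoft-com:office:office"


def filelist_urls(xml_text, filelist_url):
    """Hypothetical workaround: read HRef from o:MainFile / o:File entries
    and resolve each one relative to the URL of filelist.xml itself."""
    root = ET.fromstring(xml_text)
    wanted = {f"{{{OFFICE_NS}}}MainFile", f"{{{OFFICE_NS}}}File"}
    urls = []
    for elem in root.iter():  # document order
        if elem.tag in wanted and elem.get("HRef"):
            urls.append(urljoin(filelist_url, elem.get("HRef")))
    return urls


# The filelist.xml content quoted in the original message.
SAMPLE = """<xml xmlns:o="urn:schemas-microsoft-com:office:office">
 <o:MainFile HRef="../saj.htm"/>
 <o:File HRef="image001.jpg"/>
 <o:File HRef="image002.gif"/>
 <o:File HRef="image003.gif"/>
 <o:File HRef="image004.gif"/>
 <o:File HRef="filelist.xml"/>
</xml>"""

if __name__ == "__main__":
    base = "http://www.jeannette.hu/saj_elemei/filelist.xml"
    print(html_links(SAMPLE))  # an HTML-only extractor finds no links here
    for url in filelist_urls(SAMPLE, base):
        print(url)
```

Written to a file, one URL per line, that output can be handed to wget with
"wget -i urls.txt". Even so, filelist.xml does not list everything:
image001.png, image004.wmz and image006.gif appear in the directory listing
quoted below but not in filelist.xml, which is why fetching the directory URL
itself picked up more files.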

Max.


Funk Gabor wrote:
> I recently found that during a wget "mirror", not all of the files are
> downloaded (wget v1.8.2 / Debian). For example:
>
> wget --mirror http://www.jeannette.hu
> downloads some files, but ./saj_elemei, for example, ends up
> containing only filelist.xml (with the following content):
>
> <xml xmlns:o="urn:schemas-microsoft-com:office:office">
>  <o:MainFile HRef="../saj.htm"/>
>  <o:File HRef="image001.jpg"/>
>  <o:File HRef="image002.gif"/>
>  <o:File HRef="image003.gif"/>
>  <o:File HRef="image004.gif"/>
>  <o:File HRef="filelist.xml"/>
> </xml>
>
> However, if I issue a
> wget [--no-parent] --mirror http://www.jeannette.hu/saj_elemei
> then the following files also get downloaded:
>
> -rw-r--r--    1 root     root          257 Oct 29  2001 filelist.xml
> -rw-r--r--    1 root     root         2506 Oct 29  2001 image001.jpg
> -rw-r--r--    1 root     root        23343 Oct 29  2001 image001.png
> -rw-r--r--    1 root     root         4959 Oct 29  2001 image002.gif
> -rw-r--r--    1 root     root         1053 Oct 29  2001 image003.gif
> -rw-r--r--    1 root     root         4246 Oct 29  2001 image004.gif
> -rw-r--r--    1 root     root        27068 Oct 29  2001 image004.wmz
> -rw-r--r--    1 root     root        17627 Oct 29  2001 image006.gif
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?D=A
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?D=D
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?M=A
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?M=D
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?N=A
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?N=D
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?S=A
> -rw-r--r--    1 root     root         1447 Aug 15 16:33 index.html?S=D
>
> My goal is to get as many of a site's files as possible (i.e. a full
> retrieval), ideally with a single command. I tried several other
> FTP mirroring programs, but they are all racing each other for the
> "crappiest program on earth" title. Is it wget's fault, or am I the
> dumb one who missed something somewhere?
>
> Thanks, Gabor
