Piotr Stankiewicz wrote:
Hello!
I'm using wget for Windows, version 1.10.2.
I'm trying to download the contents of my photography site. To do that I
used the following command:
wget --wait 2 --random-wait -r -l7 -H -p --convert-links
--html-extension -Dpbase.com --exclude-domains
forum.pbase.com,search.pbase.com --no-parent -e robots=off
http://www.pbase.com/piotrstankiewicz
(I had to use the -H option as the photos are hosted on servers other than
www.pbase.com.)
Unfortunately, wget seems to ignore the --no-parent option, as it also
starts downloading documents placed in the top-level directory, such as
www.pbase.com/index.html
www.pbase.com/help.html
and others. I have the impression it's some kind of bug, although I'm
definitely not a wget expert. Could you verify it, please?
hi piotr,
both the url you specified:
http://www.pbase.com/piotrstankiewicz
and the urls you don't want to retrieve:
http://www.pbase.com/help.html
http://www.pbase.com/index.html
reside in the same directory, so the --no-parent option can't help you.
you should probably try to append '/' to the first url:
wget --wait 2 --random-wait -r -l7 -H -p --convert-links
--html-extension -Dpbase.com --exclude-domains
forum.pbase.com,search.pbase.com --no-parent -e robots=off
http://www.pbase.com/piotrstankiewicz/
this command should work.
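to see why the trailing slash matters, here is a rough model of the
--no-parent check in python (illustrative only, not wget's actual code;
the function name is made up): --no-parent only allows URLs at or below
the *directory* of the start URL, and without a trailing slash that
directory is the site root.

```python
from urllib.parse import urlparse
import posixpath

def allowed_by_no_parent(start_url, candidate_url):
    """Rough model of the --no-parent check (illustrative, not wget's
    actual code): a candidate is allowed only if its path lies at or
    below the directory of the start URL."""
    start = urlparse(start_url)
    cand = urlparse(candidate_url)
    if cand.netloc != start.netloc:
        return False  # different hosts are governed by -H/-D, not --no-parent
    # everything up to the last '/' is treated as the start directory
    start_dir = posixpath.dirname(start.path)
    if not start_dir.endswith("/"):
        start_dir += "/"
    return cand.path.startswith(start_dir)

# without the trailing slash, the start "directory" is '/', so
# top-level files like /index.html are not parents at all:
print(allowed_by_no_parent("http://www.pbase.com/piotrstankiewicz",
                           "http://www.pbase.com/index.html"))   # True

# with the trailing slash, the directory is /piotrstankiewicz/,
# and top-level files are rejected as parents:
print(allowed_by_no_parent("http://www.pbase.com/piotrstankiewicz/",
                           "http://www.pbase.com/index.html"))   # False
```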
Additionally, I tried using the -R option to exclude those files. In that
case wget downloads the files and deletes them afterwards, but it still
follows the links from those files (which is unwanted in my case). I found
information that this is by design.
correct. in recursive mode wget retrieves undesired html files to parse
them for other urls to download, and deletes them after parsing.
But what about introducing another option specifying whether the links
from the unwanted documents (those matched by -R) should be followed or
not? In some cases following them is not welcome.
i agree. users should be able to tell wget not to retrieve undesired
html files at all.
--
Aequam memento rebus in arduis servare mentem...
Mauro Tortonesi http://www.tortonesi.com
University of Ferrara - Dept. of Eng. http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it