On Sunday, June 7, 2015, 08:19:28, Tony Lewis wrote:
On Friday, June 05, 2015 1:24 PM, Tim Rühsen wrote:
First, I have not dug into the source code to see how -H is implemented.
However, it makes sense to me that one ought to be able to specify
both -H and -D together.
-H (= all domains)
To exclude some sites, use --exclude-domains domain-list.
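As a minimal sketch of that suggestion: --exclude-domains takes a comma-separated list of domains that are not to be followed. The host names below are hypothetical placeholders, and the command is only assembled and printed here, not run.

```shell
#!/bin/sh
# Hypothetical hosts that should be skipped while spanning hosts.
excluded="ads.example.com,tracker.example.com"

# Recursive download that may leave the start host (-H),
# except for the domains listed in --exclude-domains.
cmd="wget -r -H --exclude-domains $excluded http://www.example.com/"

# Print the command instead of executing it.
echo "$cmd"
```

This is a blacklist approach: every other foreign host that is linked from the downloaded pages would still be followed.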
wget --help says about -H: go to foreign hosts when recursive.
It doesn't say that when using -H one *must* take every foreign host that
exists on the Internet and I'm arguing that such an interpretation does not
make sense.
That is what -H is for :-)
Well, not *every* foreign host, but *every* foreign host that appears in
downloaded, parsable files (HTML and CSS files).
wget --help just gives a short help, not a full description. See 'man wget'
for the extended description. If there is something unclear, we should fix it.
Using -H always carries the risk of 'downloading the whole internet'. That's
normally not what you want, and thus -H is not enabled by default.
One ought to be able to request that wget go to foreign hosts without that
implying that wget mirror the entire Internet. One obvious way to limit
which foreign hosts are mirrored is to use -H in combination with -D.
Consider this scenario: I want to mirror a site including the images
that are stored in a sub-domain, but I don't want to mirror every
external site referenced by the site. So I would try this:
wget --mirror http://www.somesite.com -H -D www.somesite.com images.somesite.com
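A note on the command above: 'man wget' describes -D (--domains) as taking a comma-separated list of domains, and says that -D does not turn on -H by itself. A minimal sketch of the scenario under that reading follows. The host names are the placeholders from the message, and the command is only assembled and printed, not run.

```shell
#!/bin/sh
# The two hosts to mirror, joined with a comma
# (assumption: -D expects a comma-separated list, per 'man wget').
domains="www.somesite.com,images.somesite.com"

# -H allows leaving the start host; -D limits which hosts are followed.
cmd="wget --mirror -H -D $domains http://www.somesite.com"

# Print the command instead of executing it.
echo "$cmd"
```

Whether this combination behaves as intended is exactly what the thread is discussing, so treat it as something to try rather than a confirmed answer.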
You can also play with:
-A acclist --accept acclist
-R rejlist --reject rejlist
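A short sketch of how the two list options might be used: both take comma-separated lists of file name suffixes or patterns. The suffixes and the URL are illustrative assumptions, and the command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical suffix lists: keep images, skip large archives.
accept="jpg,png,gif"
reject="zip,iso"

# Recursive download filtered by file name, not by host.
cmd="wget -r -A $accept -R $reject http://www.somesite.com/"

# Print the command instead of executing it.
echo "$cmd"
```

These options filter by file name only, so they do not by themselves restrict which hosts are visited.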
I can play with lots of wget options, but in the scenario described I want
*all* files from two hosts, but not every other foreign host that might be
referenced by one of those hosts.
What command line would you use for the scenario described?
Let's say you want all from the two hosts example1.com and example2.com:
wget --mirror example1.com example2.com
Regards, Tim