Re: [Bug-wget] Behaviour of spanning to accepted domains

2015-06-07 Thread Tim Rühsen
On Sunday, 7 June 2015, 08:19:28, Tony Lewis wrote:
 On Friday, June 05, 2015 1:24 PM, Tim Rühsen wrote:
   First, I have not dug into the source code to see how -H is implemented.
   However, it makes sense to me that one ought to be able to specify
   both -H and -D together.
  
  -H (=all domains)
  to exclude some sites use --exclude-domains domain-list
 
 wget --help says about -H: go to foreign hosts when recursive.
 
 It doesn't say that when using -H one *must* take every foreign host that
 exists on the Internet and I'm arguing that such an interpretation does not
 make sense.

That is what -H is for :-)
Well, not *every* foreign host, but *every* foreign host that appears in 
downloaded, parsable files (HTML and CSS files).

wget --help gives only a short summary, not a full description. See 'man wget'
for the extended description. If something there is unclear, we should fix it.

Using -H always carries the risk of 'downloading the whole Internet'. That is
normally not what you want, which is why -H is not enabled by default.

 
 One ought to be able to request that wget go to foreign hosts without that
 implying that wget mirror the entire Internet. One obvious way to limit
 which foreign hosts are mirrored is to use -H in combination with -D.
 
   Consider this scenario: I want to mirror a site including the images
   that are stored in a sub-domain, but I don't want to mirror every
   external site referenced by the site. So I would try this:
   
   wget --mirror http://www.somesite.com -H -D www.somesite.com
   images.somesite.com
  
  You can also play with:
     -A acclist --accept acclist
     -R rejlist --reject rejlist
 
 I can play with lots of wget options, but in the scenario described I want
 *all* files from two hosts, but not every other foreign host that might be
 referenced by one of those hosts.
 
 What command line would you use for the scenario described?

Let's say you want everything from the two hosts example1.com and example2.com:

wget --mirror example1.com example2.com
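
As a dry-run sketch of that suggestion (the helper names mirror_hosts_cmd and
mirror_span_cmd are made up for illustration, and the host names are
placeholders): the first function prints a command with one start URL per host
and no -H; the second prints the -H plus -D variant from the scenario, assuming
-D takes a comma-separated domain list as the manual describes. Whether the
second form behaves as hoped is what this thread is about, so treat it as
something to try, not as confirmed behaviour.

```shell
# Hypothetical helper: print (not run) a wget command that mirrors each
# given host as its own start URL, without -H.
mirror_hosts_cmd() {
    printf 'wget --mirror'
    for host in "$@"; do
        printf ' http://%s/' "$host"
    done
    printf '\n'
}

# Hypothetical helper: print the -H/-D variant. $1 is the start host; all
# arguments together form the comma-separated -D list (an assumption based
# on the manual's description of -D).
mirror_span_cmd() {
    start=$1
    domains=$(IFS=,; printf '%s' "$*")
    printf 'wget --mirror -H -D %s http://%s/\n' "$domains" "$start"
}

mirror_hosts_cmd example1.com example2.com
mirror_span_cmd www.somesite.com images.somesite.com
```

Both functions only print the command line; copy the output into a shell to
run it for real.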

Regards, Tim




Re: [Bug-wget] Behaviour of spanning to accepted domains

2015-06-07 Thread Tony Lewis
On Friday, June 05, 2015 1:24 PM, Tim Rühsen wrote:

  First, I have not dug into the source code to see how -H is implemented.
  However, it makes sense to me that one ought to be able to specify 
  both -H and -D together.
 -H (=all domains)
 to exclude some sites use --exclude-domains domain-list

wget --help says about -H: go to foreign hosts when recursive.

It doesn't say that when using -H one *must* take every foreign host that
exists on the Internet and I'm arguing that such an interpretation does not
make sense.

One ought to be able to request that wget go to foreign hosts without that
implying that wget mirror the entire Internet. One obvious way to limit
which foreign hosts are mirrored is to use -H in combination with -D.
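
The behaviour being asked for can be sketched as a per-link decision. This is
a simplified model, not wget's code: should_follow is a hypothetical helper,
and the rule that a host is accepted when it equals a listed domain or lies
below it is an assumption made for the illustration.

```shell
# Hypothetical helper, not wget's implementation.
# $1 = host of the link found in a downloaded file
# $2 = host of the start URL
# $3 = "yes" if -H was given
# $4 = comma-separated -D list (may be empty)
should_follow() {
    link_host=$1 start_host=$2 span=$3 domains=$4
    # Without -H, only the starting host is followed.
    if [ "$span" != "yes" ]; then
        [ "$link_host" = "$start_host" ] && echo follow || echo skip
        return
    fi
    # With -H and no -D list, every host found in parsed files is followed.
    if [ -z "$domains" ]; then
        echo follow
        return
    fi
    # With -H and -D, follow only hosts equal to or below a listed domain.
    old_ifs=$IFS; IFS=,
    for d in $domains; do
        case $link_host in
            "$d"|*."$d") IFS=$old_ifs; echo follow; return ;;
        esac
    done
    IFS=$old_ifs
    echo skip
}

should_follow images.somesite.com www.somesite.com yes www.somesite.com,images.somesite.com
should_follow ads.example.net www.somesite.com yes www.somesite.com,images.somesite.com
```

Under this model the image sub-domain is followed and the unrelated host
ads.example.net (a made-up name) is skipped, which is the outcome the scenario
below is after.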

  Consider this scenario: I want to mirror a site including the images 
  that are stored in a sub-domain, but I don't want to mirror every 
  external site referenced by the site. So I would try this:
 
  wget --mirror http://www.somesite.com -H -D www.somesite.com 
  images.somesite.com

 You can also play with:

   -A acclist --accept acclist
   -R rejlist --reject rejlist

I can play with lots of wget options, but in the scenario described I want
*all* files from two hosts, but not every other foreign host that might be
referenced by one of those hosts.

What command line would you use for the scenario described?

Tony