Re: [Bug-wget] Unexpected result with -H and -D

2018-01-18 Thread Friso van Vollenhoven
Hi, Thanks for confirming it's a bug. I'm currently not fluent enough in C to provide a fix myself, but I see a patch was already posted, so I hope that's satisfactory. Cheers, Friso On Wed, Jan 17, 2018 at 3:01 PM, Darshit Shah wrote: > Hi, > > This is a bug in Wget,

Re: [Bug-wget] Unexpected result with -H and -D

2018-01-17 Thread Tim Rühsen
Hi, this is not a PSL matching, so no libpsl is needed. Just sufmatch() has to be fixed to do (sub)domain matching. Attached is a fix. With Best Regards, Tim On 01/17/2018 03:01 PM, Darshit Shah wrote: > Hi, > > This is a bug in Wget, apparently a really old one! Seems like the bug has >
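
A minimal sketch of what (sub)domain matching could look like, as opposed to plain suffix matching. This is not the attached patch and not Wget's actual sufmatch(): the function name `domain_match`, its signature, and the example host names in the usage note are assumptions made for illustration only. The idea is that a suffix match only counts when it covers the whole host name or begins at a label boundary (a dot).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Wget's sufmatch()): returns true when `host`
   equals `domain` or is a subdomain of it, compared case-insensitively. */
static bool
domain_match (const char *host, const char *domain)
{
  assert (host != NULL && domain != NULL);

  size_t hlen = strlen (host);
  size_t dlen = strlen (domain);

  /* An empty pattern, or one longer than the host, can never match. */
  if (dlen == 0 || dlen > hlen)
    return false;

  /* Step 1: plain suffix comparison, ignoring case. */
  const char *tail = host + (hlen - dlen);
  for (size_t i = 0; i < dlen; i++)
    if (tolower ((unsigned char) tail[i]) != tolower ((unsigned char) domain[i]))
      return false;

  /* Step 2: the suffix must be the whole host, or start at a label
     boundary. A pattern written with a leading dot (".example.com")
     already carries its own boundary. */
  return hlen == dlen || domain[0] == '.' || tail[-1] == '.';
}
```

With this check, `domain_match ("www.example.com", "example.com")` is accepted, while `domain_match ("notexample.com", "example.com")` is rejected, even though both host names end in the same characters. A suffix-only comparison would accept both.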

Re: [Bug-wget] Unexpected result with -H and -D

2018-01-17 Thread Darshit Shah
Hi, This is a bug in Wget, apparently a really old one! Seems like the bug has been around since at least 1997. Looking at the source, the issue is that Wget does a very simple suffix matching on the actual domain and accepted domains list. This is obviously wrong as you have just found out. I'm

[Bug-wget] Unexpected result with -H and -D

2018-01-17 Thread Friso van Vollenhoven
Hello all, I am trying to do a recursive download of a webpage and span multiple hosts within the same domain, but not cross to other domains. The issue is that the crawl does extend to other domains. My full command is this: wget \ --recursive \ --no-clobber \ --page-requisites \