RE: db.ignore.external.links

Markus Jelsma Sat, 05 Nov 2016 04:28:59 -0700

Hi - db.ignore.* operates on absolute URLs. The parser shouldn't return 
relative URLs. Does it?
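To illustrate the point above: a host-based "external link" test only works once every href has been resolved against the URL of the page it was found on. After resolution, an absolute link and a relative link to the same page are identical. The sketch assumes a simple host comparison. `ExternalLinkCheck`, `resolve` and `isExternal` are hypothetical names, not Nutch's API, and Nutch's own check may differ (for example, by comparing domains rather than hosts).

```java
import java.net.URI;

public class ExternalLinkCheck {

    // Resolve an href (absolute or relative) against the URL of the page it was found on.
    static URI resolve(URI page, String href) {
        return page.resolve(href);
    }

    // Hypothetical host-based rule: a link counts as external when its host
    // differs from the host of the page containing it. Links without a host
    // (e.g. mailto:) are treated as external in this sketch.
    static boolean isExternal(URI page, URI target) {
        String from = page.getHost();
        String to = target.getHost();
        return from == null || to == null || !from.equalsIgnoreCase(to);
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.xyz.com/index.html");
        String[] hrefs = {
            "http://www.xyz.com/business.html", // absolute, same host
            "business.html",                    // relative; same host once resolved
            "http://www.example.org/"           // absolute, different host
        };
        for (String href : hrefs) {
            URI target = resolve(page, href);
            System.out.println(target + " external=" + isExternal(page, target));
        }
    }
}
```

Running it prints `http://www.xyz.com/business.html external=false` for both the absolute and the relative href, and `http://www.example.org/ external=true` for the third. So whether a link was written as absolute or relative in the HTML should make no difference to a check like this.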
 
 
-----Original message-----
> From: Michael Coffey <[email protected]>
> Sent: Friday 4th November 2016 3:40
> To: [email protected]
> Subject: db.ignore.external.links
> 
> Does db.ignore.external.links accept only relative URLs? I am crawling a 
> site, let's call it http://www.xyz.com. It contains links like <A 
> HREF="http://www.xyz.com/business.html">.
> 
> 
> Those URLs don't end up in the crawldb, but ones with relative URLs do. Is 
> this normal, or am I confused?
>
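For reference, the property discussed in this thread is usually overridden in `conf/nutch-site.xml`. When set to `true`, outlinks that point to a host other than the linking page's host are intended to be dropped. The exact matching rule may vary by Nutch version, so treat this as an assumption. A minimal fragment, assuming a standard Nutch 1.x configuration layout:

```xml
<configuration>
  <!-- Drop outlinks that leave the host of the page they were found on -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```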
