Hello,

It maps any source host to any target host and supports wildcards. For example, the rule
*.example.com example.org
maps all URLs on the example.com domain to example.org.
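
To make the wildcard rule above concrete, here is a rough Python sketch of that kind of host mapping. It is not the Nutch implementation (the real code is in NUTCH-1319). The function names parse_rules and normalize_host are made up for this example, and I am assuming that "*.example.com" matches every subdomain as well as the bare domain.

```python
from urllib.parse import urlsplit, urlunsplit


def parse_rules(text):
    """Parse lines of the form '<source_host> <target_host>' (hypothetical helper)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = line.split()
        rules.append((source.lower(), target.lower()))
    return rules


def normalize_host(url, rules):
    """Rewrite the host of url using the first matching rule (hypothetical helper)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for source, target in rules:
        if source.startswith("*."):
            base = source[2:]
            # Assumption: the wildcard covers subdomains and the bare domain.
            matched = host == base or host.endswith("." + base)
        else:
            matched = host == source
        if matched:
            # Keep an explicit port; any user:password@ part is dropped for brevity.
            netloc = target if parts.port is None else f"{target}:{parts.port}"
            return urlunsplit(
                (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
            )
    return url  # no rule matched, leave the URL untouched


rules = parse_rules("*.example.com example.org")
print(normalize_host("http://www.example.com/page?id=1", rules))
```

The same rule format covers both cases raised in the quoted question: subdomains on the same domain and clone sites on a different TLD each just need their own source/target line.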

Cheers
 
 
-----Original message-----
> From:John McCormac <[email protected]>
> Sent: Sat 23-Jun-2012 13:29
> To: [email protected]
> Subject: Re: Near Duplicate Detection in nutch /Solr
> 
> On 23/06/2012 12:14, Markus Jelsma wrote:
> > Nutch now has a HostURLNormalizer capable of normalizing source hosts to a 
> > target host. This prevents duplication of complete websites  and bad 
> > hyperlinks.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1319
> 
> But does that normalize subdomains to the main site (same TLD - 
> sub.example.org to example.org etc) rather than clone sites in different 
> TLDs to the main site?
> 
> Regards...jmcc
> -- 
> **********************************************************
> John McCormac  *  e-mail: [email protected]
> MC2            *  web: http://www.hosterstats.com/
> 22 Viewmount   *  Domain Registrations Statistics
> Waterford      *  And Historical DNS Database.
> Ireland        *  Over 275 Million Domains Tracked.
> IE             *  http://www.hosterstats.com/blog
> **********************************************************
> 
> 
> 
