Hello,

It maps anything to anything and has wildcard support. The rule

*.example.com example.org

maps all URLs on the example.com domain to example.org.
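To make the wildcard behaviour concrete, here is a minimal Python sketch of how a list of "source target" host rules like the one above could be applied to URLs. It is not Nutch's HostURLNormalizer, which is a Java plugin. The rule text and the function names (parse_rules, normalize_host, normalize_url) are hypothetical, and the sketch assumes that `*.example.com` also matches the bare example.com host. Because the mapping is "anything to anything", the sketch also handles an exact rule such as `example.net example.org`, which is one way a clone site on another TLD could be folded into the main site.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule file contents: one "source target" pair per line.
# A leading "*." on the source host acts as a wildcard for that domain.
RULES_TEXT = """
*.example.com example.org
example.net example.org
"""


def parse_rules(text):
    """Split rules into exact-host and wildcard-domain lookup tables."""
    exact, wildcard = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()
        if source.startswith("*."):
            wildcard[source[2:].lower()] = target
        else:
            exact[source.lower()] = target
    return exact, wildcard


def normalize_host(host, exact, wildcard):
    """Return the target host for `host`, or `host` unchanged if no rule matches."""
    host = host.lower()
    if host in exact:
        return exact[host]
    # Try ever shorter suffixes: a.b.example.com, b.example.com, example.com, com.
    # Starting with the full host means the bare domain matches its own wildcard
    # (an assumption of this sketch).
    labels = host.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in wildcard:
            return wildcard[suffix]
    return host


def normalize_url(url, exact, wildcard):
    """Rewrite only the host part of a URL, keeping scheme, port, path and query."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url
    new_host = normalize_host(parts.hostname, exact, wildcard)
    # Note: any user:password@ prefix is dropped in this simplified version.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, with the rules above, http://www.example.com/a?b=1 becomes http://example.org/a?b=1, while a host like notexample.com is left alone because matching works on whole domain labels rather than raw string suffixes.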
Cheers

-----Original message-----
> From: John McCormac <[email protected]>
> Sent: Sat 23-Jun-2012 13:29
> To: [email protected]
> Subject: Re: Near Duplicate Detection in nutch /Solr
>
> On 23/06/2012 12:14, Markus Jelsma wrote:
> > Nutch now has a HostURLNormalizer capable of normalizing source hosts to a
> > target host. This prevents duplication of complete websites and bad
> > hyperlinks.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1319
>
> But does that normalize subdomains to the main site (same TLD -
> sub.example.org to example.org etc) rather than clone sites in different
> TLDs to the main site?
>
> Regards...jmcc
> --
> **********************************************************
> John McCormac * e-mail: [email protected]
> MC2 * web: http://www.hosterstats.com/
> 22 Viewmount * Domain Registrations Statistics
> Waterford * And Historical DNS Database.
> Ireland * Over 275 Million Domains Tracked.
> IE * http://www.hosterstats.com/blog
> **********************************************************

