On 23/06/2012 12:14, Markus Jelsma wrote:
Nutch now has a HostURLNormalizer capable of normalizing source hosts to a
target host. This prevents duplication of complete websites and bad hyperlinks.

https://issues.apache.org/jira/browse/NUTCH-1319

But does that normalize subdomains to the main site within the same TLD (sub.example.org to example.org, etc.), rather than normalizing clone sites in different TLDs to the main site?
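
To make the two cases concrete, here is a minimal sketch in Python of a host-to-host mapping. It is not Nutch's HostURLNormalizer or its configuration format; the HOST_MAP table, the normalize_host function and the example.net host are hypothetical names used only for illustration. It shows that an explicit source-to-target table would treat a subdomain and a clone site in another TLD the same way.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping table (an assumption, not Nutch's rule format).
HOST_MAP = {
    "sub.example.org": "example.org",  # subdomain -> main site, same TLD
    "example.net": "example.org",      # clone site in another TLD -> main site
}


def normalize_host(url, host_map=HOST_MAP):
    """Rewrite the host of url to its mapped target host, if one is listed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    target = host_map.get(host)
    if target is None:
        # Host is not listed: leave the URL untouched.
        return url
    # Keep an explicit port; any userinfo in the URL is dropped in this sketch.
    netloc = target if parts.port is None else f"{target}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    for u in ("http://sub.example.org/a?b=1",
              "http://example.net/index.html",
              "http://other.example.org/"):
        print(u, "->", normalize_host(u))
```

With a table like this, whether subdomains or cross-TLD clones get folded into the main site depends only on which source hosts are listed.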

Regards...jmcc
--
**********************************************************
John McCormac  *  e-mail: [email protected]
MC2            *  web: http://www.hosterstats.com/
22 Viewmount   *  Domain Registrations Statistics
Waterford      *  And Historical DNS Database.
Ireland        *  Over 275 Million Domains Tracked.
IE             *  http://www.hosterstats.com/blog
**********************************************************

