Hi, Susam. Your solution is the easiest way to get rid of duplicates, but it isn't quite right for my case. If you know the DataParkSearch engine, it has an alias option for exactly this. So, is using a URL filter the only way to avoid duplicates? Or is there a way to code this feature, and if so, how?
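One way to code the alias rather than drop a domain entirely might be Nutch's regex URL-normalizer plugin (urlnormalizer-regex), which rewrites URLs before they are stored, so both domains collapse into one set of URLs. I'm assuming here that the plugin is enabled and that conf/regex-normalize.xml is the right file in your Nutch version; the domains are the example ones from this thread. A sketch:

```xml
<?xml version="1.0"?>
<!-- conf/regex-normalize.xml: rewrite the alias domain onto the
     canonical one so the crawler never sees two copies of a page. -->
<regex-normalize>
  <regex>
    <!-- Any URL on www.site2.com ... -->
    <pattern>^http://www\.site2\.com/</pattern>
    <!-- ... is stored as the equivalent URL on www.site1.com. -->
    <substitution>http://www.site1.com/</substitution>
  </regex>
</regex-normalize>
```

Unlike the filter approach, this keeps pages reachable only via the alias domain, since their URLs are rewritten instead of discarded.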
> I have faced this issue. I block the duplicate domain using the URL
> filters. So only one domain is crawled by the bot and the other domain
> is ignored.
>
> Regards,
> Susam Pal
> http://susam.in/
>
> On 7/6/07, Nuther <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I was wondering if nutch has an alias option.
>> Let's say we have two domains www.site1.com and www.site2.com that point to
>> one site. How can I tell nutch that they point to that site? This is a problem
>> because there are a lot of duplicates in search results.
>>
>> Thanks.
>>
>> --
>> Regards,
>> Nuther mailto:[EMAIL PROTECTED]

--
Regards,
Nuther mailto:[EMAIL PROTECTED]
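For reference, the URL-filter approach Susam describes above can be sketched in conf/regex-urlfilter.txt (a minimal example; I'm assuming the default regex-urlfilter plugin is enabled, and the domain names are just the example ones from this thread):

```text
# conf/regex-urlfilter.txt: rules are tried top to bottom;
# the first matching rule wins.

# Reject everything on the duplicate domain.
-^http://www\.site2\.com/

# Accept anything else (default catch-all rule).
+.
```

With this in place the bot crawls only www.site1.com, and URLs on www.site2.com are skipped before fetching, which is what removes the duplicates from the search results.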
