Hi, Susam.
But that approach seems wrong for my case, even though your solution is the easiest way to get rid of duplicates.
If you know the DataParkSearch engine, it has such an alias option.
So, is using a URL filter the only way to avoid duplicates?
Or is there a way to code this feature, and if so, how?
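One way such a feature could be coded (this is only a sketch, not Nutch's actual API; the class name, the alias table, and the example hosts are all hypothetical) is to normalize every alias host to a single canonical host before URLs enter the crawl db, so both domains collapse into one set of URLs instead of one domain being blocked outright:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of host aliasing, not an actual Nutch class.
public class HostAliasNormalizer {
    // Alias table: every known alias host maps to the canonical host.
    private final Map<String, String> aliases = new HashMap<String, String>();

    public HostAliasNormalizer() {
        // Example mapping taken from the thread: both domains are one site.
        aliases.put("www.site2.com", "www.site1.com");
    }

    // Rewrite the URL's host if it is a known alias; otherwise return the
    // URL unchanged. Malformed URLs are also returned unchanged.
    public String normalize(String urlString) {
        try {
            URL url = new URL(urlString);
            String canonical = aliases.get(url.getHost());
            if (canonical == null) {
                return urlString;
            }
            URL rewritten = new URL(url.getProtocol(), canonical,
                                    url.getPort(), url.getFile());
            return rewritten.toExternalForm();
        } catch (MalformedURLException e) {
            return urlString;
        }
    }
}
```

If something like this were plugged in as a URL-normalizing step, links found under either domain would be crawled and indexed under the canonical host, which should remove the duplicates from the search results without ignoring one domain entirely.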
> I have faced this issue. I block the duplicate domain using the URL
> filters. So only one domain is crawled by the bot and the other domain
> is ignored.
> Regards,
> Susam Pal
> http://susam.in/
> On 7/6/07, Nuther <[EMAIL PROTECTED]> wrote:
>> Hi,
>> I was wondering if Nutch has an alias option.
>> Let's say we have two domains, www.site1.com and www.site2.com, that point to
>> one site. How can I tell Nutch that they point to the same site? This is a
>> problem because there are a lot of duplicates in the search results.
>> Thanks.
>> --
>> Regards,
>> Nuther mailto:[EMAIL PROTECTED]
--
Regards,
Nuther mailto:[EMAIL PROTECTED]
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general