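1. Bloom filter
A highly space-efficient way to eliminate duplicate URLs. It can report a
false positive (a new URL flagged as already seen) but never a false negative.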
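
To make the idea concrete, here is a minimal sketch of a Bloom filter for
URLs. It is not part of Nutch: the class name UrlBloomFilter, the method names,
and the choice of FNV-1a with double hashing are my own assumptions for
illustration. You would check each URL with addIfAbsent() before handing it to
the injector.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical helper, not a Nutch class: a small Bloom filter over URL strings.
final class UrlBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UrlBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Seeded 32-bit FNV-1a hash (an assumed choice; any good hash would do).
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Records the URL. Returns true if it was definitely not seen before,
    // false if it was probably seen before (false positives are possible).
    boolean addIfAbsent(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9) | 1;
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = index(h1, h2, i);
            if (!bits.get(idx)) {
                isNew = true;
                bits.set(idx);
            }
        }
        return isNew;
    }

    // Read-only check: false means the URL was never added.
    boolean mightContain(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Size the bit array well above the number of URLs you expect (roughly 10 bits
per URL with 4 to 7 hashes is a common starting point), otherwise too many new
URLs will be wrongly reported as duplicates.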

2. Disk-based hash table
The Mercator crawler uses this approach.
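
As a rough illustration only (this is not Mercator's actual implementation,
and none of these names come from Nutch), a disk-based hash table can store a
64-bit fingerprint of each URL in a fixed-size file and probe it with open
addressing. DiskUrlSet, fingerprint() and addIfAbsent() are hypothetical names,
and the sketch assumes a fixed slot count chosen up front.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a fixed-capacity, file-backed set of URL fingerprints.
final class DiskUrlSet implements AutoCloseable {
    private final RandomAccessFile file;
    private final long slots;

    DiskUrlSet(File path, long slots) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.slots = slots;
        long total = slots * 8L;
        if (file.length() != total) {
            // Fresh or mismatched file: fill it with zeros (0 marks an empty slot).
            file.setLength(0);
            byte[] zeros = new byte[8192];
            for (long done = 0; done < total; done += zeros.length) {
                file.write(zeros, 0, (int) Math.min(zeros.length, total - done));
            }
        }
    }

    // 64-bit FNV-1a fingerprint; 0 is reserved for "empty", so it is remapped to 1.
    static long fingerprint(String url) {
        long h = 0xCBF29CE484222325L;
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001B3L;
        }
        return h == 0 ? 1 : h;
    }

    // Returns true if the URL was new and has been stored, false if its
    // fingerprint was already present (distinct URLs with the same
    // fingerprint are treated as duplicates).
    boolean addIfAbsent(String url) throws IOException {
        long fp = fingerprint(url);
        long slot = Math.floorMod(fp, slots);
        for (long probes = 0; probes < slots; probes++) {
            file.seek(slot * 8L);
            long stored = file.readLong();
            if (stored == fp) {
                return false;
            }
            if (stored == 0) {
                file.seek(slot * 8L);
                file.writeLong(fp);
                return true;
            }
            slot = (slot + 1) % slots; // linear probing
        }
        throw new IllegalStateException("table full");
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Every lookup here costs at least one disk seek, so a real crawler would put an
in-memory cache or buffer in front of the file. The sketch only shows the
on-disk layout and the duplicate check.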

2005/12/16, Arun Kumar Sharma <[EMAIL PROTECTED]>:
>
> Hi
>      I have list of urls which may contain duplicate urls. I want to check
> that there is no duplicate url insertion through WebDBInjector. Is there any
> way to achieve this using nutch functionality???
>     answer awaited anxiously...
>
>
> Regards,
>
> Arun Kumar Sharma (Tech Lead -Java/J2EE)
> Mob: +91.981.529.5761
>



--
Search whenever you want
