On Tuesday 17 August 2010 13:47:32 Jeroen van Vianen wrote:
> 
> Yes. I have lots of similar results because of these URLs occurring many
> times for the same original URL.

You can use Solr's deduplication [1]. Depending on the configuration, it 
computes a signature for each document that matches either exact or 
near-duplicate content. It can then optionally overwrite (delete) documents 
that share a signature.

[1]: http://wiki.apache.org/solr/Deduplication
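
To illustrate the idea, here is a minimal Python sketch of exact-match 
signature deduplication. It is not Solr's implementation: the function names 
(signature, index_with_dedup), the "url" and "content" field names, and the 
overwrite_dupes flag are all hypothetical and only mirror the behaviour 
described above. Near-duplicate (fuzzy) signatures are not covered here.

```python
import hashlib


def signature(doc, fields):
    """Return an MD5 hex digest computed over the selected fields of doc."""
    h = hashlib.md5()
    for name in fields:
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def index_with_dedup(docs, fields, overwrite_dupes=True):
    """Index docs; optionally let a later doc replace one with the same signature.

    Assumption: every doc has a unique "url" field that acts as its id.
    """
    index = {}
    for doc in docs:
        stored = dict(doc, signature=signature(doc, fields))
        # With overwrite_dupes, the signature acts as the key, so documents
        # with identical content collapse into one entry (the last one wins).
        key = stored["signature"] if overwrite_dupes else stored["url"]
        index[key] = stored
    return list(index.values())


if __name__ == "__main__":
    docs = [
        {"url": "http://example.com/a?session=1", "content": "same page"},
        {"url": "http://example.com/a?session=2", "content": "same page"},
        {"url": "http://example.com/b", "content": "other page"},
    ]
    for d in index_with_dedup(docs, fields=["content"]):
        print(d["url"], d["signature"])
```

Computing the signature over the content only, and not over the URL, is what 
makes the same page fetched under different URLs collapse into one entry.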

> 
> Thanks and best regards,
> 
> 
> Jeroen
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
