On Tuesday 17 August 2010 13:47:32 Jeroen van Vianen wrote:
>
> Yes. I have lots of similar results because of these URLs occurring many
> times for the same original URL.

You can use deduplication [1]. It generates a signature for each document that
matches exact or near-duplicate content, depending on the configuration. It can
then optionally overwrite (delete) the duplicates.

[1]: http://wiki.apache.org/solr/Deduplication

>
> Thanks and best regards,
>
>
> Jeroen
>

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
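
To illustrate the idea of signature-based deduplication mentioned above, here is a minimal Python sketch. It is not Solr's implementation: the function names, the document dictionaries, the example URLs, and the choice of MD5 over a `content` field are all assumptions made for illustration, and it only covers exact matches, not near-duplicates.

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen fields of a document into a hex digest (hypothetical helper)."""
    h = hashlib.md5()
    for name in fields:
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent field values cannot run together
    return h.hexdigest()


def dedupe(docs, fields):
    """Keep one document per signature; a later duplicate overwrites an earlier one."""
    by_sig = {}
    for doc in docs:
        by_sig[signature(doc, fields)] = doc
    return list(by_sig.values())


# Example data (made up): two URLs that serve the same content, plus one other page.
docs = [
    {"url": "http://example.com/page?session=1", "content": "hello"},
    {"url": "http://example.com/page?session=2", "content": "hello"},
    {"url": "http://example.com/other", "content": "world"},
]
unique = dedupe(docs, ["content"])
```

Because the signature here is computed from `content` only, the two URLs with identical content collapse into a single entry, which is the effect wanted for many URLs pointing at the same original page.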

