Re: Implication of not calling closeSearcher() in DirectUpdateHandler2?

Walter Ferrara Mon, 24 Sep 2007 08:39:49 -0700

Solr has unique keys, which do that "avoid duplicates" work for you, so
you may try to make some kind of unique identifier out of the text you're
going to index, and use that as the Solr <uniqueKey>.


You could try to create a sort of hashCode or something like that from
the text you are going to index, and use that as the uniqueKey of the
schema - the next time you add the same text, you should get the same
key, so Solr will not add it again, but just update it (or at least it
will be a lot simpler to check whether that document is already present
in the index).
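
A minimal sketch of that idea, assuming the key is computed on the client
before the document is sent to Solr. The helper name content_key, the
whitespace/lowercase normalization, the choice of SHA-1, and the field
names "id" and "text" are all illustrative assumptions, not anything Solr
prescribes. Note that a hash like this only catches exact duplicates
(after normalization); near duplicates will still get different keys.

```python
import hashlib
import re


def content_key(text: str) -> str:
    """Return a stable hex key derived from the text (hypothetical helper)."""
    # Assumed normalization: collapse runs of whitespace and lowercase,
    # so trivial formatting differences map to the same key.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    # SHA-1 is used here only as a fingerprint, not for security.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Build the document to index; "id" stands in for whatever field the
# schema declares as its <uniqueKey>.
text = "Some  text\nto index"
doc = {"id": content_key(text), "text": text}
```

Adding the same text again yields the same "id", so the second add
replaces the first document rather than creating a new one.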

Any other thoughts?
--
Walter

climbingrose wrote:
>>> You would get autowarming, etc, by default though - not what you want
>>> from a searcher that is only used for deletions.
>
> As a workaround, I manually initialise an LRUCache instance in the DUH2
> constructor. It works, but it is not very elegant because you can't view
> the cache's statistics in the Solr admin...
>
>>> What problem are you trying to solve that requires directly using or
>>> modifying DUH2?
>
> I'm doing near-duplicate detection on a fairly large number of documents.
> Each document to be added to Solr will be compared with sample documents
> from all clusters in the index. I could, of course, dedupe documents on
> the client side, but the performance would not be as good.
>
> BTW, has anyone here done any serious near-duplicate detection with Solr?
> If yes, what approaches did you use?
>
> Thanks.
>
