Dear Community,

I have a Solr cluster with an index containing more than 100 million
addresses. I need to run a bulk search against it with roughly the same
number of addresses in order to find near-duplicate entities.

Is there anything specific that I should look out for before doing so?

 

At the moment I am querying the index one address at a time using the Solr
client, but that means executing more than 100 million HTTP requests against
the cluster, which sounds very time-consuming and far from optimal.
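
To put a rough number on that concern, here is a back-of-the-envelope
sketch. The 10 ms per-request latency and the concurrency levels are
assumptions for illustration only, not measurements from my cluster, and
the function name is made up for this example.

```python
# Rough estimate of the wall-clock time for one-request-per-address querying.
# ASSUMED_LATENCY_SECONDS and the concurrency levels are illustrative guesses.

def estimated_hours(num_queries: int, latency_seconds: float, concurrency: int = 1) -> float:
    """Hours needed to run num_queries requests, assuming each request takes
    latency_seconds and `concurrency` requests are always in flight."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_seconds = num_queries * latency_seconds / concurrency
    return total_seconds / 3600


NUM_ADDRESSES = 100_000_000        # lower bound taken from the index size
ASSUMED_LATENCY_SECONDS = 0.010    # assumption: 10 ms round trip per query

for workers in (1, 8, 32):
    hours = estimated_hours(NUM_ADDRESSES, ASSUMED_LATENCY_SECONDS, workers)
    print(f"{workers:>2} concurrent requests: ~{hours:.1f} hours")
```

Under these assumptions a strictly sequential run would take well over a
week, and even with a few dozen requests in flight it would still take
several hours, provided the cluster can sustain that load.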

 

Is there any offline solution to query Solr?

 

Thanks for your help!
