Re: Data Import Handler with Solr Source behind Load Balancer

2018-09-14 Thread Emir Arnautović
Hi Thomas,
Is this SolrCloud or Solr master-slave? Do you update the source index while 
the import is running? If you are using master-slave, did you check that all 
your instances behind the LB are in sync?
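One way to check whether the instances behind the LB are in sync is to query each backend directly, bypassing the LB, through the Luke request handler (`/admin/luke?numTerms=0&wt=json`). It reports `numDocs`, `maxDoc` and `version` for the core. Below is a minimal sketch in Python, assuming the handler is reachable at that path. The backend URLs and core name are hypothetical placeholders. In master-slave, where slaves copy the master's segments, all three values should normally match. In SolrCloud, each replica merges segments independently, so only `numDocs` is expected to agree.

```python
import json
import urllib.request

# Hypothetical backend URLs: replace with the real hosts behind the load balancer.
BACKENDS = [
    "http://solr4-a.example.com:8983/solr/mycore",
    "http://solr4-b.example.com:8983/solr/mycore",
]


def fetch_index_stats(core_url):
    """Ask one Solr core directly (bypassing the LB) for its Luke index stats."""
    url = core_url + "/admin/luke?numTerms=0&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        index = json.load(resp)["index"]
    return {
        "numDocs": index["numDocs"],
        "maxDoc": index["maxDoc"],
        "version": index["version"],
    }


def out_of_sync(stats_by_backend):
    """Return the names of the stats fields whose values differ between backends."""
    fields = ("numDocs", "maxDoc", "version")
    return [f for f in fields
            if len({stats[f] for stats in stats_by_backend.values()}) > 1]
```

For example, `out_of_sync({url: fetch_index_stats(url) for url in BACKENDS})` returns an empty list when the backends agree and otherwise names the fields that differ.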
My guess would be that DIH uses cursors to read data from the other Solr. If 
you have multiple Solr instances behind the LB, there may be differences 
between their indexes that result in different documents being returned for 
the same cursor mark. Are numDocs and maxDoc the same on the new instance after 
the import?

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 12 Sep 2018, at 05:53, Zimmermann, Thomas wrote:
> 
> We have a Solr v7 instance that imports data through the Data Import Handler
> (DIH) from a Solr data source running Solr v4. When DIH points directly at a
> single server of the v4 source, all documents are read and written correctly
> to the v7 core. When it points at the load balancer's DNS entry instead, the
> DIH status JSON reports that all documents were read and none were skipped,
> and everything looks fine, but the v7 core ends up missing ~20% of the
> documents. This has happened multiple times on multiple environments.
> 
> Any thoughts on whether this might be a bug in the underlying DIH code? I'll 
> also pass it along to the server admins on our side for input.



Data Import Handler with Solr Source behind Load Balancer

2018-09-11 Thread Zimmermann, Thomas
We have a Solr v7 instance that imports data through the Data Import Handler 
(DIH) from a Solr data source running Solr v4. When DIH points directly at a 
single server of the v4 source, all documents are read and written correctly 
to the v7 core. When it points at the load balancer's DNS entry instead, the 
DIH status JSON reports that all documents were read and none were skipped, 
and everything looks fine, but the v7 core ends up missing ~20% of the 
documents. This has happened multiple times on multiple environments.

Any thoughts on whether this might be a bug in the underlying DIH code? I'll 
also pass it along to the server admins on our side for input.
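To see which documents are missing, rather than just how many, one option is to collect every uniqueKey from a single source server (queried directly, not through the LB) and from the v7 core, then diff the two sets. Below is a minimal sketch. It assumes the uniqueKey field is named `id` and that the source runs Solr 4.7 or later, since cursorMark paging was added in 4.7. The function names are hypothetical, and the core URLs you pass in are placeholders.

```python
import json
import urllib.parse
import urllib.request


def fetch_all_ids(core_url, rows=1000, uniq="id"):
    """Collect every uniqueKey value from one core using cursorMark deep paging.

    Assumes Solr 4.7+ and that `uniq` is the schema's uniqueKey field, which a
    cursorMark sort must include.
    """
    ids, cursor = set(), "*"
    while True:
        params = urllib.parse.urlencode({
            "q": "*:*", "fl": uniq, "sort": uniq + " asc",
            "rows": rows, "cursorMark": cursor, "wt": "json",
        })
        with urllib.request.urlopen(core_url + "/select?" + params, timeout=30) as resp:
            data = json.load(resp)
        ids.update(str(doc[uniq]) for doc in data["response"]["docs"])
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # cursor stopped advancing: no more results
            return ids
        cursor = next_cursor


def missing_ids(source_ids, target_ids):
    """IDs present in the source core but absent from the target core, sorted."""
    return sorted(set(source_ids) - set(target_ids))
```

For example, `missing_ids(fetch_all_ids(source_url), fetch_all_ids(target_url))`, with `source_url` pointing at one v4 server and `target_url` at the v7 core, lists the IDs that did not make it across.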