Well, I think I finally figured out how to get SolrEntityProcessor to
work, but there are still some issues.  I had to add a library path to
solrconfig.xml, but the cores are finally coming up, and I am now able to
manually run a data import that does seem to index all of the documents
on the remote SolrCloud.
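
For anyone following along, the library path I mean is the usual <lib>
directive for the DataImportHandler jars, something like this (the dir
is relative to the core's instanceDir, so the exact path depends on your
layout):

  <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />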
I ran into the version-conflict issue described here:

http://lucene.472066.n3.nabble.com/Version-conflict-during-data-import-from-another-Solr-instance-into-clean-Solr-td4046937.html

I used the suggestion of adding fl="*,old_version:_version_" to the
entity config line in data-config.xml.  This seems to be working, but I
don't know whether it will cause problems later.
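
For reference, the entity definition now looks something like this (the
url and query below are placeholders, not my real values):

  <dataConfig>
    <document>
      <entity name="sep" processor="SolrEntityProcessor"
              url="http://source-host:8983/solr/collection1"
              query="*:*"
              fl="*,old_version:_version_"/>
    </document>
  </dataConfig>

The fl mapping just pulls the source's _version_ in under a different
name so the destination can assign its own, which is what the thread
above suggests.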
When I do a manual data import, I get the correct number of documents
from the source SolrCloud (the total added up between both shards is
6,357 in this test case):

  Indexing completed. Added/Updated: 6,357 documents. Deleted 0 documents.
  (Duration: 22s)
  Requests: 0 (0/s), Fetched: 6,357 (289/s), Skipped: 0, Processed: 6,357

However, when I check the number of docs indexed for each shard in the
core admin UI on the destination SolrCloud, the counts are way off and
add up to far less than 6,357.  There's nothing in the logs to indicate
collisions or dropped documents.  What could account for the disparity?
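
In case it matters, an equivalent check from the command line would be
something like this (the core name is whatever the shard replica is
called on your box):

  curl "http://dest-host:8983/solr/collection1_shard1_replica1/select?q=*:*&rows=0&distrib=false"

distrib=false keeps the query on that one core, so numFound is the
per-shard count.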

I would assume that down the road I'll need to configure multiple
collections/cores on the failover cluster, one for each DC it's
replicating from.  But how would you create multiple collections when
using ZooKeeper?  How do you upload multiple sets of config files, one
per collection, and keep them separate?
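
My best guess from the docs is that you upload each config set to
ZooKeeper under its own name with zkcli.sh, then point each collection
at its config set when you create it, something like this (hostnames,
paths, and names are all made up; zkcli.sh lives under
example/cloud-scripts/ in my install):

  # one config set per source DC, each under its own confname
  ./zkcli.sh -zkhost zk1:2181 -cmd upconfig \
      -confdir /path/to/dc1/conf -confname dc1-conf
  ./zkcli.sh -zkhost zk1:2181 -cmd upconfig \
      -confdir /path/to/dc2/conf -confname dc2-conf

  # create a collection tied to each config set
  curl "http://dest-host:8983/solr/admin/collections?action=CREATE&name=dc1&numShards=2&collection.configName=dc1-conf"
  curl "http://dest-host:8983/solr/admin/collections?action=CREATE&name=dc2&numShards=2&collection.configName=dc2-conf"

Is that the right approach, or is there a better way to keep the config
sets separate?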


