To back up to an offsite data center, I need to make sure that each core is configured to replicate to a consistently defined backup directory on a NetApp filer. The filer's snapshot can be invoked manually, and its SnapMirror will copy the data to the offsite data center, where it will be mounted. A comparison script at the offsite data center can then rsync the data to the local filesystem and signal Solr to reload the core.
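To make the offsite step concrete, here is a rough sketch of what that script might do for one core: rsync the mounted copy to local disk, then ask Solr to reload the core through the CoreAdmin RELOAD action. The paths, the base URL, and the function names are placeholders I made up for illustration, and the comparison logic (deciding whether anything changed) is left out.

```python
import subprocess
import urllib.parse
import urllib.request


def rsync_command(src_dir, dest_dir):
    """Build an rsync command that mirrors src_dir into dest_dir."""
    # Trailing slashes make rsync copy the directory contents,
    # not the directory itself.
    return [
        "rsync", "-a", "--delete",
        src_dir.rstrip("/") + "/",
        dest_dir.rstrip("/") + "/",
    ]


def reload_url(solr_base, core):
    """Build the CoreAdmin RELOAD URL for one core."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def sync_and_reload(src_dir, dest_dir, solr_base, core):
    """Copy the mounted mirror to local disk, then reload the core."""
    # check=True raises CalledProcessError if rsync exits non-zero,
    # so Solr is never reloaded after a failed copy.
    subprocess.run(rsync_command(src_dir, dest_dir), check=True)
    with urllib.request.urlopen(reload_url(solr_base, core), timeout=30) as resp:
        return resp.status
```

Something like `sync_and_reload("/mnt/netapp/core0", "/var/solr/core0", "http://localhost:8983/solr", "core0")` would run it for a single core; all four values are hypothetical and would need to match the real mount point and Solr install.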
Since we have less than 30 GB of index data, fewer than 10 million documents, and about 2 QPS, we think SolrCloud doesn't make sense for us at this time. Still, I'm wondering whether SolrCloud would give me any advantage if I defined this as two SolrCloud clusters, with replication from the master cloud to the slave cloud.
More specifically, without SolrCloud I see some need to modify the solrconfig.xml of each core to make sure the ReplicationHandler is defined and has the right backup parameters. With SolrCloud, I would hope for some way to set backup parameters for data globally. I know about Sematext's post and the issue (with sub-tasks) on this general area, but I have had no time to parse the whole thing to understand whether SolrCloud offers me value over non-cloud Solr in this configuration.
Another architecture would be to take an LVM snapshot after commit, mount that on the master node (or single-node cloud), and rsync it to the NetApp for both backup and fault tolerance. A signal file on the NetApp would then cause the slave to rsync and reload.
Dan Davis
Systems/Applications Architect (Contractor)
Office of Computer and Communications Systems
National Library of Medicine, NIH