You can use rsync to automatically transfer only the files that have
changed. You shouldn't have to home-grow your own "only transfer the
diffs" solution; rsync will do that for you.
But yes, running an optimization, after many updates/deletes, will
generally mean nearly everything has changed.
Solr's index, of course, _is_ a Lucene index, so your experience with
Lucene will be applicable to Solr. Lucene and Solr may have added new
features since you last used them, but you're still using Lucene when
you're using Solr.
On 8/9/2011 11:22 AM, Peter Kritikos wrote:
Hello, everyone,
My company will be using Solr on the server appliance we deliver to
our clients. We would like to maintain remote backups of clients'
search indexes to avoid rebuilding a large index when an appliance fails.
One of our clients backs up their data onto a remote server provided
by a vendor which only provides storage space, so I don't believe it
is possible for us to set up a remote slave server to use Solr's
replication functionality. Because our client has a low-bandwidth
connection to their backup server, we would like to minimize the
amount of data transferred to the remote machine. Our Solr index
receives commits every few minutes and will probably be optimized
roughly once a day. Does our frequently modified index allow us to
transfer an amount of data proportional to the number of new documents
added to the search index daily? From my understanding, optimizing an
index makes very significant changes to its files. Is there a way
around this that I may be missing?
We have faced this problem in the past when our product used a
Lucene-based search engine. We were unable to find a solution where we
could only copy the "diffs" introduced to the index since the most
recent backup, so we opted to make our indexing process faster. In
addition to plain text, many of the documents that we are indexing are
binary, e.g. Word, PDF. We cached the extracted text from these binary
documents on the clients' backup servers, saving us the cost of
extraction at index time. If we must pursue a solution like this for
Solr, how else might we optimize the indexing process?
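The extracted-text caching described above could be sketched roughly like this (Python for brevity, though your stack may be Java; the cache directory and the extract_text stand-in are hypothetical — in practice extraction would go through something like Apache Tika or Solr Cell):

```python
import hashlib
import pathlib

# Hypothetical cache location; in the scenario above this would live on
# the client's backup server.
CACHE_DIR = pathlib.Path("/tmp/extract_cache")
CACHE_DIR.mkdir(exist_ok=True)

def extract_text(raw: bytes) -> str:
    # Stand-in for a real binary-document extractor (Word, PDF, etc.).
    return raw.decode("utf-8", errors="replace")

def cached_extract(path: str) -> str:
    """Return extracted text, keyed by a hash of the file's bytes,
    so unchanged documents never pay the extraction cost twice."""
    raw = pathlib.Path(path).read_bytes()
    key = hashlib.sha256(raw).hexdigest()
    cache_file = CACHE_DIR / key
    if cache_file.exists():
        # Cache hit: reuse previously extracted text.
        return cache_file.read_text()
    # Cache miss: extract once and store for future index rebuilds.
    text = extract_text(raw)
    cache_file.write_text(text)
    return text
```

Keying the cache on a content hash (rather than filename) means a rebuilt index after an appliance failure only re-extracts documents whose bytes have actually changed.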
Much appreciated,
Peter Kritikos