Hello, everyone,
My company will be using Solr on the server appliance we deliver to our
clients. We would like to maintain remote backups of clients' search
indexes to avoid rebuilding a large index when an appliance fails.
One of our clients backs up their data onto a remote server provided by
a vendor that provides only storage space, so I don't believe we can
run a remote slave server there and use Solr's replication
functionality. Because our client has a low-bandwidth
connection to their backup server, we would like to minimize the amount
of data transferred to the remote machine. Our Solr index receives
commits every few minutes and will probably be optimized roughly once a
day. Does our frequently modified index allow us to transfer an amount
of data proportional to the number of new documents added to the search
index daily? From my understanding, optimizing an index makes very
significant changes to its files. Is there a way around this that I may
be missing?
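For what it's worth, here is the kind of incremental copy we have been
considering. It relies on the fact that Lucene segment files are
write-once: between optimizes, a commit mostly adds new files, so
copying only files absent from the backup approximates a "diff"; after
an optimize, nearly every file is new and the copy degenerates to a
full transfer, which is exactly the problem we are asking about. This
is only a sketch with hypothetical paths, not something we have in
production:

```python
import os
import shutil

def incremental_backup(index_dir, backup_dir):
    """Copy only index files not yet present in the backup.

    Lucene segment files are write-once, so between optimizes each
    commit mostly adds new files; after an optimize almost everything
    is new and this degenerates into a full copy.
    """
    os.makedirs(backup_dir, exist_ok=True)
    existing = set(os.listdir(backup_dir))
    copied = []
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if name not in existing and os.path.isfile(src):
            shutil.copy2(src, os.path.join(backup_dir, name))
            copied.append(name)
    return copied
```

A real version would also have to prune backup files that merges have
made obsolete, or the backup grows without bound; I've left that out
for brevity.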
We have faced this problem in the past when our product used a
Lucene-based search engine. We were unable to find a solution where we
could only copy the "diffs" introduced to the index since the most
recent backup, so we opted to make our indexing process faster. In
addition to plain text, many of the documents that we index are in
binary formats such as Word and PDF. We cached the extracted text from
these binary
documents on the clients' backup servers, saving us the cost of
extraction at index time. If we must pursue a solution like this for
Solr, how else might we optimize the indexing process?
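To make the caching idea concrete, this is roughly what our old
extraction cache did: key the cache on a content hash of the document
so re-indexing only pays for extraction on a cache miss. The
`extractor` callable stands in for whatever converts Word/PDF bytes to
text (for us, a call out to an extraction library); the function names
and layout here are illustrative, not our actual code:

```python
import hashlib
import os

def extract_text_cached(doc_path, cache_dir, extractor):
    """Return extracted text for a binary document, caching by
    content hash so rebuilding the index skips re-extraction.

    `extractor` takes the raw document bytes and returns text; it is
    invoked only on a cache miss.
    """
    with open(doc_path, "rb") as f:
        data = f.read()
    key = hashlib.sha256(data).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".txt")
    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            return f.read()
    text = extractor(data)
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_file, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Hashing the content rather than the filename means a renamed but
unchanged document still hits the cache, while any edit to the
document forces a fresh extraction.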
Much appreciated,
Peter Kritikos