Here's the Solr Wiki on collection distribution: http://wiki.apache.org/solr/CollectionDistribution
It describes the "incremental" nature of the distribution:

A collection is a directory of many files. Collections are distributed to the slaves as snapshots of these files. Each snapshot is made up of hard links to the files so copying of the actual files is not necessary when snapshots are created. Lucene only significantly rewrites files following an optimization command. Generally, a file once written, will change very little if at all. This makes the underlying transport of rsync very useful. Files that have already been transferred and have not changed do not need to be re-transferred with the new edition of a collection.

Bill

On 4/21/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
snapshooter does create incremental builds of the index. It doesn't appear so if you look at the contents because the existing files are hard links. But it is incremental.

On 4/20/07, Doss <[EMAIL PROTECTED]> wrote:
> Hi Yonik,
>
> Thanks for your quick response, my question is this, can we take incremental
> backup/replication in SOLR?
>
> Regards,
> Doss.
>
> M. MOHANDOSS Software Engineer Ext: 507 (A BharatMatrimony Enterprise)
> ----- Original Message -----
> From: "Yonik Seeley" <[EMAIL PROTECTED]>
> To: <solr-user@lucene.apache.org>
> Sent: Thursday, April 19, 2007 7:42 PM
> Subject: Re: Snapshooting or replicating recently indexed data
>
> > On 4/19/07, Doss <[EMAIL PROTECTED]> wrote:
> >> It seems the snapshooter takes the exact copy of the indexed data, that
> >> is all the contents inside the index directory, how can we take the
> >> recently added once?
> >> ...
> >> cp -lr ${data_dir}/index ${temp}
> >> mv ${temp} ${name} ...
> >
> > I don't quite understand your question, but since hard links are used,
> > it's more like pointing to the index files instead of copying them.
> > Rsync is used as a transport to only move the files that were changed
> > from the master to slaves.
> >
> > -Yonik
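
To see the hard-link behaviour described in this thread for yourself, here is a rough sketch that mimics the `cp -lr` / `mv` step on a throwaway directory. It is not the snapshooter script itself. The file names (`_1.cfs` and so on), the `snapshot.N` names and the temp directory are made up for illustration, and it assumes GNU coreutils (`cp -l`, `stat -c`) as found on a typical Linux box.

```shell
#!/bin/sh
# Sketch only: hard-link snapshots in the spirit of the `cp -lr` step
# quoted in the thread. All paths and file names here are hypothetical.
set -eu

work=$(mktemp -d)
data_dir="$work/data"
mkdir -p "$data_dir/index"

# Stand-ins for Lucene segment files.
echo "segment one" > "$data_dir/index/_1.cfs"
echo "segment two" > "$data_dir/index/_2.cfs"

# First snapshot: hard-link copy to a temp name, then rename into place.
cp -lr "$data_dir/index" "$work/temp-snapshot"
mv "$work/temp-snapshot" "$data_dir/snapshot.1"

# More documents get indexed: a new segment file appears in the live index.
echo "segment three" > "$data_dir/index/_3.cfs"

# Second snapshot, taken the same way.
cp -lr "$data_dir/index" "$work/temp-snapshot"
mv "$work/temp-snapshot" "$data_dir/snapshot.2"

# An identical inode number means the same file on disk, not a copy.
inode_live=$(stat -c %i "$data_dir/index/_1.cfs")
inode_snap1=$(stat -c %i "$data_dir/snapshot.1/_1.cfs")
inode_snap2=$(stat -c %i "$data_dir/snapshot.2/_1.cfs")
link_count=$(stat -c %h "$data_dir/index/_1.cfs")

echo "inodes of _1.cfs: $inode_live $inode_snap1 $inode_snap2"
echo "hard link count of _1.cfs: $link_count"
ls "$data_dir/snapshot.1" "$data_dir/snapshot.2"
```

Both snapshots list the full set of files, which is why they look like complete copies, but the unchanged files share one inode with the live index. Only `_3.cfs` is new in the second snapshot, and that is the part rsync would have to send to a slave that already has the first one.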