Re: Backup strategies
We've recently implemented something similar: the backup process creates a file (much like the lock files used during indexing) that the IndexWriter recognizes (a small tweak) and so doesn't attempt to start an indexing pass or a delete while the file is present. It wasn't that much work, actually.

Nader

Doug Cutting wrote:
> Christoph Kiehl wrote:
>> I'm curious about your strategy for backing up indexes based on FSDirectory. If I do a file-based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state. But I'm not sure about the performance of this solution. How do you make your backups?
>
> A safe way to back up is to have your indexing process, when it knows the index is stable (e.g., just after calling IndexWriter.close()), make a checkpoint copy of the index by running a shell command like "cp -lpr index index.YYYYMMDDHHmmSS". This is very fast and requires little disk space, since it creates only a new directory of hard links. Then you can separately back this up and subsequently remove it.
>
> This is also a useful way to replicate indexes. On the master indexing server, periodically perform "cp -lpr" as above. Then search slaves can use rsync to pull down the latest version of the index. If a very small mergeFactor is used (e.g., 2) then the index will have only a few segments, so that searches are fast. On the slave, periodically find the latest index.YYYYMMDDHHmmSS, use "cp -lpr index/ index.YYYYMMDDHHmmSS" and "rsync --delete master:index.YYYYMMDDHHmmSS index.YYYYMMDDHHmmSS" to efficiently get a local copy, and finally "ln -fsn index.YYYYMMDDHHmmSS index" to publish the new version of the index.
>
> Doug

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
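Nader's marker-file handshake can be sketched from the backup side. The directory and marker-file names below are illustrative assumptions (not Lucene conventions), and the writer-side check is the part Nader tweaked inside IndexWriter:

```shell
#!/bin/sh
set -e
# Backup side of the marker-file handshake. While the marker exists, the
# (tweaked) IndexWriter holds off on indexing and deletes, so a plain
# file copy of the directory is safe. Names here are assumptions.
INDEX_DIR=index
MARKER="$INDEX_DIR/backup.marker"

mkdir -p "$INDEX_DIR"                    # toy index for demonstration
echo "segment data" > "$INDEX_DIR/_0.cfs"

touch "$MARKER"                          # writer now holds off
cp -pr "$INDEX_DIR" index.bak            # plain copy is safe meanwhile
rm -f index.bak/backup.marker "$MARKER"  # drop the marker on both sides
echo "backup done"
```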
Re: Backup strategies
Christoph Kiehl wrote:
> I'm curious about your strategy for backing up indexes based on FSDirectory. If I do a file-based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state. But I'm not sure about the performance of this solution. How do you make your backups?

A safe way to back up is to have your indexing process, when it knows the index is stable (e.g., just after calling IndexWriter.close()), make a checkpoint copy of the index by running a shell command like "cp -lpr index index.YYYYMMDDHHmmSS". This is very fast and requires little disk space, since it creates only a new directory of hard links. Then you can separately back this up and subsequently remove it.

This is also a useful way to replicate indexes. On the master indexing server, periodically perform "cp -lpr" as above. Then search slaves can use rsync to pull down the latest version of the index. If a very small mergeFactor is used (e.g., 2) then the index will have only a few segments, so that searches are fast. On the slave, periodically find the latest index.YYYYMMDDHHmmSS, use "cp -lpr index/ index.YYYYMMDDHHmmSS" and "rsync --delete master:index.YYYYMMDDHHmmSS index.YYYYMMDDHHmmSS" to efficiently get a local copy, and finally "ln -fsn index.YYYYMMDDHHmmSS index" to publish the new version of the index.

Doug
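Doug's checkpoint step can be sketched as a small script. The toy index directory and segment file below are assumptions for demonstration; in practice this would run right after IndexWriter.close():

```shell
#!/bin/sh
set -e
# Hard-link checkpoint of a stable index, per the "cp -lpr" recipe above.
checkpoint() {
  stamp=$(date +%Y%m%d%H%M%S)
  # -l: hard links instead of data copies, -p: preserve attributes,
  # -r: recurse. Near-instant, and costs only directory space.
  cp -lpr "$1" "$1.$stamp"
  echo "$1.$stamp"
}

mkdir -p index                           # toy index for demonstration
echo "segment data" > index/_0.cfs

snap=$(checkpoint index)
# Both paths now point at the same inode: no segment data was copied.
test index/_0.cfs -ef "$snap/_0.cfs" && echo "checkpoint ok: $snap"
```

The slave-side cycle then seeds a new directory from the current one with another `cp -lpr`, lets `rsync --delete` transfer only the segment files that actually changed, and flips the `index` symlink with `ln -fsn`, which publishes the new version atomically to searchers.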
Re: Backup strategies
I would suggest that you create a lock file for your index-writing process; if the lock file is encountered, close the IndexWriter until the lock file is removed. After you create the lock file, wait a few seconds to make sure the writer process has quiesced, then create a snapshot of the filesystem. Remove the lock file and back up the snapshot with your favorite backup tool (excluding the lock file), then drop the snapshot.

Swany

On Wed, 27 Oct 2004 14:40:20 +0200, Christoph Kiehl <[EMAIL PROTECTED]> wrote:
> Christiaan Fluit wrote:
>> I have no practical experience with backing up an online index, but I would try to find out the details of the write lock mechanism used by Lucene at the file level. You can then create a backup component that write-locks the index and does a regular file copy of the index dir. During backup time searches can continue while updates will be temporarily blocked.
>
> The problem with this approach is that it will not only block write operations: you will get timeouts for those operations, which will lead to exceptions. To prevent this you must implement some queuing, which is what I would like to avoid.
>
> Regards,
> Christoph
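Swany's six-step sequence might look like the sketch below. A hard-link copy stands in here for a real filesystem snapshot (LVM, ZFS, etc., which this toy script cannot take), and the lock-file name is an assumption:

```shell
#!/bin/sh
set -e
INDEX=index
LOCK="$INDEX/backup.lock"     # assumed lock-file name

mkdir -p "$INDEX"             # toy index for demonstration
echo "segment data" > "$INDEX/_0.cfs"

touch "$LOCK"                 # 1. writer sees this and closes its IndexWriter
sleep 1                       # 2. give the writer a moment to quiesce
cp -lpr "$INDEX" snapshot     # 3. snapshot (stand-in for an LVM/ZFS snapshot)
rm -f "$LOCK"                 # 4. writing may resume immediately
# 5. back up the snapshot at leisure, excluding the lock file...
tar -czf index-backup.tar.gz --exclude=backup.lock snapshot
rm -rf snapshot               # 6. ...then drop the snapshot
echo "snapshot backed up"
```

The point of the design is that the writer is only paused for the instant it takes to snapshot; the slow backup tool runs against the immutable snapshot, not the live index.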
Re: Backup strategies
Christiaan Fluit wrote:
> I have no practical experience with backing up an online index, but I would try to find out the details of the write lock mechanism used by Lucene at the file level. You can then create a backup component that write-locks the index and does a regular file copy of the index dir. During backup time searches can continue while updates will be temporarily blocked.

The problem with this approach is that it will not only block write operations: you will get timeouts for those operations, which will lead to exceptions. To prevent this you must implement some queuing, which is what I would like to avoid.

Regards,
Christoph
Re: Backup strategies
Christoph Kiehl wrote:
> I'm curious about your strategy for backing up indexes based on FSDirectory. If I do a file-based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state. But I'm not sure about the performance of this solution.

I have no practical experience with backing up an online index, but I would try to find out the details of the write lock mechanism used by Lucene at the file level. You can then create a backup component that write-locks the index and does a regular file copy of the index dir. During backup time searches can continue while updates will be temporarily blocked.

But as I said, I'm only speculating...

Chris
Backup strategies
Hi,

I'm curious about your strategy for backing up indexes based on FSDirectory. If I do a file-based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state. But I'm not sure about the performance of this solution. How do you make your backups?

Regards,
Christoph