Re: Backup strategies

2004-11-16 Thread Nader Henein
We've recently implemented something similar: the backup process creates 
a file (much like the lock files used during indexing) that our tweaked 
IndexWriter recognizes, so it doesn't attempt to start an indexing run or 
a delete while the file is there. It wasn't that much work, actually.
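
For illustration, here is a minimal Python sketch of the writer-side check described above. It is not the actual IndexWriter tweak, which the message does not show; the marker name `backup.lock` and both function names are hypothetical.

```python
import os
import time

BACKUP_MARKER = "backup.lock"  # hypothetical marker name, not a Lucene lock file


def backup_in_progress(index_dir):
    """Return True while the backup process's marker file exists."""
    return os.path.exists(os.path.join(index_dir, BACKUP_MARKER))


def run_when_clear(index_dir, operation, poll_seconds=1.0, timeout_seconds=300.0):
    """Run operation() once no backup marker is present in index_dir.

    Returns operation()'s result, or raises TimeoutError if the marker
    is still there after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while backup_in_progress(index_dir):
        if time.monotonic() >= deadline:
            raise TimeoutError("backup marker still present in " + index_dir)
        time.sleep(poll_seconds)
    return operation()
```

The check and the operation are not atomic: a marker created just after the check will not stop an operation that has already started, so the backup side still has to wait for running work to finish.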

Nader
Doug Cutting wrote:
> Christoph Kiehl wrote:
> > I'm curious about your strategy to back up indexes based on
> > FSDirectory. If I do a file-based copy I suspect I will get corrupted
> > data because of concurrent write access.
> > My current favorite is to create an empty index and use
> > IndexWriter.addIndexes() to copy the current index state. But I'm not
> > sure about the performance of this solution.
> >
> > How do you make your backups?
>
> A safe way to back up is to have your indexing process, when it knows
> the index is stable (e.g., just after calling IndexWriter.close()),
> make a checkpoint copy of the index by running a shell command like
> "cp -lpr index index.YYYYMMDDHHmmSS".  This is very fast and requires
> little disk space, since it creates only a new directory of hard
> links.  Then you can separately back this up and subsequently remove it.
>
> This is also a useful way to replicate indexes.  On the master
> indexing server periodically perform "cp -lpr" as above.  Then search
> slaves can use rsync to pull down the latest version of the index.  If
> a very small mergeFactor is used (e.g., 2) then the index will have
> only a few segments, so that searches are fast.  On the slave,
> periodically find the latest index.YYYYMMDDHHmmSS, use
> "cp -lpr index/ index.YYYYMMDDHHmmSS" and
> 'rsync --delete master:index.YYYYMMDDHHmmSS index.YYYYMMDDHHmmSS' to
> efficiently get a local copy, and finally
> "ln -fsn index.YYYYMMDDHHmmSS index" to publish the new version of the
> index.
>
> Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Backup strategies

2004-11-16 Thread Doug Cutting
Christoph Kiehl wrote:
> I'm curious about your strategy to back up indexes based on FSDirectory.
> If I do a file-based copy I suspect I will get corrupted data because of
> concurrent write access.
> My current favorite is to create an empty index and use
> IndexWriter.addIndexes() to copy the current index state. But I'm not
> sure about the performance of this solution.
>
> How do you make your backups?

A safe way to back up is to have your indexing process, when it knows the 
index is stable (e.g., just after calling IndexWriter.close()), make a 
checkpoint copy of the index by running a shell command like 
"cp -lpr index index.YYYYMMDDHHmmSS".  This is very fast and requires little 
disk space, since it creates only a new directory of hard links.  Then you 
can separately back this up and subsequently remove it.
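
As a rough Python equivalent of the "cp -lpr" checkpoint, the sketch below walks the index directory and hard-links each file into a timestamped sibling directory. The function name `checkpoint_index` and the `now` parameter are made up for this example, and it assumes the index sits on a filesystem that supports hard links.

```python
import os
import time


def checkpoint_index(index_dir, now=None):
    """Create index_dir.YYYYMMDDHHmmSS as a tree of hard links; return its path."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(now))
    dest = index_dir.rstrip(os.sep) + "." + stamp
    for root, _dirs, files in os.walk(index_dir):
        rel = os.path.relpath(root, index_dir)
        target_root = dest if rel == "." else os.path.join(dest, rel)
        os.makedirs(target_root)  # fails if this checkpoint already exists
        for name in files:
            # A hard link shares the file's data blocks, so nothing is copied.
            os.link(os.path.join(root, name), os.path.join(target_root, name))
    return dest
```

Because a hard link shares its data with the original file, the checkpoint stays intact only as long as the indexer replaces or deletes files rather than rewriting them in place.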

This is also a useful way to replicate indexes.  On the master indexing 
server periodically perform "cp -lpr" as above.  Then search slaves can 
use rsync to pull down the latest version of the index.  If a very small 
mergeFactor is used (e.g., 2) then the index will have only a few 
segments, so that searches are fast.  On the slave, periodically find 
the latest index.YYYYMMDDHHmmSS, use 
"cp -lpr index/ index.YYYYMMDDHHmmSS" and 
'rsync --delete master:index.YYYYMMDDHHmmSS index.YYYYMMDDHHmmSS' to 
efficiently get a local copy, and finally 
"ln -fsn index.YYYYMMDDHHmmSS index" to publish the new version of the index.
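
The last two slave-side steps, picking the newest checkpoint and repointing the "index" symlink, could be sketched in Python as follows. The function names are hypothetical and the rsync transfer itself is left out. The sketch assumes that "index" on the slave is a symlink (or does not exist yet), never a real directory.

```python
import os
import re

# Matches directory names of the form index.YYYYMMDDHHmmSS.
STAMP = re.compile(r"^index\.(\d{14})$")


def latest_checkpoint(parent_dir):
    """Return the newest index.YYYYMMDDHHmmSS name in parent_dir, or None."""
    names = [n for n in os.listdir(parent_dir) if STAMP.match(n)]
    # Fixed-width timestamps sort chronologically as plain strings.
    return max(names) if names else None


def publish(parent_dir, checkpoint_name, link_name="index"):
    """Point parent_dir/link_name at checkpoint_name, similar to `ln -fsn`."""
    link_path = os.path.join(parent_dir, link_name)
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # clear a leftover from an interrupted run
    os.symlink(checkpoint_name, tmp_path)
    # Renaming over the old symlink swaps the target in a single step.
    os.replace(tmp_path, link_path)
```

Building the new symlink under a temporary name and renaming it means searchers opening "index" see either the old checkpoint or the new one, never a missing link.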

Doug


Re: Backup strategies

2004-10-27 Thread Justin Swanhart
I would suggest that you create a lock file for your index writing
process: if the lock file is encountered, close the IndexWriter and keep
it closed until the lock file is removed.  After you create the lock
file, wait a few seconds to make sure the writer process has quiesced,
then create a snapshot of the filesystem.  Remove the lock file and back
up the snapshot with your favorite backup tool (excluding the lock file),
then drop the snapshot.
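
The backup side of this procedure might look like the Python sketch below. The lock file name `backup.lock`, the function name, and the `take_snapshot` callable are all hypothetical; how the filesystem snapshot is actually taken depends on the platform and is left to the caller.

```python
import os
import time


def backup_with_lock(index_dir, take_snapshot, settle_seconds=5.0,
                     lock_name="backup.lock"):
    """Create the lock file, wait, take a snapshot, then remove the lock file.

    take_snapshot is a caller-supplied callable that creates the
    filesystem snapshot; its return value is passed back to the caller.
    """
    lock_path = os.path.join(index_dir, lock_name)
    # O_EXCL makes creation fail if another backup already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        time.sleep(settle_seconds)  # give the writer time to quiesce
        return take_snapshot()
    finally:
        # Writers may resume while the snapshot is backed up separately.
        os.remove(lock_path)
```

Backing up the snapshot and dropping it afterwards happen outside this function, once the lock file is gone and indexing is free to continue.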

Swany

On Wed, 27 Oct 2004 14:40:20 +0200, Christoph Kiehl <[EMAIL PROTECTED]> wrote:
> Christiaan Fluit wrote:
> 
> > I have no practical experience with backing up an online index, but I
> > would try to find out the details of the write lock mechanism used by
> > Lucene at the file level. You can then create a backup component that
> > write-locks the index and does a regular file copy of the index dir.
> > During backup time searches can continue while updates will be
> > temporarily blocked.
> 
> The problem with this approach is that this will not only block write
> operations but you will get timeouts for these operations which will
> lead to exceptions. To prevent this you must implement some queuing,
> which is what I would like to avoid.
> 
> Regards,
> Christoph


Re: Backup strategies

2004-10-27 Thread Christoph Kiehl
Christiaan Fluit wrote:
> I have no practical experience with backing up an online index, but I
> would try to find out the details of the write lock mechanism used by
> Lucene at the file level. You can then create a backup component that
> write-locks the index and does a regular file copy of the index dir.
> During backup time searches can continue while updates will be
> temporarily blocked.

The problem with this approach is that this will not only block write 
operations but you will get timeouts for these operations which will 
lead to exceptions. To prevent this you must implement some queuing, 
which is what I would like to avoid.
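
To make the cost of that queuing concrete, here is a minimal single-threaded Python sketch. The class name `DeferredUpdates` and its methods are invented for this example and are not part of Lucene; `apply_update` stands in for whatever code writes one update to the index.

```python
from collections import deque


class DeferredUpdates:
    """Buffer index updates while a backup holds the index, then replay them."""

    def __init__(self, apply_update):
        self._apply = apply_update  # caller-supplied function that writes one update
        self._pending = deque()
        self._paused = False

    def pause(self):
        """Call when the backup starts; later updates are queued, not applied."""
        self._paused = True

    def submit(self, update):
        """Apply the update now, or queue it if a backup is running."""
        if self._paused:
            self._pending.append(update)
        else:
            self._apply(update)

    def resume(self):
        """Call when the backup ends; replay queued updates in arrival order."""
        self._paused = False
        while self._pending:
            self._apply(self._pending.popleft())
```

The queue lives only in memory, so updates submitted during a backup are lost if the process dies before `resume()` runs. A real version would also need locking if updates arrive from several threads.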

Regards,
Christoph


Re: Backup strategies

2004-10-27 Thread Christiaan Fluit
Christoph Kiehl wrote:
> I'm curious about your strategy to back up indexes based on FSDirectory.
> If I do a file-based copy I suspect I will get corrupted data because of
> concurrent write access.
> My current favorite is to create an empty index and use
> IndexWriter.addIndexes() to copy the current index state. But I'm not
> sure about the performance of this solution.

I have no practical experience with backing up an online index, but I 
would try to find out the details of the write lock mechanism used by 
Lucene at the file level. You can then create a backup component that 
write-locks the index and does a regular file copy of the index dir. 
During backup time searches can continue while updates will be 
temporarily blocked.

But as I said, I'm only speculating...
Chris


Backup strategies

2004-10-27 Thread Christoph Kiehl
Hi,
I'm curious about your strategy to back up indexes based on FSDirectory. 
If I do a file-based copy I suspect I will get corrupted data because of 
concurrent write access.
My current favorite is to create an empty index and use 
IndexWriter.addIndexes() to copy the current index state. But I'm not 
sure about the performance of this solution.

How do you make your backups?
Regards,
Christoph