Taking backup of a Lucene index
I want to take a backup of a Lucene index, and I need to ensure that the index files do not change while the backup is being taken.

I am concerned about the housekeeping/merge/optimization activities that Lucene performs internally. I am not sure when or how Lucene performs these activities, or how to prevent them.

My application (which indexes documents and searches over the resulting indexes) keeps running in the background. I can ensure that my application writes nothing to the indexes while the backup runs, but I am not sure whether an index can change in some way when a search is performed over it.

How can I ensure that an index does not change (i.e., Lucene performs no housekeeping/merge/optimization activity) while I take its backup? Any help would be much appreciated.

PS: I am currently using Lucene 2.9.4 but wish to upgrade to 3.6.2.

Regards,
Ashish
Re: Taking backup of a Lucene index
On Wed, Apr 17, 2013 at 12:57 PM, Ashish Sarna wrote:
> I want to take back-up of a Lucene index. I need to ensure that index files
> would not change when I take their backup.

Use a SnapshotDeletionPolicy:
http://lucene.apache.org/core/4_2_1/core/org/apache/lucene/index/SnapshotDeletionPolicy.html

Take a snapshot, back up/copy all the files in the commit, then release the snapshot.
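A minimal sketch of that snapshot, copy, release cycle, assuming a recent Lucene release (8.x/9.x) with lucene-core on the classpath. In those releases SnapshotDeletionPolicy.snapshot() takes no arguments and release(IndexCommit) hands the commit back; older releases, including 3.6, use an id-based snapshot(String)/release(String) pair instead, so check the javadoc for the version you run. The class name SnapshotBackup, the temp directories, and the single test document are illustrative assumptions; the copy itself is a plain java.nio.file.Files copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SnapshotBackup {

    /** Snapshots the latest commit, copies its files to backupDir, then releases the snapshot. */
    static List<String> backup(IndexWriter writer, SnapshotDeletionPolicy policy,
                               Path indexDir, Path backupDir) throws IOException {
        IndexCommit commit = policy.snapshot(); // pins every file of the latest commit
        try {
            Files.createDirectories(backupDir);
            List<String> copied = new ArrayList<>();
            Collection<String> files = commit.getFileNames(); // includes segments_N
            for (String name : files) {
                Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                           StandardCopyOption.REPLACE_EXISTING);
                copied.add(name);
            }
            return copied;
        } finally {
            policy.release(commit);     // let the deletion policy manage these files again
            writer.deleteUnusedFiles(); // optionally reclaim files that are no longer referenced
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index");
        Path backupDir = Files.createTempDirectory("backup");
        SnapshotDeletionPolicy policy =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        IndexWriterConfig cfg = new IndexWriterConfig().setIndexDeletionPolicy(policy);
        try (Directory dir = FSDirectory.open(indexDir);
             IndexWriter writer = new IndexWriter(dir, cfg)) {
            Document doc = new Document();
            doc.add(new StringField("id", "1", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit(); // snapshot() needs at least one commit to exist
            List<String> copied = backup(writer, policy, indexDir, backupDir);
            System.out.println("copied " + copied.size() + " files: " + copied);
        }
    }
}
```

Releasing in a finally block matters: a snapshot that is never released keeps its files on disk indefinitely.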
Re: Taking backup of a Lucene index
On Wed, Apr 17, 2013 at 7:02 AM, Thomas Matthijs wrote:
> Use a
> http://lucene.apache.org/core/4_2_1/core/org/apache/lucene/index/SnapshotDeletionPolicy.html
> Take a snapshot, backup/copy all the files in the commit, release the
> snapshot

That's right! Because Lucene is "write-once" (each file is opened, written, and never changed, and no file is ever overwritten), SnapshotDeletionPolicy lets you take a hot (live) point-in-time backup even while IndexWriter continues making changes to the index. As long as you hold that snapshot, none of the files it references will be deleted; just be sure to release it once you're done backing up.

Mike McCandless

http://blog.mikemccandless.com

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
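To see the write-once property in action, the following sketch pins a commit with a snapshot, keeps indexing, force-merges everything into one new segment, and then checks that every file of the snapshot is still present with its original length. It makes the same assumptions as a recent Lucene (8.x/9.x) with lucene-core on the classpath; HotBackupDemo, addDocs, and the document counts are made-up names and numbers for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class HotBackupDemo {

    /** Adds `count` tiny documents with ids starting at `from`. */
    static void addDocs(IndexWriter writer, int from, int count) throws IOException {
        for (int i = from; i < from + count; i++) {
            Document doc = new Document();
            doc.add(new StringField("id", Integer.toString(i), Field.Store.YES));
            writer.addDocument(doc);
        }
    }

    /** True if the snapshot's files survive further commits and a full merge unchanged. */
    static boolean snapshotSurvivesMerge(Path indexDir) throws IOException {
        SnapshotDeletionPolicy policy =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        try (Directory dir = FSDirectory.open(indexDir);
             IndexWriter writer =
                 new IndexWriter(dir, new IndexWriterConfig().setIndexDeletionPolicy(policy))) {
            addDocs(writer, 0, 10);
            writer.commit();

            IndexCommit snapshot = policy.snapshot();
            Map<String, Long> pinned = new HashMap<>(); // file name -> length at snapshot time
            for (String name : snapshot.getFileNames()) {
                pinned.put(name, dir.fileLength(name));
            }
            try {
                // The application keeps indexing while the backup would be running...
                for (int round = 1; round <= 3; round++) {
                    addDocs(writer, round * 10, 10);
                    writer.commit();
                }
                // ...and a merge rewrites all segments into one brand-new segment.
                writer.forceMerge(1);
                writer.commit();

                // Every pinned file must still exist with the same length.
                Set<String> present = new HashSet<>(Arrays.asList(dir.listAll()));
                for (Map.Entry<String, Long> e : pinned.entrySet()) {
                    if (!present.contains(e.getKey())
                            || dir.fileLength(e.getKey()) != e.getValue().longValue()) {
                        return false;
                    }
                }
                return true;
            } finally {
                policy.release(snapshot);
                writer.deleteUnusedFiles();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(snapshotSurvivesMerge(Files.createTempDirectory("hot-backup")));
    }
}
```

Without the snapshot, KeepOnlyLastCommitDeletionPolicy would normally let the writer delete the first commit's files once later commits exist; the snapshot is what keeps them available to a copy that is still in progress.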
RE: Taking backup of a Lucene index
Thanks for your replies.

However, in my scenario an external backup utility will be used to back up the Lucene index. I just need to ensure that the index does not change when a search is performed over it, or because of Lucene's internal housekeeping/optimization/merge activities.

Ashish

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, April 17, 2013 4:55 PM
To: java-user@lucene.apache.org
Subject: Re: Taking backup of a Lucene index
Re: Taking backup of a Lucene index
On Wed, Apr 17, 2013 at 7:32 AM, Ashish Sarna wrote:
> However, in my scenario, an external backup utility would be used to take
> backup of the Lucene index.

That's fine. Use SnapshotDeletionPolicy to get a snapshot, ask that snapshot for its list of files, and then tell your external utility to copy those files. Once the utility is done, release the snapshot.

If you need the snapshot to persist even after you close the IndexWriter/JVM and later open a new IndexWriter, use PersistentSnapshotDeletionPolicy.

Mike McCandless

http://blog.mikemccandless.com
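One way to follow that advice with an external utility is to write the snapshot's file list to a plain-text manifest the utility can read, and to use PersistentSnapshotDeletionPolicy so the snapshot outlives the IndexWriter (and the JVM). This sketch assumes a recent Lucene (8.x/9.x), where PersistentSnapshotDeletionPolicy takes the wrapped policy plus a Directory for its own bookkeeping. Keeping that bookkeeping in a separate directory (metaPath), the manifest format (one absolute path per line), and the names ExternalToolBackup, snapshotToManifest, and releaseAll are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.PersistentSnapshotDeletionPolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ExternalToolBackup {

    /** Step 1: snapshot the latest commit and write its files' absolute paths to a manifest. */
    static List<String> snapshotToManifest(Path indexPath, Path metaPath, Path manifest)
            throws IOException {
        try (Directory dir = FSDirectory.open(indexPath);
             Directory metaDir = FSDirectory.open(metaPath)) {
            PersistentSnapshotDeletionPolicy policy = new PersistentSnapshotDeletionPolicy(
                new KeepOnlyLastCommitDeletionPolicy(), metaDir);
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig().setIndexDeletionPolicy(policy))) {
                IndexCommit commit = policy.snapshot(); // recorded in metaDir, survives close()
                List<String> paths = new ArrayList<>();
                for (String name : commit.getFileNames()) {
                    paths.add(indexPath.resolve(name).toAbsolutePath().toString());
                }
                Files.write(manifest, paths); // one path per line, UTF-8
                return paths;
            }
        }
    }

    /** Step 2, possibly in another JVM once the utility is done: release persisted snapshots. */
    static int releaseAll(Path indexPath, Path metaPath) throws IOException {
        try (Directory dir = FSDirectory.open(indexPath);
             Directory metaDir = FSDirectory.open(metaPath)) {
            PersistentSnapshotDeletionPolicy policy = new PersistentSnapshotDeletionPolicy(
                new KeepOnlyLastCommitDeletionPolicy(), metaDir);
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig().setIndexDeletionPolicy(policy))) {
                List<IndexCommit> snapshots = policy.getSnapshots(); // reloaded from metaDir
                for (IndexCommit commit : snapshots) {
                    policy.release(commit);
                }
                writer.deleteUnusedFiles();
                return snapshots.size();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ExternalToolBackup snapshot|release <indexDir> <metaDir> [manifest]
        Path index = Paths.get(args[1]);
        Path meta = Paths.get(args[2]);
        if (args[0].equals("snapshot")) {
            System.out.println(snapshotToManifest(index, meta, Paths.get(args[3])).size() + " files listed");
        } else {
            System.out.println("released " + releaseAll(index, meta) + " snapshot(s)");
        }
    }
}
```

Run the snapshot step, hand the manifest to the utility, and run the release step only after the utility reports success; until then the listed files stay protected from deletion even across writer or JVM restarts.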
RE: Taking backup of a Lucene index
The external backup utility will be run by someone else, and it simply copies the whole index directory. I have no control over this utility. I have ensured that nothing will be written to the index before the utility runs; now I just need to ensure that the index does not change because of searches and/or Lucene housekeeping activities. Is there a way to ensure this?

Would opening the reader with the IndexReader.open method and passing 'true' for the 'readOnly' flag keep the index from being modified when a search is performed?

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, April 17, 2013 5:13 PM
To: java-user@lucene.apache.org
Subject: Re: Taking backup of a Lucene index
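The readOnly question is not answered directly in the thread. In the 2.9/3.x API, the readOnly flag of IndexReader.open(Directory, boolean) governs whether the reader itself may make changes (for example deleteDocument or setNorm); plain searching is not expected to write, but check the javadoc for your version. As a hedged way to check this empirically on a recent Lucene (8.x/9.x, assumed), where DirectoryReader is always read-only, the sketch below records the directory listing (names and lengths) before and after a search and compares them. It says nothing about merges, which are driven by IndexWriter. SearchDoesNotWrite and its method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SearchDoesNotWrite {

    /** File name -> length for everything currently in the directory. */
    static Map<String, Long> listing(Directory dir) throws IOException {
        Map<String, Long> files = new TreeMap<>();
        for (String name : dir.listAll()) {
            files.put(name, dir.fileLength(name));
        }
        return files;
    }

    /** Searches through a DirectoryReader and reports whether the directory listing is unchanged. */
    static boolean searchLeavesIndexUnchanged(Path indexPath, String field, String value)
            throws IOException {
        try (Directory dir = FSDirectory.open(indexPath)) {
            Map<String, Long> before = listing(dir);
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                int hits = searcher.search(new TermQuery(new Term(field, value)), 10).scoreDocs.length;
                System.out.println("hits: " + hits);
            }
            return before.equals(listing(dir));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java SearchDoesNotWrite /path/to/index id 1
        System.out.println(searchLeavesIndexUnchanged(Paths.get(args[0]), args[1], args[2]));
    }
}
```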
Re: Taking backup of a Lucene index
It is difficult to associate a class named SnapshotDeletionPolicy with taking a backup of a Lucene index.

Hien

From: Thomas Matthijs
To: java-user@lucene.apache.org
Sent: Wednesday, April 17, 2013 4:02 AM
Subject: Re: Taking backup of a Lucene index