Taking backup of a Lucene index

2013-04-17 Thread Ashish Sarna
I want to take a backup of a Lucene index, and I need to ensure that the
index files do not change while the backup is being taken.

I am concerned about the housekeeping/merge/optimization activities that
Lucene performs internally. I am not sure when or how Lucene performs these
activities, or how we can prevent them.

My application (which allows indexing and searching over the created
indexes) keeps running in the background. I can ensure that my application
writes nothing to the indexes while they are backed up, but I am not sure
whether an index changes in some way when a search is performed over it.

How can I ensure that an index does not change (i.e., that Lucene performs
no housekeeping/merge/optimization activity) while I take its backup?

Any help would be much appreciated.

PS: I am currently using Lucene 2.9.4 but wish to upgrade to 3.6.2.

Regards,
Ashish



Re: Taking backup of a Lucene index

2013-04-17 Thread Thomas Matthijs
On Wed, Apr 17, 2013 at 12:57 PM, Ashish Sarna wrote:

> I want to take back-up of a Lucene index. I need to ensure that index files
> would not change when I take their backup.
>
> I am concerned about the housekeeping/merge/optimization activities which
> Lucene performs internally. I am not sure when/how these activities are
> performed by Lucene and how we can prevent them.

Use a SnapshotDeletionPolicy:
http://lucene.apache.org/core/4_2_1/core/org/apache/lucene/index/SnapshotDeletionPolicy.html

Take a snapshot, back up (copy) all the files in the commit, then release
the snapshot.


Re: Taking backup of a Lucene index

2013-04-17 Thread Michael McCandless
On Wed, Apr 17, 2013 at 7:02 AM, Thomas Matthijs wrote:
> Use a
> http://lucene.apache.org/core/4_2_1/core/org/apache/lucene/index/SnapshotDeletionPolicy.html
> Take a snapshot, backup/copy all the files in the commit, release the
> snapshot

That's right!

Because Lucene is "write-once" (each file is opened, written, and never
changed, and no file is overwritten), the SnapshotDeletionPolicy
lets you take a hot (live) point-in-time backup even while
IndexWriter continues making changes to the index.

As long as you hold that snapshot, none of the files it references will be
deleted. Just be sure to release it once you're done backing up.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Taking backup of a Lucene index

2013-04-17 Thread Ashish Sarna
Thanks for your replies.

However, in my scenario, an external backup utility will be used to take the
backup of the Lucene index. I just need to ensure that the index does not get
changed when a search is performed over it, or by Lucene's internal
housekeeping/optimization/merge activities.

Ashish

 


Re: Taking backup of a Lucene index

2013-04-17 Thread Michael McCandless
On Wed, Apr 17, 2013 at 7:32 AM, Ashish Sarna wrote:
> Thanks for your replies.
>
> However, in my scenario, an external backup utility would be used to take
> backup of the Lucene index. I just need to ensure that index do not get
> changed when a search is performed over it or due to internal Lucene
> housekeeping/optimization/merge activities.

That's fine.

You use the SnapshotDeletionPolicy to get a snapshot, and then you ask
that snapshot for the list of files, then tell your external utility
to copy those files.

Once that utility is done, you release the snapshot.
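
A minimal sketch of the copy step described above, assuming the list of file
names has already been obtained from the held snapshot. The class and method
names (SnapshotFileCopier, copyFiles) are hypothetical and not part of Lucene;
the code uses only java.nio.file and stands in for whatever external utility
actually does the copying.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper (not a Lucene class): copies the files named by a snapshot. */
final class SnapshotFileCopier {

    private SnapshotFileCopier() {
    }

    /**
     * Copies each named file from indexDir into backupDir and returns the
     * paths of the copies. fileNames is assumed to be the file list reported
     * by the snapshot; files in indexDir that are not in the list are ignored.
     */
    static List<Path> copyFiles(Path indexDir, Collection<String> fileNames, Path backupDir)
            throws IOException {
        // Create the backup directory (and any missing parents) if needed.
        Files.createDirectories(backupDir);
        List<Path> copied = new ArrayList<>();
        for (String name : fileNames) {
            Path source = indexDir.resolve(name);
            Path target = backupDir.resolve(name);
            // Overwrite any stale copy left behind by an earlier backup run.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            copied.add(target);
        }
        return copied;
    }
}
```

The snapshot should be released only after copyFiles returns (or fails), so
that none of the listed files can be deleted in the middle of the copy.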

If you need the snapshot to persist even when you close the
IndexWriter/JVM and later open a new IndexWriter, use
PersistentSnapshotDeletionPolicy.

Mike McCandless

http://blog.mikemccandless.com




RE: Taking backup of a Lucene index

2013-04-17 Thread Ashish Sarna
The external backup utility will be run by someone else, and it simply
copies the index directory to take its backup. I have no control over this
utility.

I have ensured that nothing is written to the index before the backup
utility runs, and now I just need to ensure that the index does not get
changed by searches and/or Lucene housekeeping activities.

Is there a way to ensure this?

Would using the IndexReader.open method with the 'readOnly' flag passed as
'true' help keep the indexes from being modified when a search is performed?




Re: Taking backup of a Lucene index

2013-04-17 Thread Hien Luu
It is difficult to associate a class named SnapshotDeletionPolicy with taking
a backup of a Lucene index.

Hien


