Re: Question about the best way to replace existing docs in an index.

2005-01-10 Thread Miles Barr
On Fri, 2005-01-07 at 14:47 -0500, Jim Lynch wrote:
 My application for Lucene involves updating an existing index with a 
 mixture of new and revised documents.  From what I've been able to 
 dicern from reading I'm going to have to delete the old versions of the 
 revised documents before indexing them again.  Since this indexing will 
 probably take quite a while due to the number of new/revised documents 
 I'll be adding and the large number of documents already in the index, 
 I'm uncomfortable keeping an IndexReader and an IndexWriter open for 
 long periods of time.  

As I understand it you can't have an index reader which you do deletes
on and an index writer open at the same time since they are both doing
write operations. I think locking will prevent you from opening an index
writer once you do a delete on the reader.

So you're either going to have to open and close the reader and writer
for each update, or keep a list of duplicate references and a list of
documents to be updated, then do the deletes like:

for (Iterator it = toBeDeleted.iterator(); it.hasNext(); ) {
  Term term = new Term(Reference, (String) it.next());
  indexReader.delete(term);
}

Close the reader, open the writer, then iterate through your list of new
docs and write them to the index.

 Should I be concerned about keeping both an indexReader and indexWriter 
 open at the same time?  I'll have other processes probably making 
 searches during this time.  I'm not concerned about the searches not 
 finding the data I'm currently adding, I'm more concerned about locking 
 those searches out.  

Once you close your reader searches won't be possible. So once you've
done your deletes close the reader and open it again to release the
write lock before opening the writer.


-- 
Miles Barr [EMAIL PROTECTED]
Runtime Collective Ltd.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Question about the best way to replace existing docs in an index.

2005-01-10 Thread Jim Lynch
Miles,
Thanks for the tips.  I didn't see this response nor did I see my 
original email earlier, so I reposted the question, thinking I had 
forgotten to do so on Friday.  My apologies to the group for the double 
post.

Jim.
Miles Barr wrote:
On Fri, 2005-01-07 at 14:47 -0500, Jim Lynch wrote:
 

My application for Lucene involves updating an existing index with a 
mixture of new and revised documents.  From what I've been able to 
dicern from reading I'm going to have to delete the old versions of the 
revised documents before indexing them again.  Since this indexing will 
probably take quite a while due to the number of new/revised documents 
I'll be adding and the large number of documents already in the index, 
I'm uncomfortable keeping an IndexReader and an IndexWriter open for 
long periods of time.  
   

As I understand it you can't have an index reader which you do deletes
on and an index writer open at the same time since they are both doing
write operations. I think locking will prevent you from opening an index
writer once you do a delete on the reader.
So you're either going to have to open and close the reader and writer
for each update, or keep a list of duplicate references and a list of
documents to be updated, then do the deletes like:
for (Iterator it = toBeDeleted.iterator(); it.hasNext(); ) {
 Term term = new Term(Reference, (String) it.next());
 indexReader.delete(term);
}
Close the reader, open the writer, then iterate through your list of new
docs and write them to the index.
 

Should I be concerned about keeping both an indexReader and indexWriter 
open at the same time?  I'll have other processes probably making 
searches during this time.  I'm not concerned about the searches not 
finding the data I'm currently adding, I'm more concerned about locking 
those searches out.  
   

Once you close your reader searches won't be possible. So once you've
done your deletes close the reader and open it again to release the
write lock before opening the writer.
 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]