Re: [RFE] IndexWriter.updateDocument()

2004-12-14 Thread David Spencer
petite_abeille wrote:
Well, the subject says it all...
If there is one thing which is overly cumbersome in Lucene, it's 
updating documents, therefore this Request For Enhancement:

Please consider enhancing the IndexWriter API to include an 
updateDocument(...) method to take care of all the gory details involved 
in such operation.
I agree, this is always a hassle to do right due to having to use 
IndexWriter and IndexReader and properly opening/closing them.

I have a prelim version of a batched index writer that I use. The code 
is kinda messy, but for discussion here's what it does:

Briefly the methods are:
// [1]
// the ctr has parameters:
//'batch size # docs' e.g. it will flush pending updates every 100 docs
//'batch freq' e.g. auto flush every 60 sec
// [2]
// queue a document to be added to the index
// 'key' is the primary key name e.g. url
// 'val' is the primary key val e.g. http://www.tropo.com/;
// 'doc' is the doc to be added
update( String key, String val, Document doc)
// [3]
//  queue a document for removal
// 'key' and 'val' are the params, as from [2]
remove( String key, String val)
// [4]
// periodic flush, called automatically or on demand, 2 stages:
// 1. call IndexReader.delete() on all pending (key,val) pairs
// 2. close IndexReader
// 3. call IndexWriter.add() on all pending documents
// 4. optionally call optimze()
// 5. close IndexWriter
flush()

//
So in normal usage you just keep calling update() and it peridically 
flushes the pending updates to the index. By its nature this uses memory 
however it's tunable as to how many documents it'll queue in memory.

Does the algorithm above, esp flush(), sound correct? It seems to work 
right for me and I can post this if people want to see it...

- Dave

Thanks in advance.
Cheers,
PA.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


[RFE] IndexWriter.updateDocument()

2004-12-14 Thread petite_abeille
Well, the subject says it all...
If there is one thing which is overly cumbersome in Lucene, it's 
updating documents, therefore this Request For Enhancement:

Please consider enhancing the IndexWriter API to include an 
updateDocument(...) method to take care of all the gory details 
involved in such operation.

Thanks in advance.
Cheers,
PA.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]