Re: allowing applications to control docids change? (e.g. setKeepDeletes(boolean)?)

2007-01-16 Thread Doron Cohen
robert engels <[EMAIL PROTECTED]> wrote on 15/01/2007 16:37:35: > I did a cursory review of the discussion. > > The problem I see is that in the checkpoint tx files you need a > 'delete file' for every segment where a deletion SHOULD occur when it > is committed, but if you have multiple open trans

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
OK, catching up here and trying to merge threads together otherwise I'm going to lose my mind!: Chuck Williams wrote: > > Ning Li wrote: >> >> If a reader can only open snapshots both for search and for >> modification, I think another change is needed besides the ones >> listed: assume the lates

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Chuck Williams wrote: Michael McCandless wrote on 01/15/2007 01:49 AM: Chuck, Possibly related, one of the ways I improved concurrency in ParallelWriter was to break up IndexWriter.addDocument() into one method to invert the document and create a RAMSegment and a second method that takes the R

[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-756: -- Attachment: LUCENE-756-Jan16.patch > Maintain norms in a single file .nrm > ---

[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-756: -- Attachment: index.premergednorms.nocfs.zip > Maintain norms in a single file .nrm > ---

[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-756: -- Attachment: index.premergednorms.cfs.zip > Maintain norms in a single file .nrm > -

[jira] Reopened: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-756: --- I would like to propose some small improvements to this nice feature. I've worked out a

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465208 ] Yonik Seeley commented on LUCENE-756: - I agree that reducing the IO operations on an index open is a good thing.

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Ning Li
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Good catch Ning! And, I agree, when a reader plans to make modifications to the index, I think the best solution is to require that the reader has opened most recent "segments*_N" (be that a snapshot or a checkpoint). Really a reader is

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Yonik Seeley
On 1/15/07, Chuck Williams <[EMAIL PROTECTED]> wrote: (Side thought: I've been wondering how hard it would be to make merging not a critical section). It would be very nice if segment merging didn't block the addition of new documents... it really doesn't need to. I don't think it would be to

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465214 ] Michael McCandless commented on LUCENE-756: --- > No hard rule on this, but IMO that may be a small enough win

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465216 ] Yonik Seeley commented on LUCENE-756: - As an aside, I think we need to start making more frequent releases... the

Lucene 2.1, soon

2007-01-16 Thread Yonik Seeley
Lucene 2.1 has been a long time in coming, but I think we should plan on making a release when the file format changes settle down. After that, I think we should start making more frequent releases, which should make many people's lives easier by 1) give people something more recent to work

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Ning Li
On 1/16/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 1/15/07, Chuck Williams <[EMAIL PROTECTED]> wrote: > (Side thought: I've been wondering how hard it would > be to make merging not a critical section). It would be very nice if segment merging didn't block the addition of new documents... i

Re: Lucene 2.1, soon

2007-01-16 Thread Grant Ingersoll
+1 Was thinking the same thing this morning. The changes.txt 2.1 section is getting quite long. On Jan 16, 2007, at 12:16 PM, Yonik Seeley wrote: Lucene 2.1 has been a long time in coming, but I think we should plan on making a release when the file format changes settle down. After that,

Re: Lucene 2.1, soon

2007-01-16 Thread Otis Gospodnetic
Same here. As soon as the file format changes settle down. Otis - Original Message From: Grant Ingersoll <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Tuesday, January 16, 2007 12:26:43 PM Subject: Re: Lucene 2.1, soon +1 Was thinking the same thing this morning. The chang

[jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-756: -- Attachment: LUCENE-756-Jan16.Take2.patch > Maintain norms in a single file .nrm > -

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465230 ] Michael McCandless commented on LUCENE-756: --- OK, take two! I attached LUCENE-756-Jan16.Take2.patch I remo

Re: Lucene 2.1, soon

2007-01-16 Thread Michael McCandless
+1 for releasing 2.1 soon. I hope to get explicit commits (LUCENE-710) working, which has a tiny file format change, and LUCENE-773 (deprecate FSDirectory.getDirectory methods that take a create arg) completed soon, so we can get them into 2.1, if possible. Also +1 on more frequent releases afte

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Chuck Williams (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465240 ] Chuck Williams commented on LUCENE-756: --- I may have the only app that will be broken by the 10-day backwards i

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465250 ] Michael McCandless commented on LUCENE-756: --- Actually, if you apply my first change above, regen your index

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465260 ] Doron Cohen commented on LUCENE-756: Michael, I like this improvement! (At first I considered adding such FORMAT

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465265 ] Doug Cutting commented on LUCENE-756: - > the term "merged" (in hasMergedNorms) is a little overloaded with other

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465271 ] Doron Cohen commented on LUCENE-756: Catenated? > Maintain norms in a single file .nrm > ---

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465273 ] Doron Cohen commented on LUCENE-756: Just to let you know - I checked this with recent patch for Lucene-741 (Fiel

[jira] Commented: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465274 ] Michael McCandless commented on LUCENE-756: --- OK thanks Doron. I will make the fixes you suggested! I like

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Ning Li wrote: On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Good catch Ning! And, I agree, when a reader plans to make modifications to the index, I think the best solution is to require that the reader has opened most recent "segments*_N" (be that a snapshot or a checkpoint). Rea

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Doron Cohen
Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007 12:13:47: > Ning Li wrote: > > On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > >> Good catch Ning! And, I agree, when a reader plans to make > >> modifications to the index, I think the best solution is to require > >> that th

[jira] Resolved: (LUCENE-756) Maintain norms in a single file .nrm

2007-01-16 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-756. --- Resolution: Fixed Fix Version/s: 2.1 OK I committed the fix (changed the name

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread robert engels
I really wish Doug would comment on all of these proposed changes... It seems that after you account for all of the constraints (e.g. IndexReader must be current snapshot...) you are going to end up right back where you started. I propose that this work should be done in some sort of facade

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Doron Cohen wrote: Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007 12:13:47: Ning Li wrote: On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Good catch Ning! And, I agree, when a reader plans to make modifications to the index, I think the best solution is to require that

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Doug Cutting
robert engels wrote: I really wish Doug would comment on all of these proposed changes... I wish he would too! Ideally the segments file would only be updated when one commits, by closing the index, or perhaps by calling a new method. So, if you abort, all documents added since the last com

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Yonik Seeley
On 1/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote: Remind me, why do we have to update the segments file except at close? I'm sure there's a good reason, and that's central to this discussion. If segments are removed because of a merge, a new reader coming along will have problems opening the

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Doug Cutting
Yonik Seeley wrote: If segments are removed because of a merge, a new reader coming along will have problems opening the index if the segments file isn't updated to reflect that. One could keep around all old segments until a close() but that would cost disk space. Won't "explicit commits" hav

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Grant Ingersoll
On Jan 16, 2007, at 3:55 PM, Michael McCandless wrote: Doron Cohen wrote: Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007 12:13:47: Ning Li wrote: Re those 2 ideas: I do agree the whole division of certain kinds of index changes into a reader and other ones into a writer, is confu

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread robert engels
You have the same problem if there is an existing reader open, so what is the difference? You can't remove the segments there either. On Jan 16, 2007, at 3:18 PM, Yonik Seeley wrote: On 1/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote: Remind me, why do we have to update the segments file exce

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Yonik Seeley
On 1/16/07, robert engels <[EMAIL PROTECTED]> wrote: You have the same problem if there is an existing reader open, so what is the difference? You can't remove the segments there either. The disk space for the segments is currently removed if no one has them open... this is quite a bit differen

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread robert engels
Yes it is! This is what I am getting at when I said the design is moving all over the place. The thread is "explicit commits", so I apologize for getting lost. It just seems that we should design a new high-level class, Repository, and design that API. It might use Lucene IndexReader

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Chuck Williams
Yonik Seeley wrote on 01/16/2007 11:29 AM: > On 1/16/07, robert engels <[EMAIL PROTECTED]> wrote: >> You have the same problem if there is an existing reader open, so >> what is the difference? You can't remove the segments there either. > > The disk space for the segments is currently removed if n

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Doug Cutting wrote: robert engels wrote: I really wish Doug would comment on all of these proposed changes... I wish he would too! Ideally the segments file would only be updated when one commits, by closing the index, or perhaps by calling a new method. So, if you abort, all documents add

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Doug Cutting
Michael McCandless wrote: We could indeed simply tie "close" to mean "commit now", and not add a separate "commit" method. But what about the "bulk delete then bulk add" case? Ideally if a reader refreshes by checking "isCurrent()" it shouldn't ever open the index "at a bad time". Ie, we need

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Yonik Seeley wrote: On 1/16/07, robert engels <[EMAIL PROTECTED]> wrote: You have the same problem if there is an existing reader open, so what is the difference? You can't remove the segments there either. The disk space for the segments is currently removed if no one has them open... this is

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Marvin Humphrey
On Jan 16, 2007, at 1:51 PM, Doug Cutting wrote: One could also implement this with a Directory that permits checkpointing and rollback. Would that be any simpler? FWIW, explicit commits, including deletes from the IndexWriter class, come along for the ride with the KinoSearch merge model

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Doug Cutting wrote: Michael McCandless wrote: We could indeed simply tie "close" to mean "commit now", and not add a separate "commit" method. But what about the "bulk delete then bulk add" case? Ideally if a reader refreshes by checking "isCurrent()" it shouldn't ever open the index "at a bad

Re: Lockless commits -- great stuff!

2007-01-16 Thread Marvin Humphrey
Late response... On Jan 12, 2007, at 3:02 AM, Michael McCandless wrote: Now that readers are read-only, I think it makes sense to default the write lock into the index directory, and as you describe, no longer generate a "unique namespace" hash lock ID since the index dir gives us that scoping.

Re: [jira] Commented: (LUCENE-140) docs out of order

2007-01-16 Thread Marvin Humphrey
On Jan 12, 2007, at 3:57 AM, Michael McCandless wrote: Chris Hostetter wrote: : I think we should deprecate the "create" argument to : FSDirectory.getDirectory(*) and leave only the create argument in : IndexWriter's constructors. Am I missing something? Is there a : reason not to do

Re: Lockless commits -- great stuff!

2007-01-16 Thread robert engels
What is the problem with implementing the KinoSearch model for Lucene? It seems this would solve nearly all of these issues in a very straightforward way. BTW, the KinoSearch model is nearly exactly what we did when our original implementation of IndexReader/Writer wrote directly to JDBC.

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Doug Cutting
Yonik Seeley wrote: One could keep around all old segments until a close() but that would cost disk space. One could optimize that so that intermediate segments, created since open, would be deleted. So, for example, batch indexing starting with an empty index could freely delete segments as

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Chuck Williams
Michael McCandless wrote on 01/16/2007 12:09 PM: > Doug Cutting wrote: >> Michael McCandless wrote: >>> We could indeed simply tie "close" to mean "commit now", and not add a >>> separate "commit" method. >>> >>> But what about the "bulk delete then bulk add" case? Ideally if a >>> reader refreshe

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Andi Vajda
On Tue, 16 Jan 2007, Doug Cutting wrote: Michael McCandless wrote: We could indeed simply tie "close" to mean "commit now", and not add a separate "commit" method. But what about the "bulk delete then bulk add" case? Ideally if a reader refreshes by checking "isCurrent()" it shouldn't ever o

Re: adding "explicit commits" to Lucene?

2007-01-16 Thread Michael McCandless
Doug Cutting wrote: Yonik Seeley wrote: One could keep around all old segments until a close() but that would cost disk space. One could optimize that so that intermediate segments, created since open, would be deleted. So, for example, batch indexing starting with an empty index could free

Re: Lockless commits -- great stuff!

2007-01-16 Thread Marvin Humphrey
On Jan 16, 2007, at 2:30 PM, robert engels wrote: What is the problem with implementing the KinoSearch model for Lucene? It seems this would solve nearly all of these issues in a very straightforward way. It's a major undertaking, and the only developer sufficiently motivated thus far has