robert engels <[EMAIL PROTECTED]> wrote on 15/01/2007 16:37:35:
> I did a cursory review of the discussion.
>
> The problem I see is that in the checkpoint tx files you need a
> 'delete file' for every segment where a deletion SHOULD occur when it
> is committed, but if you have multiple open trans
OK, catching up here and trying to merge threads together otherwise
I'm going to lose my mind!:
Chuck Williams wrote:
>
> Ning Li wrote:
>>
>> If a reader can only open snapshots both for search and for
>> modification, I think another change is needed besides the ones
>> listed: assume the lates
Chuck Williams wrote:
Michael McCandless wrote on 01/15/2007 01:49 AM:
Chuck,
Possibly related, one of the ways I improved concurrency in
ParallelWriter was to break up IndexWriter.addDocument() into one method
to invert the document and create a RAMSegment and a second method that
takes the R
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-756:
--
Attachment: LUCENE-756-Jan16.patch
> Maintain norms in a single file .nrm
> ---
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-756:
--
Attachment: index.premergednorms.nocfs.zip
> Maintain norms in a single file .nrm
> ---
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-756:
--
Attachment: index.premergednorms.cfs.zip
> Maintain norms in a single file .nrm
> -
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened LUCENE-756:
---
I would like to propose some small improvements to this nice feature.
I've worked out a
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465208
]
Yonik Seeley commented on LUCENE-756:
-
I agree that reducing the IO operations on an index open is a good thing.
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Good catch Ning! And, I agree, when a reader plans to make
modifications to the index, I think the best solution is to require
that the reader has opened most recent "segments*_N" (be that a
snapshot or a checkpoint). Really a reader is
On 1/15/07, Chuck Williams <[EMAIL PROTECTED]> wrote:
(Side thought: I've been wondering how hard it would
be to make merging not a critical section).
It would be very nice if segment merging didn't block the addition of
new documents... it really doesn't need to. I don't think it would be
to
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465214
]
Michael McCandless commented on LUCENE-756:
---
> No hard rule on this, but IMO that may be a small enough win
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465216
]
Yonik Seeley commented on LUCENE-756:
-
As an aside, I think we need to start making more frequent releases... the
Lucene 2.1 has been a long time in coming, but I think we should plan
on making a release when the file format changes settle down.
After that, I think we should start making more frequent releases,
which should make many people's lives easier by
1) give people something more recent to work
On 1/16/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 1/15/07, Chuck Williams <[EMAIL PROTECTED]> wrote:
> (Side thought: I've been wondering how hard it would
> be to make merging not a critical section).
It would be very nice if segment merging didn't block the addition of
new documents... i
+1
Was thinking the same thing this morning. The changes.txt 2.1
section is getting quite long.
On Jan 16, 2007, at 12:16 PM, Yonik Seeley wrote:
Lucene 2.1 has been a long time in coming, but I think we should plan
on making a release when the file format changes settle down.
After that,
Same here. As soon as the file format changes settle down.
Otis
- Original Message
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, January 16, 2007 12:26:43 PM
Subject: Re: Lucene 2.1, soon
+1
Was thinking the same thing this morning. The chang
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-756:
--
Attachment: LUCENE-756-Jan16.Take2.patch
> Maintain norms in a single file .nrm
> -
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465230
]
Michael McCandless commented on LUCENE-756:
---
OK, take two! I attached LUCENE-756-Jan16.Take2.patch
I remo
+1 for releasing 2.1 soon.
I hope to get explicit commits (LUCENE-710) working, which has a tiny
file format change, and LUCENE-773 (deprecate FSDirectory.getDirectory
methods that take a create arg) completed soon, so we can get them
into 2.1, if possible.
Also +1 on more frequent releases afte
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465240
]
Chuck Williams commented on LUCENE-756:
---
I may have the only app that will be broken by the 10-day backwards
i
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465250
]
Michael McCandless commented on LUCENE-756:
---
Actually, if you apply my first change above, regen your index
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465260
]
Doron Cohen commented on LUCENE-756:
Michael, I like this improvement!
(At first I considered adding such FORMAT
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465265
]
Doug Cutting commented on LUCENE-756:
-
> the term "merged" (in hasMergedNorms) is a little overloaded with other
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465271
]
Doron Cohen commented on LUCENE-756:
Catenated?
> Maintain norms in a single file .nrm
> ---
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465273
]
Doron Cohen commented on LUCENE-756:
Just to let you know - I checked this with recent patch for Lucene-741 (Fiel
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465274
]
Michael McCandless commented on LUCENE-756:
---
OK thanks Doron. I will make the fixes you suggested!
I like
Ning Li wrote:
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Good catch Ning! And, I agree, when a reader plans to make
modifications to the index, I think the best solution is to require
that the reader has opened most recent "segments*_N" (be that a
snapshot or a checkpoint). Rea
Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007
12:13:47:
> Ning Li wrote:
> > On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
> >> Good catch Ning! And, I agree, when a reader plans to make
> >> modifications to the index, I think the best solution is to require
> >> that th
[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-756.
---
Resolution: Fixed
Fix Version/s: 2.1
OK I committed the fix (changed the name
I really wish Doug would comment on all of these proposed changes...
It seems that after you account for all of the constraints (e.g.
IndexReader must be current snapshot...) you are going to end up right
back where you started.
I propose that this work should be done in some sort of facade
Doron Cohen wrote:
Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007
12:13:47:
Ning Li wrote:
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Good catch Ning! And, I agree, when a reader plans to make
modifications to the index, I think the best solution is to require
that
robert engels wrote:
I really wish Doug would comment on all of these proposed changes...
I wish he would too!
Ideally the segments file would only be updated when one commits, by
closing the index, or perhaps by calling a new method. So, if you
abort, all documents added since the last com
On 1/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Remind me, why do we have to update the segments file except at close?
I'm sure there's a good reason, and that's central to this discussion.
If segments are removed because of a merge, a new reader coming along
will have problems opening the
Yonik Seeley wrote:
If segments are removed because of a merge, a new reader coming along
will have problems opening the index if the segments file isn't
updated to reflect that.
One could keep around all old segments until a close() but that would
cost disk space.
Won't "explicit commits" hav
On Jan 16, 2007, at 3:55 PM, Michael McCandless wrote:
Doron Cohen wrote:
Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007
12:13:47:
Ning Li wrote:
Re those 2 ideas: I do agree the whole division of certain kinds of
index changes into a reader and other ones into a writer, is confu
You have the same problem if there is an existing reader open, so
what is the difference? You can't remove the segments there either.
On Jan 16, 2007, at 3:18 PM, Yonik Seeley wrote:
On 1/16/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
Remind me, why do we have to update the segments file exce
On 1/16/07, robert engels <[EMAIL PROTECTED]> wrote:
You have the same problem if there is an existing reader open, so
what is the difference? You can't remove the segments there either.
The disk space for the segments is currently removed if no one has
them open... this is quite a bit differen
Yes it is!
This is what I am getting at when I said the design is moving all
over the place.
The thread is "explicit commits", so I apologize for getting lost.
It just seems that we should design a new high-level class
Repository and design that API. It might use Lucene IndexReader
Yonik Seeley wrote on 01/16/2007 11:29 AM:
> On 1/16/07, robert engels <[EMAIL PROTECTED]> wrote:
>> You have the same problem if there is an existing reader open, so
>> what is the difference? You can't remove the segments there either.
>
> The disk space for the segments is currently removed if n
Doug Cutting wrote:
robert engels wrote:
I really wish Doug would comment on all of these proposed changes...
I wish he would too!
Ideally the segments file would only be updated when one commits, by
closing the index, or perhaps by calling a new method. So, if you
abort, all documents add
Michael McCandless wrote:
We could indeed simply tie "close" to mean "commit now", and not add a
separate "commit" method.
But what about the "bulk delete then bulk add" case? Ideally if a
reader refreshes by checking "isCurrent()" it shouldn't ever open the
index "at a bad time". Ie, we need
Yonik Seeley wrote:
On 1/16/07, robert engels <[EMAIL PROTECTED]> wrote:
You have the same problem if there is an existing reader open, so
what is the difference? You can't remove the segments there either.
The disk space for the segments is currently removed if no one has
them open... this is
On Jan 16, 2007, at 1:51 PM, Doug Cutting wrote:
One could also implement this with a Directory that permits
checkpointing and rollback. Would that be any simpler?
FWIW, explicit commits, including deletes from the IndexWriter class,
come along for the ride with the KinoSearch merge model
Doug Cutting wrote:
Michael McCandless wrote:
We could indeed simply tie "close" to mean "commit now", and not add a
separate "commit" method.
But what about the "bulk delete then bulk add" case? Ideally if a
reader refreshes by checking "isCurrent()" it shouldn't ever open the
index "at a bad
Late response...
On Jan 12, 2007, at 3:02 AM, Michael McCandless wrote:
Now that readers are read-only, I think it makes sense to default the
write lock into the index directory, and as you describe, no longer
generate a "unique namespace" hash lock ID since the index dir gives
us that scoping.
On Jan 12, 2007, at 3:57 AM, Michael McCandless wrote:
Chris Hostetter wrote:
: I think we should deprecate the "create" argument to
: FSDirectory.getDirectory(*) and leave only the create argument in
: IndexWriter's constructors. Am I missing something? Is there a
: reason not to do
What is the problem with implementing the KinoSearch model for
Lucene? It seems this would solve nearly all of these issues in a
very straightforward way.
BTW, the KinoSearch model is nearly exactly what we did when our
original implementation of IndexReader/Writer wrote directly to JDBC.
Yonik Seeley wrote:
One could keep around all old segments until a close() but that would
cost disk space.
One could optimize that so that intermediate segments, created since
open, would be deleted. So, for example, batch indexing starting with
an empty index could freely delete segments as
Michael McCandless wrote on 01/16/2007 12:09 PM:
> Doug Cutting wrote:
>> Michael McCandless wrote:
>>> We could indeed simply tie "close" to mean "commit now", and not add a
>>> separate "commit" method.
>>>
>>> But what about the "bulk delete then bulk add" case? Ideally if a
>>> reader refreshe
On Tue, 16 Jan 2007, Doug Cutting wrote:
Michael McCandless wrote:
We could indeed simply tie "close" to mean "commit now", and not add a
separate "commit" method.
But what about the "bulk delete then bulk add" case? Ideally if a
reader refreshes by checking "isCurrent()" it shouldn't ever o
Doug Cutting wrote:
Yonik Seeley wrote:
One could keep around all old segments until a close() but that would
cost disk space.
One could optimize that so that intermediate segments, created since
open, would be deleted. So, for example, batch indexing starting with
an empty index could free
On Jan 16, 2007, at 2:30 PM, robert engels wrote:
What is the problem with implementing the KinoSearch model for
Lucene? It seems this would solve nearly all of these issues in a
very straightforward way.
It's a major undertaking, and the only developer sufficiently
motivated thus far has