Re: Lucene 2.1, soon

2007-01-18 Thread Doron Cohen
Sounds good to me. So it is IndexFileDeleter that can be used by applications to guarantee "their" NFS-safe behavior, namely preventing premature file deletions. Cool. We can probably write one such alternative sometime, even in contrib. But, should enabling this way of extending IndexFileDele

Re: Lucene 2.1, soon

2007-01-18 Thread Chuck Williams
I need to support NFS and would not want to rely on the reader refreshing in X minutes. Setting X too small risks a query failure and setting X too large wastes disk space. X would need to be set for 100% reader availability, implying a large value and a lot of disk space waste. I like the idea

Re: Lucene 2.1, soon

2007-01-18 Thread Michael McCandless
Doron Cohen wrote: I am not happy with complicating the readers like this, conceptually adding back commit locks (for deletion), this time with a keep-alive thread, and again making readers not read-only. To my understanding the only remaining issue with NFS is: a reader might get an IO excepti

Re: Lucene 2.1, soon

2007-01-18 Thread Marvin Humphrey
On Jan 18, 2007, at 2:59 PM, Doron Cohen wrote: To my understanding the only remaining issue with NFS is: a reader might get an IO exception in case writer removed an old file that the reader is using. It is not a possible corruption that we try to solve, right? For that I think it is not wor

Re: Lucene 2.1, soon

2007-01-18 Thread Marvin Humphrey
On Jan 18, 2007, at 2:24 PM, Michael McCandless wrote: I think we should decouple the deletion policy from commits. This way developers could subclass and make their own deletion policy that suits their application. But your excellent work has brought us so close to just handling all deleti
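For illustration, a minimal Java sketch of what a pluggable deletion policy could look like if deletion were decoupled from commits. `Commit`, `DeletionPolicy`, `KeepOnlyLastCommit` and `KeepLastN` are hypothetical names invented for this sketch, not Lucene's API; the only idea taken from the thread is that the writer asks a replaceable policy which old commits may be removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one commit point (one segments_N file).
final class Commit {
    final String segmentsFileName;
    final long generation;

    Commit(String segmentsFileName, long generation) {
        this.segmentsFileName = segmentsFileName;
        this.generation = generation;
    }
}

// Hypothetical pluggable policy: given all commits (oldest first),
// return the ones that may be deleted now.
interface DeletionPolicy {
    List<Commit> commitsToDelete(List<Commit> commits);
}

// Today's behaviour expressed as a policy: everything but the newest commit goes.
final class KeepOnlyLastCommit implements DeletionPolicy {
    @Override
    public List<Commit> commitsToDelete(List<Commit> commits) {
        if (commits.size() <= 1) {
            return new ArrayList<>();
        }
        return new ArrayList<>(commits.subList(0, commits.size() - 1));
    }
}

// An application-specific alternative: keep the newest n commits.
final class KeepLastN implements DeletionPolicy {
    private final int n;

    KeepLastN(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("must keep at least one commit");
        }
        this.n = n;
    }

    @Override
    public List<Commit> commitsToDelete(List<Commit> commits) {
        int deletable = Math.max(0, commits.size() - n);
        return new ArrayList<>(commits.subList(0, deletable));
    }
}
```

An NFS-oriented application could then plug in its own policy (for example one driven by reader heartbeats) without the writer itself changing.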

Re: Lucene 2.1, soon

2007-01-18 Thread Doron Cohen
I am not happy with complicating the readers like this, conceptually adding back commit locks (for deletion), this time with a keep-alive thread, and again making readers not read-only. To my understanding the only remaining issue with NFS is: a reader might get an IO exception in case writer rem

Re: Decorative cache (and Hits.setSearcher)

2007-01-18 Thread karl wettin
On 18 Jan 2007, at 23:24, Chris Hostetter wrote: now you change the underlying Searcher/IndexReader out from under the Hits, replacing it with an updated index in which many new documents have been added that contain both "Lucene" and "java" ... if you issued a brand new search against this index

[jira] Resolved: (LUCENE-773) Deprecate "create" method in FSDirectory.getDirectory in favor of IndexWriter's "create"

2007-01-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-773. --- Resolution: Fixed OK I committed this: * Added removal of old write lock in Index

Re: Lucene 2.1, soon

2007-01-18 Thread robert engels
The touching doesn't have to be that timely. If the index is configured to only keep old segments less than x hours old, you just check if any of the timestamps are within x hours, and if not you can delete the segments. So even if the reader is late updating the timestamp - they would need
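A minimal sketch of the check described above, assuming each reader's last touch is available as a millisecond timestamp; `StaleReaderCheck` and its method are hypothetical. Because the window is generous (hours), a reader that is a little late refreshing its timestamp still protects its files. Timestamps from different machines are only comparable if their clocks are roughly in sync, which is another reason to keep the window large.

```java
import java.util.Collection;

// Hypothetical helper: decides whether old segments may be deleted,
// given the last-touch times recorded by all known readers.
final class StaleReaderCheck {
    private StaleReaderCheck() {}

    // True when no reader touched its marker within the window,
    // i.e. every recorded timestamp is at least windowMillis old.
    static boolean canDeleteOldSegments(Collection<Long> readerTouchTimesMillis,
                                        long nowMillis, long windowMillis) {
        for (long touched : readerTouchTimesMillis) {
            if (nowMillis - touched < windowMillis) {
                return false; // a reader was active recently; keep the files
            }
        }
        return true;
    }
}
```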

Re: Lucene 2.1, soon

2007-01-18 Thread Marvin Humphrey
On Jan 18, 2007, at 2:17 PM, Michael McCandless wrote: How about if each reader were assigned a unique ID (eg hostname) by the application, and wrote a file ($ID.inuse or something) into the index dir referencing the segments_N that it's currently using? It would have to go in the old /tmp lo
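A sketch of the `$ID.inuse` idea using only `java.nio.file`, assuming the application supplies a unique reader ID (such as the hostname) and the marker files live in the index directory. `InUseMarkers` is a hypothetical helper, not part of Lucene. The marker is written to a temporary file and renamed into place, so where the filesystem's rename is atomic the writer should never read a half-written name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the "$ID.inuse" marker-file scheme.
final class InUseMarkers {
    private InUseMarkers() {}

    // Reader side: record which segments_N file this reader currently has open.
    static void markInUse(Path indexDir, String readerId, String segmentsFileName) throws IOException {
        Path tmp = indexDir.resolve(readerId + ".inuse.tmp");
        Path marker = indexDir.resolve(readerId + ".inuse");
        Files.write(tmp, segmentsFileName.getBytes(StandardCharsets.UTF_8));
        // Rename into place, replacing this reader's previous marker.
        Files.move(tmp, marker, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    // Writer side: every segments_N named by some *.inuse marker is protected
    // and must not be deleted (nor the files it references).
    static Set<String> protectedCommits(Path indexDir) throws IOException {
        Set<String> result = new HashSet<>();
        try (DirectoryStream<Path> markers = Files.newDirectoryStream(indexDir, "*.inuse")) {
            for (Path marker : markers) {
                String name = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
                if (!name.isEmpty()) {
                    result.add(name);
                }
            }
        }
        return result;
    }
}
```

This does nothing about crashed readers leaving stale markers behind, which is the objection raised elsewhere in the thread.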

Re: Lucene 2.1, soon

2007-01-18 Thread Michael McCandless
Marvin Humphrey wrote: On Jan 17, 2007, at 1:16 PM, Michael McCandless wrote: This is the solution I have in mind for LUCENE-710: change the IndexFileDeleter so that instead of always immediately deleting the last commit when a new commit happens, allow some time before doing so. This way rea
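A minimal sketch of the delayed-deletion rule, assuming the writer knows when each commit was written and keeps them ordered oldest first. `TimedCommit` and `DelayedDeletion` are hypothetical; the point is only that a commit becomes deletable once it has been superseded for at least the configured delay, and the newest commit is always kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of a commit and the time it was written.
final class TimedCommit {
    final String segmentsFileName;
    final long commitTimeMillis;

    TimedCommit(String segmentsFileName, long commitTimeMillis) {
        this.segmentsFileName = segmentsFileName;
        this.commitTimeMillis = commitTimeMillis;
    }
}

final class DelayedDeletion {
    private DelayedDeletion() {}

    // commits must be ordered oldest first. Commit i is superseded at the
    // time commit i+1 was written; it may be deleted only after delayMillis
    // have passed since then. The newest commit is never returned.
    static List<TimedCommit> deletable(List<TimedCommit> commits, long nowMillis, long delayMillis) {
        List<TimedCommit> result = new ArrayList<>();
        for (int i = 0; i + 1 < commits.size(); i++) {
            long supersededAt = commits.get(i + 1).commitTimeMillis;
            if (nowMillis - supersededAt >= delayMillis) {
                result.add(commits.get(i));
            }
        }
        return result;
    }
}
```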

Re: Decorative cache (and Hits.setSearcher)

2007-01-18 Thread Chris Hostetter
: Looked into this and came up with a perhaps better solution: adding : yet another layer of decoration of the searcher passed to the Hits at : construction time. This way the searcher can change without touching : the Hits. I.e. the decorated searcher in the searcher passed to the : Hits will be
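A sketch of the decoration idea, with a hypothetical one-method `DocSource` interface standing in for the searcher: the Hits-like object is built against `SwappableDocSource`, and the application swaps the underlying searcher behind it. This is only safe under the assumption karl's use case makes, that the new index keeps the same document numbers for the cached hits; Hoss's objection in this thread is precisely about when that does not hold.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical minimal stand-in for a searcher: fetch a stored document by number.
interface DocSource {
    String doc(int docId);
}

// Decorator handed to the Hits-like object at construction time. The
// application can later replace the underlying searcher without touching the Hits.
final class SwappableDocSource implements DocSource {
    private final AtomicReference<DocSource> delegate;

    SwappableDocSource(DocSource initial) {
        this.delegate = new AtomicReference<>(Objects.requireNonNull(initial));
    }

    // Installs the replacement and returns the previous delegate,
    // so the caller can close it once no one is using it.
    DocSource swap(DocSource replacement) {
        return delegate.getAndSet(Objects.requireNonNull(replacement));
    }

    @Override
    public String doc(int docId) {
        return delegate.get().doc(docId);
    }
}
```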

Re: Lucene 2.1, soon

2007-01-18 Thread robert engels
You would also have to add a requirement that readers touch the file every N minutes, otherwise dead users will prevent cleanup On Jan 18, 2007, at 4:17 PM, Michael McCandless wrote: Marvin Humphrey wrote: On Jan 18, 2007, at 1:58 PM, Chuck Williams wrote: How about a direct solution with a
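A sketch of the periodic touch described above, assuming each reader owns one marker file and refreshes its modification time from a daemon `ScheduledExecutorService`; the writer treats a marker older than some maximum age as belonging to a dead reader. `InUseHeartbeat` is a hypothetical class. Since the reader stamps the time from its own clock and the writer compares against its own, clock skew between machines has to fit inside the maximum age.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical reader-side heartbeat plus a writer-side staleness test.
final class InUseHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "inuse-heartbeat");
            t.setDaemon(true); // don't keep the JVM alive just for this
            return t;
        });
    private final Path marker;

    InUseHeartbeat(Path marker, long period, TimeUnit unit) {
        this.marker = marker;
        touch(); // touch once immediately, then every period
        scheduler.scheduleAtFixedRate(this::touchQuietly, period, period, unit);
    }

    // Sets the marker's modification time to "now", creating the file if needed.
    void touch() {
        try {
            if (Files.notExists(marker)) {
                Files.createFile(marker);
            }
            Files.setLastModifiedTime(marker, FileTime.fromMillis(System.currentTimeMillis()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A failed touch must not cancel the schedule; just try again next period.
    private void touchQuietly() {
        try {
            touch();
        } catch (UncheckedIOException ignored) {
        }
    }

    // Writer side: a marker untouched for longer than maxAgeMillis is treated as dead.
    static boolean isStale(Path marker, long maxAgeMillis) throws IOException {
        long age = System.currentTimeMillis() - Files.getLastModifiedTime(marker).toMillis();
        return age > maxAgeMillis;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```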

Re: Lucene 2.1, soon

2007-01-18 Thread Michael McCandless
Marvin Humphrey wrote: On Jan 18, 2007, at 1:58 PM, Chuck Williams wrote: How about a direct solution with a reference count scheme? Segments files could be reference-counted, There would have to be a file where the refcounts are maintained. The problem is that if an IndexReader crashes,

Re: Lucene 2.1, soon

2007-01-18 Thread robert engels
This won't work with multiple JVMs attached to the same Lucene directory. All JVMs need to vote as to whether or not certain segments can be deleted, since the other JVMs can't know this. How you do this... On Jan 18, 2007, at 3:58 PM, Chuck Williams wrote: How about a direct solution with

Re: Lucene 2.1, soon

2007-01-18 Thread Marvin Humphrey
On Jan 18, 2007, at 1:58 PM, Chuck Williams wrote: How about a direct solution with a reference count scheme? Segments files could be reference-counted, There would have to be a file where the refcounts are maintained. The problem is that if an IndexReader crashes, it could orphan a ref

Re: Lucene 2.1, soon

2007-01-18 Thread Chuck Williams
How about a direct solution with a reference count scheme? Segments files could be reference-counted, as well as individual segments either directly, possibly by interning SegmentInfo instances, or indirectly by reference counting all files via Directory. The most recent checkpoint and snapshot w
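A minimal in-process sketch of per-file reference counting, assuming the writer holds one reference on each live commit's files and every reader adds one when it opens a commit and drops it on close. `FileRefCounts` is hypothetical and its `deleter` callback stands in for the actual file removal. As robert and Marvin point out, counts kept like this only cover a single JVM; sharing them across machines would need a persistent record that survives a crashed reader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory reference counter for index files.
final class FileRefCounts {
    private final Map<String, Integer> counts = new HashMap<>();
    private final Consumer<String> deleter;

    FileRefCounts(Consumer<String> deleter) {
        this.deleter = deleter;
    }

    // Called when a commit (or a reader of it) starts using these files.
    synchronized void incRef(Iterable<String> files) {
        for (String f : files) {
            counts.merge(f, 1, Integer::sum);
        }
    }

    // Called when a user is done; a file is deleted as soon as its count hits zero.
    synchronized void decRef(Iterable<String> files) {
        for (String f : files) {
            Integer c = counts.get(f);
            if (c == null) {
                throw new IllegalStateException("decRef of unreferenced file " + f);
            }
            if (c == 1) {
                counts.remove(f);
                deleter.accept(f);
            } else {
                counts.put(f, c - 1);
            }
        }
    }

    synchronized int count(String file) {
        return counts.getOrDefault(file, 0);
    }
}
```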

Re: Lucene 2.1, soon

2007-01-18 Thread Marvin Humphrey
I wrote: I'd be cool with making it impossible to put an index on an NFS volume prior to version 4. Elaborating and clarifying... IndexReader attempts to establish a read lock on the relevant segments_N file. It doesn't bother to see whether the locking attempt succeeds, though. Index

[jira] Resolved: (LUCENE-772) Lucene infinite loop? In FieldsReader.uncompress called from IndexSearcher.doc

2007-01-18 Thread Arthur Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Smith resolved LUCENE-772. - Resolution: Fixed Fix Version/s: 2.1 Ok, running the 1-12-07 nightly build seems to have f

Re: Lucene 2.1, soon

2007-01-18 Thread Marvin Humphrey
On Jan 17, 2007, at 1:16 PM, Michael McCandless wrote: This is the solution I have in mind for LUCENE-710: change the IndexFileDeleter so that instead of always immediately deleting the last commit when a new commit happens, allow some time before doing so. This way readers have a chance to re

Re: Payloads

2007-01-18 Thread Marvin Humphrey
On Jan 18, 2007, at 8:59 AM, Grant Ingersoll wrote: I think one thing that would really bolster the flex. indexing format changes would be to have someone write another implementation for it so that we can iron out any interface details that may be needed. For instance, maybe the Kino mer

[jira] Commented: (LUCENE-580) Pre-analyzed fields

2007-01-18 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465811 ] Karl Wettin commented on LUCENE-580: Nadav Har'El [18/Jan/07 08:21 AM] > The description above suggests that it

Re: Payloads

2007-01-18 Thread Grant Ingersoll
I agree (and this has been discussed on this very thread in the past, see Doug's comments). I would love to have someone take a look at the flexible indexing patch that was submitted (I have looked a little at it, but it is going to need more than just me since it is a big change, although

Re: Payloads

2007-01-18 Thread Grant Ingersoll
Couldn't agree more. This is good progress. I like the payloads patch, but I would like to see the lazy prox stream (LUCENE-761) stuff done (or at least details given on it) so that we can hook this into Similarity and thereby into scoring. For 761 and the payload stuff, we n

Re: Payloads

2007-01-18 Thread Marvin Humphrey
On Jan 18, 2007, at 8:31 AM, Michael Busch wrote: I think it makes sense to add new functions incrementally, as long as we only extend the API in a way that is compatible with the long-term goal, as Doug suggested already. After the payload patch is committed we can work on a

Re: Payloads

2007-01-18 Thread Michael Busch
Nadav Har'El wrote: On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads": As you pointed out it is still possible to have per-doc payloads. You need an analyzer which adds just one Token with payload to a specific field for each doc. I understand that this code would be quite ugly

Re: Payloads

2007-01-18 Thread Michael Busch
Grant Ingersoll wrote: Just to put in two cents: the Flexible Indexing thread has also talked about the notion of being able to store arbitrary data at: token, field, doc and Index level. -Grant Yes I agree that this should be the long-term goal. The payload feature is just a first step in

[jira] Commented: (LUCENE-580) Pre-analyzed fields

2007-01-18 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465797 ] Nadav Har'El commented on LUCENE-580: - This patch will be useful for users of LUCENE-755, the payloads patch. That p

Re: Payloads

2007-01-18 Thread Grant Ingersoll
Just to put in two cents: the Flexible Indexing thread has also talked about the notion of being able to store arbitrary data at: token, field, doc and Index level. -Grant On Jan 18, 2007, at 11:01 AM, Nadav Har'El wrote: On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads": As

Re: Payloads

2007-01-18 Thread Nadav Har'El
On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads": > As you pointed out it is still possible to have per-doc payloads. You > need an analyzer which adds just one Token with payload to a specific > field for each doc. I understand that this code would be quite ugly on > the app side.

Re: Payloads

2007-01-18 Thread Michael Busch
Nadav Har'El wrote: Hi Michael, For some uses (e.g., faceted search), one wants to add a payload to each document, not per position for some text field. In the faceted search example, we could use payloads to encode the list of facets that each document belongs to. For this, with the old API, y
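To make the faceted-search example concrete, a sketch of packing a document's sorted facet IDs into one `byte[]` payload: the IDs are delta-encoded and each delta is written as a variable-length int, low 7 bits first, in the same style as Lucene's VInt. `FacetPayloadCodec` is a hypothetical helper; how the payload gets attached to a single token per document is left to the payloads API under discussion.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for a per-document facet list stored as a payload.
final class FacetPayloadCodec {
    private FacetPayloadCodec() {}

    // IDs must be non-negative and ascending; small gaps encode in one byte.
    static byte[] encode(int[] sortedFacetIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : sortedFacetIds) {
            int delta = id - previous;
            if (delta < 0) {
                throw new IllegalArgumentException("facet ids must be non-negative and ascending");
            }
            writeVInt(out, delta);
            previous = id;
        }
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] payload) {
        List<Integer> ids = new ArrayList<>();
        int pos = 0;
        int previous = 0;
        while (pos < payload.length) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = payload[pos++];
                value |= (b & 0x7F) << shift; // low 7 bits carry data
                shift += 7;
            } while ((b & 0x80) != 0);        // high bit set = more bytes follow
            previous += value;
            ids.add(previous);
        }
        return ids;
    }

    private static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }
}
```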

[jira] Commented: (LUCENE-761) Clone proxStream lazily in SegmentTermPositions

2007-01-18 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465753 ] Michael Busch commented on LUCENE-761: -- Grant, you are absolutely right, 755 does not block this issue. The re
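The issue title describes deferring the clone of the prox stream until positions are actually needed. A generic sketch of that lazy-initialization pattern, with a `Supplier` standing in for the clone call; `LazyPositions` is hypothetical and is not the LUCENE-761 patch. As written it is not thread-safe, on the assumption that one enumerator is used by one thread at a time.

```java
import java.util.function.Supplier;

// Hypothetical: defers an expensive per-enumerator copy of a stream
// until a caller actually asks for it.
final class LazyPositions<S> {
    private final Supplier<S> cloner;
    private S stream; // null until first needed

    LazyPositions(Supplier<S> cloner) {
        this.cloner = cloner;
    }

    // Only the first call pays for the clone; callers that only
    // iterate docs and freqs never trigger it.
    S stream() {
        if (stream == null) {
            stream = cloner.get();
        }
        return stream;
    }

    boolean isCloned() {
        return stream != null;
    }
}
```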

Re: Payloads

2007-01-18 Thread Michael Busch
Doug, sorry for the late response. I was on vacation after New Year's... oh btw. Happy New Year to everyone! :-) Doug Cutting wrote: Michael Busch wrote: Yes I could introduce a new class called e.g. PayloadToken that extends Token (good that it is not final anymore). Not sure if I understa

contrib packaging

2007-01-18 Thread karl wettin
is org.apache.lucene.index.facade an OK package? or should it be org.apache.lucene.indexfacade?

Re: Decorative cache (and Hits.setSearcher)

2007-01-18 Thread karl wettin
Hoss Man [15/Jan/07 12:16 AM] On 14 Jan 2007, at 18:18, karl wettin wrote: I'm not certain I understand the possible effects of replacing the searcher in an instance of Hits, but this is what I do in order to keep cached instances valid when the index is updated with changes that do not affect
