Re: Incremental Field Updates

2010-03-29 Thread Mark Harwood
Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you explain your reasoning a bit more? If you want to update a

[jira] Updated: (LUCENE-2351) optimize automatonquery

2010-03-29 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2351: Attachment: LUCENE-2351.patch Attached is a new approach: * gets rid of linearmode * adds real

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot
Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you explain your reasoning a bit more? If you want to update a

Re: Incremental Field Updates

2010-03-29 Thread Mark Harwood
On 29 Mar 2010, at 07:45, Earwin Burrfoot ear...@gmail.com wrote: Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot
Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you explain your reasoning a bit more? If you want to update a

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
I can delete by lucene-generated docId. Which users used to have to find by first coding a primary-key-term search. Delete by term removed this step to make life easier. If someone needs this, it can be built over lucene, without introducing it as a core feature and needlessly complicating
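The point about delete-by-term removing a lookup step can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: it does not use Lucene's API, and the names ToyIndex, delete_by_doc_id and delete_by_term are hypothetical, chosen for illustration.

```python
# Toy model only: this is not Lucene's API. ToyIndex and its methods are
# hypothetical names used to contrast the two deletion routes.
class ToyIndex:
    def __init__(self):
        self.docs = []        # position in this list plays the role of the doc id
        self.deleted = set()  # doc ids marked as deleted

    def add(self, fields):
        self.docs.append(dict(fields))
        return len(self.docs) - 1

    def search(self, field, value):
        # Return the live doc ids whose field equals value.
        return [i for i, d in enumerate(self.docs)
                if i not in self.deleted and d.get(field) == value]

    def delete_by_doc_id(self, doc_id):
        self.deleted.add(doc_id)

    def delete_by_term(self, field, value):
        # Folds "search for the key term, then delete each hit" into one call.
        hits = self.search(field, value)
        for doc_id in hits:
            self.delete_by_doc_id(doc_id)
        return len(hits)


index = ToyIndex()
index.add({"key": "a1", "title": "first"})
index.add({"key": "b2", "title": "second"})

# Two-step route: find the doc id through the key term, then delete by id.
for doc_id in index.search("key", "a1"):
    index.delete_by_doc_id(doc_id)

# One-step route: delete directly by the application's key term.
removed = index.delete_by_term("key", "b2")
```

In both routes the application still relies on a stable key term that it stores in the index itself; the one-step route only hides the intermediate doc id.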

[jira] Commented: (LUCENE-2351) optimize automatonquery

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850844#action_12850844 ] Michael McCandless commented on LUCENE-2351: OOOH I like this approach!! It

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot
If someone needs this, it can be built over lucene, without introducing it as a core feature and needlessly complicating things. I think with any partial-update feature the *absence* of primary key support would needlessly complicate things: If Lucene is not capable of performing duplicate

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850857#action_12850857 ] Michael McCandless commented on LUCENE-2324: Yeah I think we're gonna need the

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
Variant d) sounds most logical? And enables all sorts of fun stuff. So the duplicate-key docs can have different values for initial-insert fields but partial updates will cause sharing of a common field value? And subsequent same-key doc inserts do or don't share these previous partial-update

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850864#action_12850864 ] Michael McCandless commented on LUCENE-2324: bq. Mike, can you explain what

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot
Variant d) sounds most logical? And enables all sorts of fun stuff. So the duplicate-key docs can have different values for initial-insert fields but partial updates will cause sharing of a common field value? And subsequent same-key doc inserts do or don't share these previous

Re: Incremental Field Updates

2010-03-29 Thread Michael McCandless
I agree this is a long-overdue feature... we need to get it into Lucene somehow. I like the Layers analogy... I think that will work well with Lucene's transactional semantics, i.e. a prior commit point would continue to see the index before the updates but new commit points would see the updates.
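The commit-point behaviour described here can be sketched with a toy model of stacked update layers. This is an illustration under stated assumptions, not Lucene code and not the design proposed in the thread: the class LayeredIndex, its methods, and the choice to identify a commit point by the number of layers visible to it are all hypothetical.

```python
# Toy model only: LayeredIndex is a hypothetical name, not a Lucene class.
# Each commit freezes the pending field updates into a new layer, and a
# commit point is identified by how many layers it can see.
class LayeredIndex:
    def __init__(self):
        self.layers = []   # committed update layers, oldest first
        self.pending = {}  # uncommitted updates: (doc_key, field) -> value

    def update(self, doc_key, field, value):
        self.pending[(doc_key, field)] = value

    def commit(self):
        self.layers.append(dict(self.pending))
        self.pending = {}
        return len(self.layers)  # the new commit point

    def read(self, commit_point, doc_key, field):
        # Only layers at or below the commit point are visible; newest wins.
        for layer in reversed(self.layers[:commit_point]):
            if (doc_key, field) in layer:
                return layer[(doc_key, field)]
        return None


idx = LayeredIndex()
idx.update("doc1", "price", 10)
first_commit = idx.commit()
idx.update("doc1", "price", 12)
second_commit = idx.commit()

old_view = idx.read(first_commit, "doc1", "price")   # still sees 10
new_view = idx.read(second_commit, "doc1", "price")  # sees the update, 12
```

Because a layer is never modified after it is committed, a reader opened on the earlier commit point keeps a stable view while later commit points pick up the newer layer.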

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-29 Thread Michael McCandless
On Thu, Mar 25, 2010 at 1:20 PM, Marvin Humphrey mar...@rectangular.com wrote: On Thu, Mar 25, 2010 at 06:24:34AM -0400, Michael McCandless wrote: Also, will Lucy store the original stats? These? * Total number of tokens in the field. * Number of unique terms in the field. *

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-29 Thread Michael McCandless
I think that's a good idea for Lucy. Mike On Fri, Mar 26, 2010 at 10:58 AM, Marvin Humphrey mar...@rectangular.com wrote: On Thu, Mar 25, 2010 at 06:24:34AM -0400, Michael McCandless wrote: Maybe aggressive automatic data-reduction makes more sense in the context of flexible matching,

[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850884#action_12850884 ] Michael McCandless commented on LUCENE-2329: I think we need to fix how RAM is

[jira] Reopened: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-2329: Reopening to fix the RAM balancing problems... Use parallel arrays instead of

[jira] Created: (LUCENE-2356) Enable setting the terms index divisor used by IndexWriter whenever it opens internal readers

2010-03-29 Thread Michael McCandless (JIRA)
Enable setting the terms index divisor used by IndexWriter whenever it opens internal readers Key: LUCENE-2356 URL: https://issues.apache.org/jira/browse/LUCENE-2356

[jira] Created: (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2010-03-29 Thread Michael McCandless (JIRA)
Reduce transient RAM usage while merging by using packed ints array for docID re-mapping Key: LUCENE-2357 URL: https://issues.apache.org/jira/browse/LUCENE-2357
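The general idea named in this issue title, storing a docID re-mapping in a packed integer array instead of one full-width int per document, can be sketched as follows. This is a toy illustration under stated assumptions, not the patch from LUCENE-2357 and not Lucene's own packed-ints code: the PackedInts class here is hypothetical, and so is the convention of storing new_id + 1 with 0 standing for a deleted document.

```python
# Toy sketch only: this PackedInts class is hypothetical and unrelated to
# Lucene's implementation. Each value occupies just enough bits to hold
# max_value, laid out back to back in a bytearray.
class PackedInts:
    def __init__(self, count, max_value):
        self.bits = max(1, max_value.bit_length())
        self.count = count
        self.data = bytearray((count * self.bits + 7) // 8)

    def set(self, index, value):
        start = index * self.bits
        for b in range(self.bits):
            pos = start + b
            if (value >> b) & 1:
                self.data[pos >> 3] |= 1 << (pos & 7)
            else:
                self.data[pos >> 3] &= ~(1 << (pos & 7)) & 0xFF

    def get(self, index):
        start = index * self.bits
        value = 0
        for b in range(self.bits):
            pos = start + b
            if (self.data[pos >> 3] >> (pos & 7)) & 1:
                value |= 1 << b
        return value


# Re-map six old doc ids after deleting docs 1 and 3.
# Assumed convention: store new_id + 1, with 0 meaning "deleted".
max_doc = 6
deleted = {1, 3}
live_count = max_doc - len(deleted)

remap = PackedInts(max_doc, live_count)
next_id = 0
for old_id in range(max_doc):
    if old_id in deleted:
        remap.set(old_id, 0)
    else:
        remap.set(old_id, next_id + 1)
        next_id += 1

stored = [remap.get(i) for i in range(max_doc)]
```

With at most four live documents each entry needs 3 bits, so the six entries fit in 3 bytes, against 24 bytes for six 32-bit ints.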

[jira] Commented: (LUCENE-2356) Enable setting the terms index divisor used by IndexWriter whenever it opens internal readers

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850887#action_12850887 ] Michael McCandless commented on LUCENE-2356: I won't have any time to take

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
Who ever said that some_condition should point to a unique document? My assumption was, for now, we were still talking about the simpler case of updating a single document. If we extend the discussion to support set-based updates it's worth considering the common requirements for updating

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot
Who ever said that some_condition should point to a unique document? My assumption was, for now, we were still talking about the simpler case of updating a single document. If we extend the discussion to support set-based updates it's worth considering the common requirements for updating

Re: Incremental Field Updates

2010-03-29 Thread Grant Ingersoll
On Mar 29, 2010, at 2:26 AM, Mark Harwood wrote: Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you

Re: Incremental Field Updates

2010-03-29 Thread Andrzej Bialecki
On 2010-03-29 12:26, Michael McCandless wrote: I agree this is a long overdue feature... we need to get it into Lucene somehow. I like the Layers analogy... I think that will work well with Lucene's transactional semantics, ie a prior commit point would continue to see the index before the

AW: Incremental Field Updates

2010-03-29 Thread Uwe Goetzke
They filed this as a patent, too: http://www.freepatentsonline.com/y2009/0228528.html Regards Uwe Goetzke -Original Message- From: Andrzej Bialecki [mailto:a...@getopt.org] Sent: Monday, 29 March 2010 14:50 To: java-dev@lucene.apache.org Subject: Re: Incremental Field Updates

Re: AW: Incremental Field Updates

2010-03-29 Thread Andrzej Bialecki
On 2010-03-29 15:11, Uwe Goetzke wrote: They filed this as a patent, too: http://www.freepatentsonline.com/y2009/0228528.html .. which is not granted yet, right? It's a patent application. Besides, I live in the EU ;) -- Best regards, Andrzej Bialecki

[jira] Commented: (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850914#action_12850914 ] Michael McCandless commented on LUCENE-2357: I won't have any time to take

[jira] Commented: (LUCENE-2356) Enable setting the terms index divisor used by IndexWriter whenever it opens internal readers

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850916#action_12850916 ] Michael McCandless commented on LUCENE-2356: The above comment was on the

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
Of course, but what about the Lucene doc id doesn't provide that? The question being how you determine the correct doc id to use in the first place (especially when they are known to be volatile) - the current answer is to use a stable identifier term which your app holds in the index, AKA a

[jira] Commented: (LUCENE-2356) Enable setting the terms index divisor used by IndexWriter whenever it opens internal readers

2010-03-29 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850928#action_12850928 ] Earwin Burrfoot commented on LUCENE-2356: That's likely orthogonal. If you want

[jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects

2010-03-29 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850989#action_12850989 ] Michael Busch commented on LUCENE-2329: Good catch! Thanks for the thorough

[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851008#action_12851008 ] Michael McCandless commented on LUCENE-2354: bq. NumericUtils still contains

[jira] Commented: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851010#action_12851010 ] Uwe Schindler commented on LUCENE-2354: bq. But the encoding is unchanged right?

[jira] Issue Comment Edited: (LUCENE-2354) Convert NumericUtils and NumericTokenStream to use BytesRef instead of Strings/char[]

2010-03-29 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851010#action_12851010 ] Uwe Schindler edited comment on LUCENE-2354 at 3/29/10 5:23 PM:

[jira] Commented: (LUCENE-2184) CartesianPolyFilterBuilder doesn't properly account for which tiers actually exist in the index

2010-03-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851013#action_12851013 ] Grant Ingersoll commented on LUCENE-2184: Note, this bug exists for the min case

[jira] Assigned: (LUCENE-2184) CartesianPolyFilterBuilder doesn't properly account for which tiers actually exist in the index

2010-03-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned LUCENE-2184: Assignee: Grant Ingersoll CartesianPolyFilterBuilder doesn't properly account for

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-29 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851017#action_12851017 ] Jason Rutherglen commented on LUCENE-2324: Michael B.: What you're talking about

[jira] Updated: (LUCENE-2184) CartesianPolyFilterBuilder doesn't properly account for which tiers actually exist in the index

2010-03-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-2184: Attachment: LUCENE-2184.patch Here's a patch. All tests still pass.

[jira] Commented: (LUCENE-2184) CartesianPolyFilterBuilder doesn't properly account for which tiers actually exist in the index

2010-03-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851043#action_12851043 ] Grant Ingersoll commented on LUCENE-2184: Committed revision 928860 w/ the patch

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-29 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851078#action_12851078 ] Michael Busch commented on LUCENE-2324: {quote} I'm not sure we need that level of

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-29 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851099#action_12851099 ] Jason Rutherglen commented on LUCENE-2324: {quote}You only need one additional

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2010-03-29 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851142#action_12851142 ] Michael Busch commented on LUCENE-2324: {quote} To clarify, the apply deletes doc

[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2353: Attachment: LUCENE-2353.patch Updated to also match 'c:/temp' like paths, which are also accepted