Re: Lucene Spatial

2009-11-20 Thread Alex
Hi all! I'm coming back to spatial after a little while working on other parts of my project. I just wanted to know what was up with this package and what the news, additions, etc. were. Anything new? Thanks for your feedback :) Cheers, Alex

[jira] Commented: (LUCENE-517) norm compression breaks ranking for small fields

2009-11-20 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780901#action_12780901 ] Lance Norskog commented on LUCENE-517: -- [LUCENE-1360|http://issues.apache.org/jira/bro

[jira] Issue Comment Edited: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

2009-11-20 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780897#action_12780897 ] Lance Norskog edited comment on LUCENE-1360 at 11/21/09 3:03 AM: ---

[jira] Issue Comment Edited: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

2009-11-20 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780897#action_12780897 ] Lance Norskog edited comment on LUCENE-1360 at 11/21/09 2:58 AM: ---

[jira] Updated: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

2009-11-20 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated LUCENE-1360: -- Attachment: LUCENE-1380 visualization.pdf This is a graph of the standard norms against the re

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 5:02 PM, Mark Miller wrote: > Go back and put it in after you have all the documents for that commit > point. Or on reader load, calculate it. > Ah! Now I see what you mean by "expensive". :) Basically run through all your documents you've indexed all over again, fixing

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
Back to Grant's original question, for a second... On Fri, Nov 20, 2009 at 1:59 PM, Grant Ingersoll wrote: > This makes sense from a mathematical sense, assuming scores are comparable. > What I would like to get at is why anyone thinks scores are comparable > across queries to begin with. I ag

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780858#action_12780858 ] Robert Muir commented on LUCENE-1606: - Uwe, thank you. This is much nicer! I think no

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Go back and put it in after you have all the documents for that commit point. Or on reader load, calculate it. - Mark http://www.lucidimagination.com (mobile) On Nov 20, 2009, at 7:56 PM, Jake Mannix wrote: On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller wrote: Okay - my fault - I'm not

[jira] Updated: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1606: -- Attachment: LUCENE-1606.patch Hi Robert, here is my patch. The WildCard and RegExp test query

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller wrote: > Okay - my fault - I'm not really talking in terms of Lucene. Though even > there I consider it possible. You'd just have to like, rewrite it :) And > it would likely be pretty slow. > Rewrite it how? When you index the very first document, t

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Okay - my fault - I'm not really talking in terms of Lucene. Though even there I consider it possible. You'd just have to like, rewrite it :) And it would likely be pretty slow. Jake Mannix wrote: > > > On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller > wrote: > > M

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Jake Mannix wrote: > > > On Fri, Nov 20, 2009 at 4:09 PM, Mark Miller > wrote: > > > But cosine has two norms? The query norm and the document norm - > taking > the two vectors to the unit space - it looks expensive to me to do > both > of them p

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller wrote: > Mark Miller wrote: > Okay - I guess that somewhat makes sense - you can calculate the > magnitude of the doc vectors at index time. How is that impossible with > incremental indexing though? Isn't it just expensive? Seems somewhat > expensive

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 4:20 PM, Mark Miller wrote: > Mark Miller wrote: > > > > it looks expensive to me to do both > > of them properly. > Okay - I guess that somewhat makes sense - you can calculate the > magnitude of the doc vectors at index time. How is that impossible with > incremental ind

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 4:09 PM, Mark Miller wrote: > But cosine has two norms? The query norm and the document norm - taking > the two vectors to the unit space - it looks expensive to me to do both > of them properly. The IR lit I've seen fudges them down to Root(L) even > in the non increment

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Mark Miller wrote: > > it looks expensive to me to do both > of them properly. Okay - I guess that somewhat makes sense - you can calculate the magnitude of the doc vectors at index time. How is that impossible with incremental indexing though? Isn't it just expensive? Seems somewhat expensive in
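One way to see why precomputing document vector magnitudes is awkward with incremental indexing: under tf-idf weighting, a document's vector length depends on corpus-wide statistics, so it can change whenever other documents are added. The sketch below is only an illustration under stated assumptions. The `idf` formula is the classic `1 + ln(numDocs / (docFreq + 1))` form, raw term counts stand in for tf, and the toy documents and the helper name `doc_norm` are hypothetical, not Lucene code.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed classic form: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def doc_norm(doc, corpus):
    """Euclidean length of a document's tf-idf vector for the current corpus."""
    num_docs = len(corpus)
    total = 0.0
    for term in set(doc):
        tf = doc.count(term)                      # raw term count as tf
        df = sum(1 for d in corpus if term in d)  # document frequency
        total += (tf * idf(df, num_docs)) ** 2
    return math.sqrt(total)


# Hypothetical toy corpus: index one document, then add two more.
doc1 = ["lucene", "scoring"]
corpus = [doc1]
norm_before = doc_norm(doc1, corpus)

corpus.append(["lucene", "spatial"])
corpus.append(["query", "norm"])
norm_after = doc_norm(doc1, corpus)

# doc1 itself did not change, but its vector length did.
print(norm_before, norm_after)
```

In this toy model, a stored magnitude for `doc1` goes stale as soon as the corpus changes, which seems to be the sense in which getting it right means revisiting documents that were already indexed.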

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Jake Mannix wrote: > > > On Fri, Nov 20, 2009 at 2:50 PM, Mark Miller > wrote: > > Jake Mannix wrote: > > Remember: we're not really doing cosine at all here. > This, I think, is fuzzy right? It seems to be common to still call > this > cosine scor

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 2:50 PM, Mark Miller wrote: > Jake Mannix wrote: > > Remember: we're not really doing cosine at all here. > This, I think, is fuzzy right? It seems to be common to still call this > cosine scoring loosely - pretty much every practical impl fudges things > somewhat when doi

[jira] Issue Comment Edited: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780833#action_12780833 ] Robert Muir edited comment on LUCENE-1606 at 11/20/09 11:33 PM:

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780833#action_12780833 ] Robert Muir commented on LUCENE-1606: - bq. That would make the enum ugly... But would

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780831#action_12780831 ] Uwe Schindler commented on LUCENE-1606: --- see LUCENE-2075, why it is not so fast (the

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780825#action_12780825 ] Robert Muir commented on LUCENE-1606: - by the way Uwe, I do not particularly like how

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780824#action_12780824 ] Uwe Schindler commented on LUCENE-2075: --- The initial seek should really be optimized

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-20 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780819#action_12780819 ] Yonik Seeley commented on LUCENE-2075: -- Aside: a single numeric range query will be do

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780806#action_12780806 ] Robert Muir commented on LUCENE-1606: - bq. We should simply add a test for this method

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Jake Mannix wrote: > Remember: we're not really doing cosine at all here. This, I think, is fuzzy right? It seems to be common to still call this cosine scoring loosely - pretty much every practical impl fudges things somewhat when doing the normalization (though we are on the heavy side of fudgers

[jira] Issue Comment Edited: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780801#action_12780801 ] Uwe Schindler edited comment on LUCENE-1606 at 11/20/09 10:45 PM: --

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780801#action_12780801 ] Uwe Schindler commented on LUCENE-1606: --- bq. Uwe, i looked at the WildcardTermEnum a

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Yes, it's a good point. I'm coming at it from a more pure angle. And I'm not so elegant in my thought patterns :) Right though - our document vector normalization is - uh - quick and dirty :) It's about the cheapest one I've seen other than root(length). I don't think that scores between queries ar

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780796#action_12780796 ] Robert Muir commented on LUCENE-1606: - Uwe, i looked at the WildcardTermEnum and it wa

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
Remember: we're not really doing cosine at all here. The factor of IDF^2 on the top, with the factor of 1/sqrt(numTermsInDocument) on the bottom couples together to end up with the following effect: q1 = "TERM1" q2 = "TERM2" doc1 = "TERM1" doc2 = "TERM2" score(q1, doc1) = idf(TERM1) score(q
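The single-term case above can be worked through numerically. This is a minimal sketch, assuming the classic scoring shape `queryNorm * tf * idf^2 * lengthNorm` with `queryNorm = 1/sqrt(sum of squared query weights)`, `tf = sqrt(freq)`, `lengthNorm = 1/sqrt(numTermsInDocument)` and `idf = 1 + ln(numDocs / (docFreq + 1))`. The document frequencies and corpus size are made-up values, and the function names are hypothetical rather than Lucene API.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed classic form: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def lucene_like_score(term_idf, term_freq, num_terms_in_doc):
    """Single-term query against a document, using the assumed classic shape."""
    query_norm = 1.0 / math.sqrt(term_idf ** 2)      # one query term
    tf = math.sqrt(term_freq)
    length_norm = 1.0 / math.sqrt(num_terms_in_doc)
    return query_norm * tf * term_idf ** 2 * length_norm


def true_cosine(term_idf, term_freq):
    """Cosine between a one-term query vector and a one-term document vector."""
    q = term_idf
    d = term_freq * term_idf
    return (q * d) / (abs(q) * abs(d))


# Hypothetical statistics: TERM1 is rare, TERM2 is common.
NUM_DOCS = 1000
idf1 = idf(doc_freq=4, num_docs=NUM_DOCS)
idf2 = idf(doc_freq=499, num_docs=NUM_DOCS)

score_q1_doc1 = lucene_like_score(idf1, term_freq=1, num_terms_in_doc=1)
score_q2_doc2 = lucene_like_score(idf2, term_freq=1, num_terms_in_doc=1)

print(score_q1_doc1, idf1)   # the score collapses to idf(TERM1)
print(score_q2_doc2, idf2)   # and, in this sketch, to idf(TERM2)
print(true_cosine(idf1, 1), true_cosine(idf2, 1))
```

Under these assumptions the two exact-match scores differ (each equals its term's idf), whereas a true cosine would give 1.0 for both, which is one reading of "we're not really doing cosine at all here".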

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-20 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780795#action_12780795 ] Yonik Seeley commented on LUCENE-2075: -- bq. Also, the results for ConcurrentLRUCache

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Grant Ingersoll wrote: > > What I would like to get at is why anyone thinks scores are > comparable across queries to begin with. > They are somewhat comparable because we are using the approximate cosine between the document/query vectors for the score - plus boosts n stuff. How close the vectors

Re: Whither Query Norm?

2009-11-20 Thread Grant Ingersoll
On Nov 20, 2009, at 1:24 PM, Jake Mannix wrote: > > On Fri, Nov 20, 2009 at 10:08 AM, Grant Ingersoll wrote: >> I should add in my $0.02 on whether to just get rid of queryNorm() >> altogether: >> >> -1 from me, even though it's confusing, because having that call there >> (somewhere, at

Re: Final 3.0 artifacts

2009-11-20 Thread Mark Miller
Uwe Schindler wrote: > Mark: Should I merge the fix for the double doExplain call in CSQ also to > 3.0? I already removed some more dead code (in > MultiTermQueryWrapperFilter). This fix is so simple, so I think it can be > included in the final artifacts without any further RC. > +1 from me. N

Final 3.0 artifacts

2009-11-20 Thread Uwe Schindler
Hi, As no complaints about the release of 3.0 appeared on java-user, I think I can start to build the final artifacts tomorrow. Mark: Should I merge the fix for the double doExplain call in CSQ also to 3.0? I already removed some more dead code (in MultiTermQueryWrapperFilter). This fix is so si

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780750#action_12780750 ] Robert Muir commented on LUCENE-1606: - Uwe, both your ideas are great. thank you for l

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780749#action_12780749 ] Robert Muir commented on LUCENE-1606: - bq. As far as testing, one of the simple things

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780748#action_12780748 ] Uwe Schindler commented on LUCENE-1606: --- I like it, too, some thoughts: - Maybe mak

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780747#action_12780747 ] Robert Muir commented on LUCENE-1606: - Mark, thanks, let me know if you have the chanc

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780738#action_12780738 ] Mark Miller commented on LUCENE-1606: - Nice! Resulting jar is still just 1.0 MB. Looks

[jira] Updated: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1606: Attachment: LUCENE-1606.patch Mark, I think this patch is ok, all tests pass etc. Can you take a l

Re: [jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Mark Miller
Tim Smith (JIRA) wrote: > I would definitely like to see a more accelerated release cycle (even if less > functionality gets into each minor release) > > > Heh - everyone has said that before. In principle, we all agree - in practice ...

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780710#action_12780710 ] Tim Smith commented on LUCENE-2086: --- bq. maybe try it & report back? i'll see if i can

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780706#action_12780706 ] Michael McCandless commented on LUCENE-2086: bq. i've seen the deletes domina

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780703#action_12780703 ] Jason Rutherglen commented on LUCENE-2086: -- bq. I don't think it should be backpo

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780701#action_12780701 ] Tim Smith commented on LUCENE-2086: --- i've seen the deletes dominating commit time quite

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780700#action_12780700 ] Michael McCandless commented on LUCENE-2086: bq. any chance this can go into 3

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780698#action_12780698 ] Tim Smith commented on LUCENE-2086: --- any chance this can go into 3.0.0 or a 3.0.1? > W

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780690#action_12780690 ] Michael McCandless commented on LUCENE-2086: Excellent, so this is an importan

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
On Fri, Nov 20, 2009 at 10:08 AM, Grant Ingersoll wrote: > I should add in my $0.02 on whether to just get rid of queryNorm() > altogether: > > -1 from me, even though it's confusing, because having that call there > (somewhere, at least) allows you to actually compare scores across > queries

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780685#action_12780685 ] Yonik Seeley commented on LUCENE-2086: -- bq. Though, you need relatively high density

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780684#action_12780684 ] Robert Muir commented on LUCENE-1606: - bq. Okay - still not an issue I don't think - l

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780682#action_12780682 ] Michael McCandless commented on LUCENE-2086: OK I changed changes entry to: *

Re: Whither Query Norm?

2009-11-20 Thread Grant Ingersoll
On Nov 20, 2009, at 11:19 AM, Jake Mannix wrote: > I should add in my $0.02 on whether to just get rid of queryNorm() > altogether: > > -1 from me, even though it's confusing, because having that call there > (somewhere, at least) allows you to actually compare scores across queries > i

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780680#action_12780680 ] Michael McCandless commented on LUCENE-2086: Ahh, you're right, so long as you

[jira] Updated: (LUCENE-1907) sumOfSquared weights should be calculated as part of queryNorm

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1907: Attachment: LUCENE-1907.patch to trunk > sumOfSquared weights should be calculated as part of que

[jira] Commented: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780670#action_12780670 ] Yonik Seeley commented on LUCENE-2086: -- bq. for better locality for the disk heads I

[jira] Updated: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2061: --- Attachment: LUCENE-2061.patch Just attaching latest nrtBench.py... > Create benchma

[jira] Resolved: (LUCENE-2079) Further improvements to contrib/benchmark for testing NRT

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2079. Resolution: Fixed > Further improvements to contrib/benchmark for testing NRT > --

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780659#action_12780659 ] Robert Muir commented on LUCENE-1606: - Mark, ok. In that case I will not include these

[jira] Updated: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2086: --- Attachment: LUCENE-2086.patch Attached patch. > When resolving deletes, IW should r

Re: IndexWriter.updateDocument performance improvement

2009-11-20 Thread Michael McCandless
Opened LUCENE-2086. Mike On Fri, Nov 20, 2009 at 9:43 AM, Michael McCandless wrote: > +1 > > I'll open an issue. > > Mike > > On Fri, Nov 20, 2009 at 8:11 AM, Yonik Seeley > wrote: >> Thanks Bogdan, I've been meaning to bring this up. >> Solr used a TreeMap in the past (when it handled its own

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780655#action_12780655 ] Mark Miller commented on LUCENE-1606: - On the one hand I'd say, well let's not rename t

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780654#action_12780654 ] Robert Muir commented on LUCENE-1606: - OK we have the start of a plan, only one final

[jira] Commented: (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)

2009-11-20 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780647#action_12780647 ] Marvin Humphrey commented on LUCENE-1877: - > http://www.h2database.com/html/advanc

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780643#action_12780643 ] Mark Miller commented on LUCENE-1606: - bq. We could just remove the .rewrite(). it is

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780633#action_12780633 ] Robert Muir commented on LUCENE-1606: - bq. Yes - I think so - but how to handle the fa

[jira] Issue Comment Edited: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780623#action_12780623 ] Robert Muir edited comment on LUCENE-1606 at 11/20/09 4:27 PM: -

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780629#action_12780629 ] Mark Miller commented on LUCENE-1606: - bq. I assume we should nuke the old WildcardQue

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
I should add in my $0.02 on whether to just get rid of queryNorm() altogether: -1 from me, even though it's confusing, because having that call there (somewhere, at least) allows you to actually compare scores across queries if you do the extra work of properly normalizing the documents as we

[jira] Updated: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1606: Attachment: LUCENE-1606_nodep.patch attached is an alternate patch with no library dependency (LU

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Mark Miller wrote: > Grant Ingersoll wrote: > >> At a minimum, I think we might be able to refactor the callback mechanism >> for it just as we did for the collectors, such that we push off the actual >> calculation of the sum of squares into Similarity, instead of just doing >> 1/sqrt(sumSqs
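The refactoring idea quoted above (letting the similarity own the sum-of-squares step rather than only receiving the finished sum) might look roughly like the following. This is a hypothetical sketch: `SketchSimilarity`, `NoNormSimilarity` and both method names are invented for illustration and are not Lucene's `Similarity` API.

```python
import math


class SketchSimilarity:
    """Hypothetical stand-in for a similarity class; not Lucene's API."""

    def query_norm_from_sum(self, sum_of_squared_weights):
        # Shape described in the thread: the caller has already summed
        # the squared weights, and only 1/sqrt(sumSqs) is left to do.
        return 1.0 / math.sqrt(sum_of_squared_weights)

    def query_norm_from_weights(self, term_weights):
        # Alternative shape: the similarity sees the individual weights,
        # so a subclass can change how they are aggregated.
        return 1.0 / math.sqrt(sum(w * w for w in term_weights))


class NoNormSimilarity(SketchSimilarity):
    """A subclass that opts out of query normalization entirely."""

    def query_norm_from_weights(self, term_weights):
        return 1.0


weights = [3.0, 4.0]  # made-up per-term query weights
default_norm = SketchSimilarity().query_norm_from_weights(weights)
legacy_norm = SketchSimilarity().query_norm_from_sum(sum(w * w for w in weights))
disabled_norm = NoNormSimilarity().query_norm_from_weights(weights)
print(default_norm, legacy_norm, disabled_norm)
```

With the second shape, both entry points agree for the default L2 behaviour, while a subclass can replace the aggregation without the caller changing.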

Re: Whither Query Norm?

2009-11-20 Thread Jake Mannix
The fact that Lucene Similarity is most decidedly *not* cosine similarity, but strongly resembles it with the queryNorm() in there, means that I personally would certainly like to see this called out, at least in the documentation. As for performance, is the queryNorm() called ever in any loops? It's a

Re: Whither Query Norm?

2009-11-20 Thread Mark Miller
Grant Ingersoll wrote: > For a long time now, we've been telling people not to compare scores across > queries, yet we maintain the queryNorm() code as an attempt to do this and > the javadocs even promote it. I'm in the process of researching this some > more (references welcomed), but wanted

Whither Query Norm?

2009-11-20 Thread Grant Ingersoll
For a long time now, we've been telling people not to compare scores across queries, yet we maintain the queryNorm() code as an attempt to do this and the javadocs even promote it. I'm in the process of researching this some more (references welcomed), but wanted to hear what people think about

[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene

2009-11-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780607#action_12780607 ] Grant Ingersoll commented on LUCENE-965: Hi Hui, I see you updated your paper on t

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780602#action_12780602 ] Robert Muir commented on LUCENE-1606: - Mark, ok. I will supply a new patch with no lib

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780599#action_12780599 ] Mark Miller commented on LUCENE-1606: - Point taken - the tests are not perfect. They n

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780594#action_12780594 ] Robert Muir commented on LUCENE-1606: - bq. How do you want to improve them? well for

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780591#action_12780591 ] Mark Miller commented on LUCENE-1606: - bq. i don't really have any, except that I don'

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780589#action_12780589 ] Robert Muir commented on LUCENE-1606: - bq. What are your concerns? If it passes the cu

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780584#action_12780584 ] Mark Miller commented on LUCENE-1606: - bq. I think trying this out around in contrib (

[jira] Resolved: (LUCENE-2076) Add org.apache.lucene.store.FSDirectory.getDirectory()

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2076. Resolution: Fixed Thanks George! > Add org.apache.lucene.store.FSDirectory.getDir

[jira] Created: (LUCENE-2086) When resolving deletes, IW should resolve in term sort order

2009-11-20 Thread Michael McCandless (JIRA)
When resolving deletes, IW should resolve in term sort order Key: LUCENE-2086 URL: https://issues.apache.org/jira/browse/LUCENE-2086 Project: Lucene - Java Issue Type: Improvement

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780575#action_12780575 ] Robert Muir commented on LUCENE-1606: - bq. So Robert - what do you think about paring

Re: IndexWriter.updateDocument performance improvement

2009-11-20 Thread Michael McCandless
+1 I'll open an issue. Mike On Fri, Nov 20, 2009 at 8:11 AM, Yonik Seeley wrote: > Thanks Bogdan, I've been meaning to bring this up. > Solr used a TreeMap in the past (when it handled its own deletes) for > the same exact reason. In my profiling, I've also seen applyDeletes() > taking the bu

[jira] Updated: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2075: --- Attachment: LUCENE-2075.patch First cut at a benchmark. First, download http://conc

Re: IndexWriter.updateDocument performance improvement

2009-11-20 Thread Yonik Seeley
Thanks Bogdan, I've been meaning to bring this up. Solr used a TreeMap in the past (when it handled its own deletes) for the same exact reason. In my profiling, I've also seen applyDeletes() taking the bulk of the time with small/simple document indexing. So we should definitely go in sorted ord
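The intuition behind resolving buffered deletes in term sort order can be shown with a toy model. This is a sketch only: the term dictionary is modelled as a sorted Python list, "seek cost" is approximated as the distance between consecutive lookup positions, and the terms and helper names are hypothetical. It does not reproduce how IndexWriter or Solr actually apply deletes.

```python
import bisect


def lookup_positions(term_dict, delete_terms):
    """Index in the sorted term dictionary that each delete term resolves to."""
    return [bisect.bisect_left(term_dict, t) for t in delete_terms]


def total_seek_distance(positions):
    """Sum of jumps between consecutive lookups (a crude stand-in for seek cost)."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


# Hypothetical primary-key terms, zero-padded so string order matches numeric order.
term_dict = sorted(f"id:{i:05d}" for i in range(1000))

# Delete terms in the order the updates happened to arrive.
pending = ["id:00900", "id:00010", "id:00500", "id:00020", "id:00700"]

unsorted_cost = total_seek_distance(lookup_positions(term_dict, pending))
sorted_cost = total_seek_distance(lookup_positions(term_dict, sorted(pending)))

print(unsorted_cost, sorted_cost)
```

In this model the sorted pass only ever moves forward through the dictionary, which is the locality argument for keeping pending delete terms in a sorted structure (a TreeMap in Java) instead of iterating them in hash or arrival order.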

[jira] Commented: (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)

2009-11-20 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780540#action_12780540 ] Thomas Mueller commented on LUCENE-1877: FYI: other Java projects also implement e

IndexWriter.updateDocument performance improvement

2009-11-20 Thread Bogdan Ghidireac
Hi, One of the use cases of my application involves updating the index with 10 to 10k docs every few minutes. Because we maintain a PK for each doc we have to use IndexWriter.updateDocument to be consistent. The average time for an update when we commit every 10k docs is around 17ms (the IndexWrit