Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Jake Mannix
Given that this new API is pretty unwieldy, and seems to not actually perform any better than the old one... are we going to consider revisiting that? -jake On Mon, Oct 19, 2009 at 11:27 PM, Uwe Schindler u...@thetaphi.de wrote: The old search API is already removed in trunk… Uwe

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
Sorry, I have been digging into it, just didn't get far enough to post patch/results. I'll try to do so today. I did find one bug in OneSortNoScoreCollector, in the getTop() method in the inner compare() method, to break ties it should be: if (v==0) { v = o1.doc + o1.comparatorQueue._base

[jira] Resolved: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1995. Resolution: Fixed Thanks Aaron! Maybe someday Lucene will allow a larger RAM

Re: [Lucene-java Wiki] Update of LuceneAtApacheConUs2009 by HossMan

2009-10-20 Thread Karl Wettin
On 20 Oct 2009 at 07:15, Apache Wiki wrote: + There will be a Lucene/Search !MeetUp on Tuesday night at 8PM. 'This event is open to anyone who wants to come, even if you are not registered for the conference'. That is a really nice thing, and completely new if I'm not mistaken.

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
I didn't really follow that thread either - but we didn't move to the new Comp Api because of its performance vs the old. - Mark http://www.lucidimagination.com (mobile) On Oct 20, 2009, at 4:22 AM, Uwe Schindler u...@thetaphi.de wrote: I did not follow the whole thread, but I do not

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 6:51 AM, Mark Miller markrmil...@gmail.com wrote: I didn't really follow that thread either - but we didn't move to the new Comp Api because of its performance vs the old. We did (LUCENE-1483), but those perf tests mixed in a number of other improvements (eg, searching

[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767779#action_12767779 ] Michael McCandless commented on LUCENE-1987: bq. How to handle the problem

[jira] Assigned: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1993: -- Assignee: Michael McCandless MoreLikeThis - allow to exclude terms that

[jira] Commented: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767781#action_12767781 ] Michael McCandless commented on LUCENE-1993: Patch looks good... I'll commit

[jira] Resolved: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1993. Resolution: Fixed Fix Version/s: 3.0 Thanks Christian! MoreLikeThis -

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests as testing that we didn't lose perf, not that we gained any. The fact that there were some wins was just a nice surprise from my

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Ahhh - I see - way at the top. Man that was early. Had forgotten about that stuff even before the issue was finished. Mark Miller wrote: Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller markrmil...@gmail.com wrote: Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests as testing that we didn't lose perf, not that we gained any.

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 8:21 AM, Mark Miller markrmil...@gmail.com wrote: Ahhh - I see - way at the top. Man that was early. Had forgotten about that stuff even before the issue was finished. Tell me about it -- impossible to remember these things :) I wish I could upgrade the RAM in my brain

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller markrmil...@gmail.com wrote: Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests as testing that we didn't lose perf, not that we gained

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Uwe Schindler wrote: On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller markrmil...@gmail.com wrote: Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests as testing that we didn't lose

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Actually though - how are we supposed to get back there? I don't think it's as simple as just not removing the deprecated APIs. Doesn't even seem close to that simple. It's another nightmare. It would have to be some serious wins to go through that pain starting at a 3.0 release wouldn't it? We

[jira] Commented: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2009-10-20 Thread Siddharth Gargate (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767802#action_12767802 ] Siddharth Gargate commented on LUCENE-666: Can we rewrite the query (A OR NOT B)

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767805#action_12767805 ] Cédrik LIME commented on LUCENE-1183: Any news on the landing of this patch? Now that

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
Actually though - how are we supposed to get back there? I don't think it's as simple as just not removing the deprecated APIs. Doesn't even seem close to that simple. It's another nightmare. It would have to be some serious wins to go through that pain starting at a 3.0 release wouldn't it?

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 9:31 AM, Uwe Schindler u...@thetaphi.de wrote: It is not bad, only harder to understand (for some people). The Javadoc is much improved since I made the switch. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? Right

Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Mark Miller
Can someone help me figure this out. The Highlighter test runs and passes for me in Eclipse. It obviously compiles too. But when I try and compile with the ant build scripts, I get: [javac]

Re: Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Yonik Seeley
Not sure... but could it just be that you are relying on autoboxing, which is a Java5 feature, and the core is still marked for Java1.4 syntax? -Yonik http://www.lucidimagination.com On Tue, Oct 20, 2009 at 10:02 AM, Mark Miller markrmil...@gmail.com wrote: Can someone help me figure this

Re: Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Mark Miller
Thank you, thank you, thank you ... I could have run around that for years. Yonik Seeley wrote: Not sure... but could it just be that you are relying on autoboxing, which is a Java5 feature, and the core is still marked for Java1.4 syntax? -Yonik http://www.lucidimagination.com On Tue,

RE: Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Uwe Schindler
Lucene 2.9 is Java 1.4 only (in build script), so autoboxing does not work. With trunk it works, but not with 2.9. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: ysee...@gmail.com
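The failing line from the Highlighter is not shown in this thread, so the following is only a minimal sketch of the kind of change Yonik and Uwe describe: replacing autoboxing with explicit boxing so the code is valid Java 1.4 syntax. The class and method names (ExplicitBoxingSketch, box, unbox) are hypothetical and not taken from the Lucene source.

```java
public class ExplicitBoxingSketch {

    // Hypothetical helper. With Java 5 syntax one could write "Object boxed = value;"
    // and let the compiler autobox; a build that compiles with 1.4 syntax rejects that.
    static Object box(int value) {
        // Explicit boxing is valid 1.4 syntax.
        // Note: Integer.valueOf(int) was added in Java 5; against a 1.4 class
        // library the equivalent call would be "new Integer(value)".
        return Integer.valueOf(value);
    }

    // Explicit unboxing through intValue(); Java 5 would also accept
    // "int v = (Integer) boxed;" and unbox automatically.
    static int unbox(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    public static void main(String[] args) {
        Object boxed = box(42);
        System.out.println(unbox(boxed) + 1); // prints 43
    }
}
```

Eclipse compiling at a Java 5 compliance level while the ant build compiles with 1.4 syntax would explain why the same source passes in one and fails in the other.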

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Earwin Burrfoot
There are some advanced things that are plain impossible with stock new API. Like having more than one HitQueue in your Collector, and stashing overflowing values from one of them into another. Once you cross the segment border - BOOM! Otherwise it may look intimidating, but is pretty simple in

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
bq. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? +1 - I think Mike and I silently fought on that one once in the patches :) Though I don't know how conscious it was. I prefer the methods at the top myself. Yonik Seeley wrote: On Tue, Oct

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767820#action_12767820 ] Michael McCandless commented on LUCENE-1183: Cédrik, could you update the

[jira] Created: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
Explore performance of multi-PQ vs single-PQ sorting API Key: LUCENE-1997 URL: https://issues.apache.org/jira/browse/LUCENE-1997 Project: Lucene - Java Issue Type: Improvement

[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1997: --- Attachment: LUCENE-1997.patch Attached patch. Note that patch is based on 2.9.x

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 10:49 AM, Mark Miller markrmil...@gmail.com wrote: bq. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? +1 - I think Mike and I silently fought on that one once in the patches :) Though I don't know how conscious it

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi guys: I am not suggesting just simply changing the deprecated signatures. There is some work to be done of course. In the beginning of the thread, we discussed two algorithms (both handling per-segment field loading), and at the conclusion, (to be still verified by Mike) that both

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Sorry, mistyped again, we have a multivalued field of STRINGS, no integers. -John On Tue, Oct 20, 2009 at 8:55 AM, John Wang john.w...@gmail.com wrote: Hi guys: I am not suggesting just simply changing the deprecated signatures. There is some work to be done of course. In the beginning

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767852#action_12767852 ] Uwe Schindler commented on LUCENE-1257: Committed:

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767870#action_12767870 ] Michael McCandless commented on LUCENE-1997: OK I ran sortBench.py on

[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1997: --- Attachment: LUCENE-1997.patch New patch attached: * Turn off testing on the

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
OK I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple python wrapper to run old/new tests across different queries, sort, topN, etc. But I got different results... MultiPQ looks generally slower than SinglePQ. So I think we now need to reconcile what's
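The snippets here do not spell out the two approaches, so the following is only a toy sketch of one reading of them: "SinglePQ" as one priority queue shared across all segments, and "MultiPQ" as one queue per segment whose survivors are merged at the end. The class PqSketch and its methods are hypothetical, the segments are plain int arrays, and nothing here reflects the code in the LUCENE-1997 patch or says anything about relative performance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class PqSketch {

    // Max-heap: the worst (largest) value currently kept sits at the head.
    static PriorityQueue<Integer> newQueue() {
        return new PriorityQueue<Integer>(11, Collections.<Integer>reverseOrder());
    }

    // Keep only the n smallest values seen so far.
    static void offer(PriorityQueue<Integer> pq, int n, int value) {
        if (n <= 0) {
            return;
        }
        if (pq.size() < n) {
            pq.add(value);
        } else if (value < pq.peek()) {
            pq.poll();
            pq.add(value);
        }
    }

    static List<Integer> sorted(PriorityQueue<Integer> pq) {
        List<Integer> out = new ArrayList<Integer>(pq);
        Collections.sort(out);
        return out;
    }

    // One queue shared by every segment.
    static List<Integer> singlePq(int[][] segments, int n) {
        PriorityQueue<Integer> pq = newQueue();
        for (int[] segment : segments) {
            for (int value : segment) {
                offer(pq, n, value);
            }
        }
        return sorted(pq);
    }

    // One queue per segment; the per-segment winners are merged afterwards.
    static List<Integer> multiPq(int[][] segments, int n) {
        List<PriorityQueue<Integer>> queues = new ArrayList<PriorityQueue<Integer>>();
        for (int[] segment : segments) {
            PriorityQueue<Integer> perSegment = newQueue();
            for (int value : segment) {
                offer(perSegment, n, value);
            }
            queues.add(perSegment);
        }
        PriorityQueue<Integer> merged = newQueue();
        for (PriorityQueue<Integer> perSegment : queues) {
            for (int value : perSegment) {
                offer(merged, n, value);
            }
        }
        return sorted(merged);
    }

    public static void main(String[] args) {
        int[][] segments = { {9, 2, 7}, {4, 1, 8}, {3, 6, 5} };
        System.out.println(singlePq(segments, 3)); // prints [1, 2, 3]
        System.out.println(multiPq(segments, 3));  // prints [1, 2, 3]
    }
}
```

Both variants return the same top N; the question raised in the thread is only which one is faster, and that depends on details (value versus ordinal comparisons, number of segments, queue size) that this sketch leaves out.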

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767928#action_12767928 ] Cédrik LIME commented on LUCENE-1183: Thanks Michael. FuzzyTermEnum.java has not

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_unnecessary_casts.patch Remove unnecessary cast across the codebase (as a result

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767935#action_12767935 ] Michael McCandless commented on LUCENE-1183: OK I had 2 hunks fail but I

[jira] Resolved: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1183. Resolution: Fixed Fix Version/s: 3.0 Thanks Cédrik! TRStringDistance

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_more_unnecessary_casts.patch Port to Java5 - Key:

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767953#action_12767953 ] Uwe Schindler commented on LUCENE-1257: Thanks! Much cleaner code. -- Committed

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-10-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767961#action_12767961 ] Robert Muir commented on LUCENE-1606: If no one objects, I'd like to commit this in a

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767963#action_12767963 ] Uwe Schindler commented on LUCENE-1606: No prob! I will help you, I am on heavy

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1257: - Attachment: LUCENE-1257_enum.patch Migrates to Java 5 enums in core and contrib. All tests pass.

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread TomS
Hi, I can confirm the below mentioned problems trying to migrate to 2.9. Our Lucene-based (2.4) app uses custom multi-level sorting on a lot of different fields and pretty large indexes ( 100m docs). Most of the fields that we sort on are strings, some with up to 400 characters in length. A lot

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_contrib_highlighting.patch Port to Java5 - Key:

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: (was: LUCENE-1257_contrib_highlighting.patch) Port to Java5 -

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767977#action_12767977 ] Uwe Schindler commented on LUCENE-1257: DM Smith: Can you open a new issue. This is

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_contrib_highlighting.patch Port to Java5 - Key:

[jira] Created: (LUCENE-1998) Use Java 5 enums

2009-10-20 Thread DM Smith (JIRA)
Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Priority: Minor

[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1998: - Attachment: LUCENE-1998_enum.patch This issue and patch were part of LUCENE-1257, but may have backward

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Earwin Burrfoot
That's quite possible to reimplement, I believe. You can have your docid-ordinal map bound to the top-level reader, as it was before, and then your FieldComparator rebases incoming compare() docids based on what the last setNextReader() was called with. On Wed, Oct 21, 2009 at 02:07, TomS
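A minimal sketch of the rebasing idea described above, written without any Lucene dependency. The class RebasingComparatorSketch is hypothetical; its method names only loosely mirror the 2.9 FieldComparator callbacks (the real setNextReader() also receives the segment's IndexReader, and the real class has further methods). The assumption is that a sort ordinal has already been computed for every top-level docid and that each segment reports its starting docid (docBase) before its hits arrive.

```java
public class RebasingComparatorSketch {
    private final int[] topLevelOrdinals; // sort ordinal for each top-level docid
    private final int[] slotOrdinals;     // ordinal remembered for each queue slot
    private int docBase;                  // top-level docid of the current segment's first doc
    private int bottomOrdinal;            // ordinal of the weakest entry in the queue

    public RebasingComparatorSketch(int[] topLevelOrdinals, int numSlots) {
        this.topLevelOrdinals = topLevelOrdinals;
        this.slotOrdinals = new int[numSlots];
    }

    // Called when collection moves on to the next segment.
    public void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // segmentDoc is relative to the current segment; adding docBase rebases it.
    public void copy(int slot, int segmentDoc) {
        slotOrdinals[slot] = topLevelOrdinals[docBase + segmentDoc];
    }

    public void setBottom(int slot) {
        bottomOrdinal = slotOrdinals[slot];
    }

    public int compareBottom(int segmentDoc) {
        return Integer.compare(bottomOrdinal, topLevelOrdinals[docBase + segmentDoc]);
    }

    // Slots already hold top-level ordinals, so they compare directly across segments.
    public int compare(int slot1, int slot2) {
        return Integer.compare(slotOrdinals[slot1], slotOrdinals[slot2]);
    }

    public static void main(String[] args) {
        // Two segments: top-level docids 0..2 and 3..4.
        int[] ordinals = {5, 1, 3, 2, 4};
        RebasingComparatorSketch c = new RebasingComparatorSketch(ordinals, 2);
        c.setNextReader(0);
        c.copy(0, 1); // segment doc 1 -> top-level doc 1, ordinal 1
        c.setNextReader(3);
        c.copy(1, 0); // segment doc 0 -> top-level doc 3, ordinal 2
        System.out.println(c.compare(0, 1) < 0); // prints true
    }
}
```

The trade-off is that the ordinal map stays tied to the top-level reader, so it has to be rebuilt whenever the index is reopened, which is the cost the per-segment design was meant to avoid.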

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1257: - Attachment: (was: LUCENE-1257_enum.patch) Port to Java5 - Key:

[jira] Created: (LUCENE-1999) Match spotter for all query types

2009-10-20 Thread Mark Harwood (JIRA)
Match spotter for all query types - Key: LUCENE-1999 URL: https://issues.apache.org/jira/browse/LUCENE-1999 Project: Lucene - Java Issue Type: New Feature Affects Versions: 2.9 Reporter: Mark

[jira] Updated: (LUCENE-1999) Match spotter for all query types

2009-10-20 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-1999: - Attachment: matchflagger.patch Match spotter for all query types

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi Mike: That's weird. Let me take a look at the patch. Need to brush up on python though :) Thanks -John On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless luc...@mikemccandless.com wrote: OK I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple python