[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Paul Cowan (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-1257: --- Attachment: LUCENE-1257-clone_covariance.patch OK, thought I'd jump in and help out here with one of

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi Mike: That's weird. Let me take a look at the patch. Need to brush up on python though :) Thanks -John On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK I posted a patch that folds the MultiPQ approach into > contrib/benchmark, plus a simple pyth

[jira] Updated: (LUCENE-1999) Match spotter for all query types

2009-10-20 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-1999: - Attachment: matchflagger.patch > Match spotter for all query types > ---

[jira] Created: (LUCENE-1999) Match spotter for all query types

2009-10-20 Thread Mark Harwood (JIRA)
Match spotter for all query types - Key: LUCENE-1999 URL: https://issues.apache.org/jira/browse/LUCENE-1999 Project: Lucene - Java Issue Type: New Feature Affects Versions: 2.9 Reporter: Mark H

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1257: - Attachment: (was: LUCENE-1257_enum.patch) > Port to Java5 > - > > Key: L

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Earwin Burrfoot
That's quite possible to reimplement, I believe. You can have your docid->ordinal map bound to the top-level reader, as it was before, and then your FieldComparator rebases incoming compare() docids based on what the last setNextReader() was called with. On Wed, Oct 21, 2009 at 02:07, TomS wrote: > Hi, > >
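A minimal sketch of the rebasing idea described in this message, assuming the ordinals are kept in a plain int array indexed by top-level docid. The class and method names (RebasingOrdinalComparator, ordinal, compareDocs) are hypothetical and are not Lucene's FieldComparator API; only the notion of a per-segment docBase passed on each setNextReader() call comes from the thread.

```java
// Hypothetical illustration only; this is not the Lucene FieldComparator API.
final class RebasingOrdinalComparator {
    // Ordinals indexed by top-level (whole-index) docid, loaded once.
    private final int[] topLevelOrdinals;
    // Offset of the current segment's first document in top-level docid space.
    private int docBase;

    RebasingOrdinalComparator(int[] topLevelOrdinals) {
        this.topLevelOrdinals = topLevelOrdinals.clone();
    }

    // Called when collection moves to a new segment (assumed signature).
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // Rebase a segment-relative docid before looking up its ordinal.
    int ordinal(int segmentDoc) {
        return topLevelOrdinals[docBase + segmentDoc];
    }

    // Compare two segment-relative docids from the current segment.
    int compareDocs(int segmentDoc1, int segmentDoc2) {
        return Integer.compare(ordinal(segmentDoc1), ordinal(segmentDoc2));
    }

    public static void main(String[] args) {
        // Two segments: docids 0..2 (docBase 0) and 3..4 (docBase 3).
        RebasingOrdinalComparator c =
            new RebasingOrdinalComparator(new int[] {2, 0, 1, 5, 3});
        c.setNextReader(3);
        System.out.println(c.ordinal(0));        // ordinal of top-level doc 3
        System.out.println(c.compareDocs(0, 1)); // positive: doc 3 sorts after doc 4
    }
}
```

The trade-off is that the map is still loaded once for the whole index, so the per-segment reopen benefit of the new API is given up in exchange for keeping the old lookup structure.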

[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1998: - Attachment: LUCENE-1998_enum.patch This issue and patch were part of LUCENE-1257, but may have backward

[jira] Created: (LUCENE-1998) Use Java 5 enums

2009-10-20 Thread DM Smith (JIRA)
Use Java 5 enums Key: LUCENE-1998 URL: https://issues.apache.org/jira/browse/LUCENE-1998 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0 Reporter: DM Smith Priority: Minor

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767977#action_12767977 ] Uwe Schindler commented on LUCENE-1257: --- DM Smith: Can you open a new issue. This is

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_contrib_highlighting.patch > Port to Java5 > - > > Key

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: (was: LUCENE-1257_contrib_highlighting.patch) > Port to Java5 > - > >

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_contrib_highlighting.patch > Port to Java5 > - > > Key

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread TomS
Hi, I can confirm the below mentioned problems trying to migrate to 2.9. Our Lucene-based (2.4) app uses custom multi-level sorting on a lot of different fields and pretty large indexes (> 100m docs). Most of the fields that we sort on are strings, some with up to 400 characters in length. A lot

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1257: - Attachment: LUCENE-1257_enum.patch Migrates to Java 5 enums in core and contrib. All tests pass. Depreca

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767963#action_12767963 ] Uwe Schindler commented on LUCENE-1606: --- No prob! I will help you, I am on heavy com

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-10-20 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767961#action_12767961 ] Robert Muir commented on LUCENE-1606: - if no one objects, i'd like to commit this in a

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767953#action_12767953 ] Uwe Schindler commented on LUCENE-1257: --- Thanks! Much cleaner code. -- Committed rev

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_more_unnecessary_casts.patch > Port to Java5 > - > > K

[jira] Resolved: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1183. Resolution: Fixed Fix Version/s: 3.0 Thanks Cédrik! > TRStringDistance use

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767935#action_12767935 ] Michael McCandless commented on LUCENE-1183: OK I had 2 hunks fail but I manag

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-20 Thread Kay Kay (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_unnecessary_casts.patch Remove unnecessary casts across the codebase (as a result o

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767928#action_12767928 ] Cédrik LIME commented on LUCENE-1183: - Thanks Michael. FuzzyTermEnum.java has not chan

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
OK I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple python wrapper to run old/new tests across different queries, sort, topN, etc. But I got different results... MultiPQ looks generally slower than SinglePQ. So I think we now need to reconcile what's differen

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 11:47 AM, Michael McCandless wrote: > On Tue, Oct 20, 2009 at 10:49 AM, Mark Miller wrote: >> bq. One trivial thing that could be improved is to perhaps move all of >> the methods to the top of the class? >> >> +1 - I think Mike and I silently fought on that one once in the

[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1997: --- Attachment: LUCENE-1997.patch New patch attached: * Turn off testing on the balan

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767870#action_12767870 ] Michael McCandless commented on LUCENE-1997: OK I ran sortBench.py on opensola

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-20 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767852#action_12767852 ] Uwe Schindler commented on LUCENE-1257: --- Committed: LUCENE-1257_javacc_upgrade.pa

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Sorry, mistyped again, we have a multivalued field of STRINGS, not integers. -John On Tue, Oct 20, 2009 at 8:55 AM, John Wang wrote: > Hi guys: > I am not suggesting just simply changing the deprecated signatures. > There is some work to be done of course. In the beginning of the thread, we

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi guys: I am not suggesting just simply changing the deprecated signatures. There is some work to be done of course. In the beginning of the thread, we discussed two algorithms (both handling per-segment field loading), and at the conclusion, (to be still verified by Mike) that both algorithm

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 10:49 AM, Mark Miller wrote: > bq. One trivial thing that could be improved is to perhaps move all of > the methods to the top of the class? > > +1 - I think Mike and I silently fought on that one once in the patches :) > Though I don't know how conscious it was. I prefer the

[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1997: --- Attachment: LUCENE-1997.patch Attached patch. Note that patch is based on 2.9.x bra

[jira] Created: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-20 Thread Michael McCandless (JIRA)
Explore performance of multi-PQ vs single-PQ sorting API Key: LUCENE-1997 URL: https://issues.apache.org/jira/browse/LUCENE-1997 Project: Lucene - Java Issue Type: Improvement

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767820#action_12767820 ] Michael McCandless commented on LUCENE-1183: Cédrik, could you update the patc

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
bq. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? +1 - I think Mike and I silently fought on that one once in the patches :) Though I don't know how conscious it was. I prefer the methods at the top myself. Yonik Seeley wrote: > On Tue, Oct

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Earwin Burrfoot
There are some advanced things that are plain impossible with stock new API. Like having more than one HitQueue in your Collector, and stashing overflowing values from one of them into another. Once you cross the segment border - BOOM! Otherwise it may look intimidating, but is pretty simple in fa

RE: Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Uwe Schindler
Lucene 2.9 is Java 1.4 only (in build script), so autoboxing does not work. With trunk it works, but not with 2.9. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.
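As a rough illustration of the point made in this reply, the sketch below contrasts an autoboxed assignment, which javac rejects when the build sets source level 1.4, with the explicit conversions that compile either way. The class and method names are hypothetical and unrelated to the Highlighter test; note also that Integer.valueOf(int) itself was added in Java 5, so code targeting a 1.4 class library would use new Integer(value) instead.

```java
// Hypothetical illustration; not taken from the Lucene Highlighter test.
final class BoxingSketch {
    // Relies on autoboxing (int -> Integer), a Java 5 language feature.
    // A compiler invoked with -source 1.4 reports an incompatible-types error here.
    static Object boxImplicitly(int value) {
        Object boxed = value;
        return boxed;
    }

    // Explicit boxing: no language-level conversion is needed.
    // With a 1.4 class library this would be written as: new Integer(value)
    static Object boxExplicitly(int value) {
        return Integer.valueOf(value);
    }

    // Explicit unboxing via a cast and intValue(), as 1.4-level source requires.
    static int unboxExplicitly(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    public static void main(String[] args) {
        Object a = boxImplicitly(42);
        Object b = boxExplicitly(42);
        System.out.println(a.equals(b));        // both hold an Integer with value 42
        System.out.println(unboxExplicitly(a)); // back to a primitive int
    }
}
```

An IDE configured for a newer compliance level accepts both forms, which would be consistent with the test compiling in Eclipse but failing under the ant build.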

Re: Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Mark Miller
Thank you, thank you, thank you ... I could have run around that for years. Yonik Seeley wrote: > Not sure... but could it just be that you are relying on autoboxing, > which is a Java5 feature, and the core is still marked for Java1.4 > syntax? > > -Yonik > http://www.lucidimagination.com > > > >

Re: Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Yonik Seeley
Not sure... but could it just be that you are relying on autoboxing, which is a Java5 feature, and the core is still marked for Java1.4 syntax? -Yonik http://www.lucidimagination.com On Tue, Oct 20, 2009 at 10:02 AM, Mark Miller wrote: > Can someone help me figure this out. The Highlighter tes

Compile failure in 2.9.1 Highlighter

2009-10-20 Thread Mark Miller
Can someone help me figure this out. The Highlighter test runs and passes for me in Eclipse. It obviously compiles too. But when I try and compile with the ant build scripts, I get: [javac] /home/mark/workspace/lucene_2_9/contrib/highlighter/src/test/org/apache/lucene/search/highlight/Highlig

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 9:31 AM, Uwe Schindler wrote: > It is not bad, only harder to understand (for some people). The Javadoc is much improved since I made the switch. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? Right now, if I go and

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
> Actually though - how are we supposed to get back there? I don't think > it's as simple as just not removing the deprecated APIs. Doesn't even > seem close to that simple. It's another nightmare. It would have to be > some serious wins to go through that pain starting at a 3.0 release > wouldn't i

[jira] Commented: (LUCENE-1183) TRStringDistance uses way too much memory (with patch)

2009-10-20 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767805#action_12767805 ] Cédrik LIME commented on LUCENE-1183: - Any news on the landing of this patch? Now that

[jira] Commented: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2009-10-20 Thread Siddharth Gargate (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767802#action_12767802 ] Siddharth Gargate commented on LUCENE-666: -- Can we rewrite the query (A OR NOT B)

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Actually though - how are we supposed to get back there? I don't think it's as simple as just not removing the deprecated APIs. Doesn't even seem close to that simple. It's another nightmare. It would have to be some serious wins to go through that pain starting at a 3.0 release wouldn't it? We just

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Uwe Schindler wrote: >> On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller >> wrote: >> >>> Hmm - perhaps I'm not remembering right. Or perhaps we had different >>> motivations ;) I never did anything in 1483 based on search perf - and I >>> took your tests as testing that we didn't lose perf, not

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
> On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller > wrote: > > Hmm - perhaps I'm not remembering right. Or perhaps we had different > > motivations ;) I never did anything in 1483 based on search perf - and I > > took your tests as testing that we didn't lose perf, not that we gained > > any. The fac

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 8:21 AM, Mark Miller wrote: > Ahhh - I see - way at the top. Man that was early. Had forgotten about > that stuff even before the issue was finished. Tell me about it -- impossible to remember these things :) I wish I could upgrade the RAM in my brain the way I can in my

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller wrote: > Hmm - perhaps I'm not remembering right. Or perhaps we had different > motivations ;) I never did anything in 1483 based on search perf - and I > took your tests as testing that we didn't lose perf, not that we gained > any. The fact that there

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Ahhh - I see - way at the top. Man that was early. Had forgotten about that stuff even before the issue was finished. Mark Miller wrote: > Hmm - perhaps I'm not remembering right. Or perhaps we had different > motivations ;) I never did anything in 1483 based on search perf - and I > took your tes

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests as testing that we didn't lose perf, not that we gained any. The fact that there were some wins was just a nice surprise from my perspective.

[jira] Resolved: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1993. Resolution: Fixed Fix Version/s: 3.0 Thanks Christian! > MoreLikeThis - al

[jira] Commented: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767781#action_12767781 ] Michael McCandless commented on LUCENE-1993: Patch looks good... I'll commit s

[jira] Assigned: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1993: -- Assignee: Michael McCandless > MoreLikeThis - allow to exclude terms that appe

[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767779#action_12767779 ] Michael McCandless commented on LUCENE-1987: bq. How to handle the problem wit

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 6:51 AM, Mark Miller wrote: > I didn't really follow that thread either - but we didn't move to the new > Comp Api because of its performance vs the old. We did (LUCENE-1483), but those perf tests mixed in a number of other improvements (eg, searching by segment avoids the

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
I didn't really follow that thread either - but we didn't move to the new Comp Api because of its performance vs the old. - Mark http://www.lucidimagination.com (mobile) On Oct 20, 2009, at 4:22 AM, "Uwe Schindler" wrote: I did not follow the whole thread, but I do not understand what’s ba

Re: [Lucene-java Wiki] Update of "LuceneAtApacheConUs2009" by HossMan

2009-10-20 Thread Karl Wettin
On 20 Oct 2009 at 07:15, Apache Wiki wrote: + There will be a Lucene/Search !MeetUp on Tuesday night at 8PM. 'This event is open to anyone who wants to come, even if you are not registered for the conference'. That is a really nice thing, and completely new if I'm not mistaken. Pe

[jira] Resolved: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1995. Resolution: Fixed Thanks Aaron! Maybe someday Lucene will allow a larger RAM buff

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
Sorry, I have been digging into it, just didn't get far enough to post patch/results. I'll try to do so today. I did find one bug in OneSortNoScoreCollector, in the getTop() method in the inner compare() method, to break ties it should be: if (v==0) { v = o1.doc + o1.comparatorQueue._base -

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
I did not follow the whole thread, but I do not understand what's bad with the new API that justifies preserving the old one. The old API does not fit very well with the segment-based search, and a lot of ugly stuff was done around it to make both APIs work the same. For me it is not very complica