Problematic documentation in o.a.l.analysis.package.html

2009-11-22 Thread Shai Erera
Hi, I've read the analysis package.html and I found two issues: 1) The code sample under Invoking the Analyzer is broken. It calls incrementToken(), but inside the while loop it prints 'ts' (which is the TokenStream) and then does "t = ts.next()", which no longer works. That's an easy fix, so I don't think a

[jira] Commented: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-22 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781102#action_12781102 ] Simon Willnauer commented on LUCENE-2068: - Robert, we can take the MatchV. Version

RE: Problematic documentation in o.a.l.analysis.package.html

2009-11-22 Thread Uwe Schindler
Hi Shai, Thanks for the suggestions! About your points: 1) This is really wrong, we can easily fix it for 3.1. Lucene 3.0 is already in the vote phase and 2.9x is also already out. 2) Maybe the explanation is not so good. This text comes especially from the 2.9 old to new TS

Re: Hiding JIRA issues

2009-11-22 Thread Michael McCandless
Eek, that's kinda spooky... that we didn't get to the root cause. I sure hope Lucene is not to blame ;) Mike On Sat, Nov 21, 2009 at 9:21 PM, Robert Muir wrote: > i sent a note to infrastructure about this, they reindexed, and everything > is fixed now. > > On Sat, Nov 21, 2009 at 10:20 AM, Uwe

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781110#action_12781110 ] Michael McCandless commented on LUCENE-2075: {quote} bq. BTW the flex branch f

[jira] Commented: (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)

2009-11-22 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1278#action_1278 ] Thomas Mueller commented on LUCENE-1877: > take it somewhere other than this close

[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781112#action_12781112 ] Michael McCandless commented on LUCENE-2075: bq. in both cases, it's slower th

Re: Problematic documentation in o.a.l.analysis.package.html

2009-11-22 Thread Shai Erera
Thanks Uwe. About (3), I use copyTo, not clone. I used the word 'clone' just out of habit. I'll read more about captureState, but I think copyTo works fine for me. About (2), I still think it's confusing. When I read addAttribute, I get the impression that by calling this method, it is guaranteed

[jira] Commented: (LUCENE-1877) Use NativeFSLockFactory as default for new API (direct ctors & FSDir.open)

2009-11-22 Thread Thomas Mueller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781113#action_12781113 ] Thomas Mueller commented on LUCENE-1877: > detect the wakeup / polling interval ex

RE: Problematic documentation in o.a.l.analysis.package.html

2009-11-22 Thread Uwe Schindler
About (3): copyTo is also only available for AttributeImpls, but not for the interfaces (you have to cast first) and then you are warned. If you copyTo() on a TermAttribute, it may also copy other attributes with it, if TermAttribute and PosIncr. Attribute are all implemented by the same AttributeI

Re: Problematic documentation in o.a.l.analysis.package.html

2009-11-22 Thread Shai Erera
Perhaps copyTo works for me because I reference Token, but like I said it's working for me ... Thanks for the tip regarding clearAttributes(). I assume I'll get the same behavior if I clear the attributes one by one, defaulting their values to whatever are my defaults. Well ... IMO as a user, the

RE: Problematic documentation in o.a.l.analysis.package.html

2009-11-22 Thread Uwe Schindler
About clearAttributes: Just a warning: if you clear the attributes one by one, you have two problems: - you can only clear attributes you know about. E.g. most Tokenizers just set TermAttribute and OffsetAttribute (because only these two attributes are interesting). The PosIncr attribu

[jira] Created: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler (JIRA)
AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY - Key: LUCENE-2088
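The check this issue asks for can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the types Attribute, TermAttribute and Token below are hypothetical stand-ins declared inside the example rather than Lucene's own classes, and the method names are made up for the sketch. It only shows the idea of rejecting anything that is not an interface extending the attribute marker interface.

```java
public class AttributeInterfaceCheckSketch {
    // Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
    interface Attribute {}
    interface TermAttribute extends Attribute {}
    static class Token implements TermAttribute {}

    // True only for interfaces that extend (or are) the Attribute marker interface.
    static boolean isAttributeInterface(Class<?> candidate) {
        return candidate.isInterface() && Attribute.class.isAssignableFrom(candidate);
    }

    // Assumed shape of a guard that an addAttribute-style method could call first.
    static void requireAttributeInterface(Class<?> candidate) {
        if (!isAttributeInterface(candidate)) {
            throw new IllegalArgumentException(
                "Expected an interface extending Attribute, got: " + candidate.getName());
        }
    }

    public static void main(String[] args) {
        System.out.println(isAttributeInterface(TermAttribute.class)); // interface extending Attribute
        System.out.println(isAttributeInterface(Token.class));         // concrete class, rejected
        System.out.println(isAttributeInterface(Runnable.class));      // unrelated interface, rejected
    }
}
```

Passing an implementation class such as the stand-in Token to the guard fails fast instead of silently registering a concrete class where an interface is expected.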

[jira] Updated: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2088: -- Attachment: LUCENE-2088.patch Here is the patch, will commit soon and respawn 3.0. I will also m

[jira] Commented: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781122#action_12781122 ] Earwin Burrfoot commented on LUCENE-2088: - bq. && Attribute.class.isAssignableFrom

RE: [jira] Commented: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler
If you use it type unsafe without generics, it will break. And we need it for 2.9. I was thinking about both variants and thought it would be better to leave it in. I will merge this now to 2.9, too. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@theta

[jira] Commented: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781124#action_12781124 ] Uwe Schindler commented on LUCENE-2088: --- If you use it type unsafe without generics,

[jira] Updated: (LUCENE-2087) Remove recursion in NumericRangeTermEnum

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2087: -- Fix Version/s: 3.0 > Remove recursion in NumericRangeTermEnum > --

[jira] Updated: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2088: -- Attachment: LUCENE-2088-test.patch This patch shows how you can break. As Shai said, the prob

[jira] Commented: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781129#action_12781129 ] Uwe Schindler commented on LUCENE-2088: --- Thinking about it more and reading http://

[jira] Commented: (LUCENE-2088) AttributeSource.addAttribute should only accept interfaces, the missing test leads to problems with Token.TOKEN_ATTRIBUTE_FACTORY

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781131#action_12781131 ] Uwe Schindler commented on LUCENE-2088: --- But it's no problem anymore, the Sun bug is

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781134#action_12781134 ] Michael McCandless commented on LUCENE-1606: Are we going to deprecate contrib

[jira] Created: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
explore using automaton for fuzzyquery -- Key: LUCENE-2089 URL: https://issues.apache.org/jira/browse/LUCENE-2089 Project: Lucene - Java Issue Type: Wish Components: Search Reporter:

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781135#action_12781135 ] Robert Muir commented on LUCENE-1606: - bq. Are we going to deprecate contrib/regex wit

[VOTE] Release Apache Lucene Java 3.0.0 (take #2)

2009-11-22 Thread Uwe Schindler
Hi, I have built the artifacts for the final release of "Apache Lucene Java 3.0.0" a second time, because of a bug in the TokenStream API (found by Shai Erera, who wanted to make "bad" things with addAttribute, breaking its behaviour, LUCENE-2088) and an improvement in NumericRangeQuery (to preven

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781136#action_12781136 ] Mark Miller commented on LUCENE-2089: - bq. (i will assign this to him, I know he is it

[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2089: Description: Mark brought this up on LUCENE-1606 (i will assign this to him, I know he is itching

[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2089: Description: Mark brought this up on LUCENE-1606 (i will assign this to him, I know he is itching

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781138#action_12781138 ] Uwe Schindler commented on LUCENE-2089: --- bq. ha - too much wine last night to laugh

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781139#action_12781139 ] Robert Muir commented on LUCENE-2089: - by the way, the only open impl of this algorith

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
was this why i saw strange benchmark results? On Sun, Nov 22, 2009 at 9:52 AM, wrote: > Author: mikemccand > Date: Sun Nov 22 14:52:02 2009 > New Revision: 883088 > > URL: http://svn.apache.org/viewvc?rev=883088&view=rev > Log: > LUCENE-1458 (on flex branch): small optimization to terms dict cac

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781140#action_12781140 ] Robert Muir commented on LUCENE-2089: - I hope it's obvious from the benchmark why we sh

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781142#action_12781142 ] Mark Miller commented on LUCENE-2089: - I'll take a look anyway - too bad I can't find

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781144#action_12781144 ] Robert Muir commented on LUCENE-2089: - bq. I'll take a look anyway - too bad I can't f

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781149#action_12781149 ] Mark Miller commented on LUCENE-2089: - bq. we can precompute the tables with that algo

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
No, not really... just an optimization I found when hunting ;) I'm working now on an AutomatonTermsEnum that uses the flex API directly, to test that performance. One of the major challenges with flex is the 4-way testing required. I.e., you can have a non-flex or flex index, and then you can acces

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781151#action_12781151 ] Robert Muir commented on LUCENE-2089: - Mark, they would get large fast, but i think we

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781154#action_12781154 ] Robert Muir commented on LUCENE-2089: - Another twist, is that we have to support the '

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
On Sun, Nov 22, 2009 at 11:23 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > No, not really... just an optimization I found when hunting ;) > > I'm working now on an AutomatonTermsEnum that uses the flex API > directly, to test that performance. > > I didn't mean to 'bail out' on thi

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781155#action_12781155 ] Michael McCandless commented on LUCENE-1606: {quote} bq. Are we going to depre

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781156#action_12781156 ] Robert Muir commented on LUCENE-1606: - bq. Would be good to call out what's different

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781157#action_12781157 ] Mark Miller commented on LUCENE-2089: - bq. Mark, they would get large fast, but i thin

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781160#action_12781160 ] Michael McCandless commented on LUCENE-1606: I don't have any wording -- I don

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781161#action_12781161 ] Robert Muir commented on LUCENE-2089: - bq. Generally, if you have any kind of length t

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781162#action_12781162 ] Robert Muir commented on LUCENE-1606: - bq. If it's "only" that the syntax is different

[jira] Issue Comment Edited: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781162#action_12781162 ] Robert Muir edited comment on LUCENE-1606 at 11/22/09 5:06 PM: -

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781167#action_12781167 ] Mark Miller commented on LUCENE-2089: - Right, I wouldn't expect it to be great with a

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781168#action_12781168 ] Robert Muir commented on LUCENE-1606: - we call this out nicely in the current RegexQue

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781170#action_12781170 ] Robert Muir commented on LUCENE-2089: - Mark maybe, though it also depends largely on t

[jira] Issue Comment Edited: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781170#action_12781170 ] Robert Muir edited comment on LUCENE-2089 at 11/22/09 5:44 PM: -

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781173#action_12781173 ] Mark Miller commented on LUCENE-2089: - bq. the constant prefix is just an optimizatio

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781174#action_12781174 ] Robert Muir commented on LUCENE-2089: - bq. With a prefix of 1 again? Yeah - you really

[jira] Issue Comment Edited: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781174#action_12781174 ] Robert Muir edited comment on LUCENE-2089 at 11/22/09 6:01 PM: -

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781175#action_12781175 ] Mark Miller commented on LUCENE-2089: - bq. we must "use" the prefix, so the results ar

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
On Sun, Nov 22, 2009 at 11:31 AM, Robert Muir wrote: >> No, not really... just an optimization I found when hunting ;) >> >> I'm working now on an AutomatonTermsEnum that uses the flex API >> directly, to test that performance. >> > > I didn't mean to 'bail out' on this You didn't 'bail out'; I

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781178#action_12781178 ] Michael McCandless commented on LUCENE-1606: OK that warning seems good. Mayb

[jira] Issue Comment Edited: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781175#action_12781175 ] Mark Miller edited comment on LUCENE-2089 at 11/22/09 6:06 PM: -

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781179#action_12781179 ] Robert Muir commented on LUCENE-2089: - bq. Basically, what I'm saying is the old Fuzzy

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
Mike, I guess what I am implying is should i even bother with lucene-1606 and trunk? or instead, should i be helping you, looking at TermsEnum, and working on integrating it into flex? On Sun, Nov 22, 2009 at 1:05 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Sun, Nov 22, 2009

[jira] Issue Comment Edited: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781175#action_12781175 ] Mark Miller edited comment on LUCENE-2089 at 11/22/09 6:12 PM: -

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781180#action_12781180 ] Mark Miller commented on LUCENE-2089: - bq. if it requires more edits than that, go wit

[jira] Updated: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1606: --- Attachment: LUCENE-1606-flex.patch First cut @ cutting over to flex API attached --

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781182#action_12781182 ] Robert Muir commented on LUCENE-2089: - bq. I wouldn't really like it if it was default

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781183#action_12781183 ] Robert Muir commented on LUCENE-1606: - bq. Looks like flex API is faster for the slow

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
I think you should keep doing all LUCENE-1606 work (and, any other issues) on trunk, and then we merge down to flex branch once it's committed? We shouldn't hold up any trunk features because flex is coming... merging down every so often seems manageable so far (Mark?). I'm hoping to finish flex

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781185#action_12781185 ] Robert Muir commented on LUCENE-1606: - Mike, I think your port to TermsEnum is correct

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781188#action_12781188 ] Michael McCandless commented on LUCENE-1606: {quote} One question, is it possi

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
ok, I only ask because some rework of this enum could be necessary to take advantage of the new api. examples include changing it to use char[] (easy) to prevent lots of string creation, which was unavoidable with TermEnum since it is based on string. i will never mention this again, but it could

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781190#action_12781190 ] Robert Muir commented on LUCENE-1606: - bq. Oh, that'd be great! It would be faster. I

[jira] Updated: (LUCENE-1260) Norm codec strategy in Similarity

2009-11-22 Thread Johan Kindgren (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johan Kindgren updated LUCENE-1260: --- Attachment: Lucene-1260-2.patch I've added the old static methods again, but made them depre

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Mark Miller
bq. merging down every so often seems manageable so far (Mark?). Yeah, this has been working great from my perspective. Michael McCandless wrote: > I think you should keep doing all LUCENE-1606 work (and, any other > issues) on trunk, and then we merge down to flex branch once it's > committed? >

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781204#action_12781204 ] Mark Miller commented on LUCENE-2089: - bq. we find the nice n where this is almost as

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781205#action_12781205 ] Robert Muir commented on LUCENE-2089: - bq. but you will notice both the Lucene qp and

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781206#action_12781206 ] Mark Miller commented on LUCENE-2089: - I think it makes sense to allow leading ? - ???

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781207#action_12781207 ] Robert Muir commented on LUCENE-2089: - mark, you are right. plus, the qp does not thr

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781208#action_12781208 ] Mark Miller commented on LUCENE-2089: - Solr doesn't even allow for a constant prefix wi

[jira] Commented: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-22 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781209#action_12781209 ] Simon Willnauer commented on LUCENE-2072: - I just added a testcase to check if the

[jira] Resolved: (LUCENE-2072) Upgrade contrib/regex to jakarta-regex 1.5

2009-11-22 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2072. - Resolution: Later once jakarta-regexp fixes their issues we can go on and upgrade. for n

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
Yeah I think there will be lots of optimizing we can do, after flex lands. Maybe stick w/ String for now? But open an issue, today, to remind us to cutover to char[] post-flex? Doing all processing in UTF8 is tantalizing too ;) This would mean no conversion of the terms data on iterating from t

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781213#action_12781213 ] Michael McCandless commented on LUCENE-1606: bq. it would be nice I think if T

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
On Sun, Nov 22, 2009 at 3:50 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Yeah I think there will be lots of optimizing we can do, after flex lands. > > Maybe stick w/ String for now? But open an issue, today, to remind us > to cutover to char[] post-flex? > ok, i'll create one.

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781214#action_12781214 ] Robert Muir commented on LUCENE-1606: - bq. I agree... though, this requires state (Uni

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
On Sun, Nov 22, 2009 at 3:52 PM, Robert Muir wrote: > > On Sun, Nov 22, 2009 at 3:50 PM, Michael McCandless > wrote: >> >> Yeah I think there will be lots of optimizing we can do, after flex lands. >> >> Maybe stick w/ String for now?  But open an issue, today, to remind us >> to cutover to char[

[jira] Created: (LUCENE-2090) convert automaton to char[] based processing and TermRef / TermsEnum api

2009-11-22 Thread Robert Muir (JIRA)
convert automaton to char[] based processing and TermRef / TermsEnum api Key: LUCENE-2090 URL: https://issues.apache.org/jira/browse/LUCENE-2090 Project: Lucene - Java

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
I guess here is where I just say that unicode and java are optimized for utf-16 processing, and so while I agree with byte[] being available in places like this for flex indexing, I'm already nervous about seeing code / optimizations that only work well with latin-1, and are very slow / buggy for a

[jira] Updated: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-22 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2068: Attachment: LUCENE_2068.patch added a CHANGES.txt entry. Will commit soon. > fix reverseS

[jira] Resolved: (LUCENE-2068) fix reverseStringFilter for unicode 4.0

2009-11-22 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2068. - Resolution: Fixed Committed in revision 883149 > fix reverseStringFilter for unicode 4.0

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
On Sun, Nov 22, 2009 at 4:06 PM, Robert Muir wrote: > I guess here is where I just say that unicode and java are optimized for > utf-16 processing I agree, though leaving things as UTF8 works fine for low level stuff (sorting, comparing equality, etc.)? > and so while I agree with byte[] being a

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Robert Muir
On Sun, Nov 22, 2009 at 4:16 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Sun, Nov 22, 2009 at 4:06 PM, Robert Muir wrote: > > I guess here is where I just say that unicode and java are optimized for > > utf-16 processing > > I agree, though leaving things as UTF8 works fine fo

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781219#action_12781219 ] Michael McCandless commented on LUCENE-1606: Besides TermsEnum.. TermRef is us

RE: [VOTE] Release Apache Lucene Java 3.0.0 (take #2)

2009-11-22 Thread Uwe Schindler
Hi, As a non-counting vote: +1 to release these artifacts as Lucene 3.0 I tested lucene-core.3.0.0.jar with my updated application, no problems occurred. QueryParser search works, fieldcache/sorting works, numeric range works. Reopen also works correctly, no leftover open files. MMapDirectory on 6

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java

2009-11-22 Thread Michael McCandless
On Sun, Nov 22, 2009 at 4:19 PM, Robert Muir wrote: >> What places specifically are you worried about? > > places like AutomatonQuery, where I found myself wanting to consider the > option of processing byte[], when I know this is very bad! Ahh OK :) Well you got the better of yourself before i

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781220#action_12781220 ] Robert Muir commented on LUCENE-1606: - bq. We can discuss this under the new [separate

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781221#action_12781221 ] Michael McCandless commented on LUCENE-1606: bq. is there a jira issue for thi

[jira] Commented: (LUCENE-2090) convert automaton to char[] based processing and TermRef / TermsEnum api

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781222#action_12781222 ] Michael McCandless commented on LUCENE-2090: Spinoff from LUCENE-1606. > conv

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781223#action_12781223 ] Robert Muir commented on LUCENE-1606: - bq. I thought you were about to open one! I op

[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2009-11-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781224#action_12781224 ] Michael McCandless commented on LUCENE-1260: Patch looks good! Thanks Johan.

RE: [VOTE] Release Apache Lucene Java 3.0.0 (take #2)

2009-11-22 Thread Uwe Schindler
> Hi, > > As a non-counting vote: > > +1 to release these artifacts as Lucene 3.0 > > I tested lucene-core.3.0.0.jar with my updated application, no problems > occurred. QueryParser search works, fieldcache/sorting works, numeric > range > works. Reopen also works correctly, no leftover open files

[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery

2009-11-22 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-2089: Attachment: Moman-0.1.tar.gz From Moman author: Absolutely. Sorry for the missing links. I had s

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-11-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781227#action_12781227 ] Robert Muir commented on LUCENE-1606: - bq. Actually... wouldn't we need to convert to
