[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-05-22 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712295#action_12712295 ] Robert Muir commented on LUCENE-1460: - is anyone working on this? I have some function

[jira] Issue Comment Edited: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712269#action_12712269 ] Earwin Burrfoot edited comment on LUCENE-1654 at 5/22/09 2:26 PM: --

[jira] Commented: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712269#action_12712269 ] Earwin Burrfoot commented on LUCENE-1654: - Let's have string key-value pairs per-s
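Earwin suggests storing string key-value pairs per segment for LUCENE-1654. The issue is meant purely for diagnostics, such as recording which Lucene version wrote a segment. One way to picture this is a small immutable map stored with each segment. The following is a minimal sketch under that assumption. `SegmentDiagnostics`, `forFlush` and the key names are hypothetical and are not part of Lucene's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: SegmentDiagnostics and its key names are illustrative,
// not Lucene API. Each segment carries free-form string key-value pairs.
final class SegmentDiagnostics {
    private final String segmentName;
    private final Map<String, String> diagnostics;

    SegmentDiagnostics(String segmentName, Map<String, String> diagnostics) {
        this.segmentName = segmentName;
        // Defensive, read-only copy so the caller cannot mutate it later.
        this.diagnostics = Collections.unmodifiableMap(new HashMap<>(diagnostics));
    }

    String segmentName() {
        return segmentName;
    }

    // Returns null if the key was never recorded (e.g. a segment written by an older release).
    String get(String key) {
        return diagnostics.get(key);
    }

    // What a writer might record when it flushes a new segment.
    static SegmentDiagnostics forFlush(String segmentName, String luceneVersion) {
        Map<String, String> d = new HashMap<>();
        d.put("source", "flush");
        d.put("lucene.version", luceneVersion);
        d.put("os.name", System.getProperty("os.name"));
        d.put("java.version", System.getProperty("java.version"));
        return new SegmentDiagnostics(segmentName, d);
    }
}
```

Plain string values let a reader dump the diagnostics into a bug report without knowing which keys a given release wrote. A missing key simply returns null.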

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 3:37 PM, DM Smith wrote: > So, what is it that they use that leads to such unfavorable results? I think it's simply that they take each search engine, get it to index their collection in the most obvious way, perhaps having read a tutorial somewhere, and test that. I'm g

Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith
Earwin Burrfoot wrote: 4. [Maybe?] Allow certain limited changes that will require source code changes in your app on upgrading to a new minor release: adding a new method to an interface, adding a new abstract method to an abstract class, renaming of deprecated methods. Yaho

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 10:40:03PM +0400, Earwin Burrfoot wrote: > >> Custom analyzers. > > No problem. > How are they recorded in the index? Analyzers must implement dump() and load(), which convert the Analyzer to/from a JSON-izable data structure. They end up as JSON in index_dir/schema_NNN.js
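Marvin describes KinoSearch analyzers that implement dump() and load(), round-tripping through a JSON-izable structure that is stored in the schema file. KinoSearch is not Java, so this is only a Java sketch of the round-trip idea. It assumes a `Map<String, Object>` as the JSON-izable form. `Dumpable` and `ExampleAnalyzer` are hypothetical names, and the sketch omits the actual JSON serialization step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the dump()/load() idea; names are illustrative only.
interface Dumpable {
    // Convert this object to plain maps/strings/booleans a JSON writer could emit.
    Map<String, Object> dump();
}

final class ExampleAnalyzer implements Dumpable {
    private final String language;
    private final boolean lowercase;

    ExampleAnalyzer(String language, boolean lowercase) {
        this.language = language;
        this.lowercase = lowercase;
    }

    @Override
    public Map<String, Object> dump() {
        Map<String, Object> m = new HashMap<>();
        m.put("_class", ExampleAnalyzer.class.getName()); // lets a loader pick the right type
        m.put("language", language);
        m.put("lowercase", lowercase);
        return m;
    }

    // Inverse of dump(): rebuild an equivalent analyzer from the stored structure.
    static ExampleAnalyzer load(Map<String, Object> m) {
        return new ExampleAnalyzer((String) m.get("language"), (Boolean) m.get("lowercase"));
    }

    String language() { return language; }

    boolean lowercase() { return lowercase; }
}
```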

Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith
Michael McCandless wrote: Well... I would expect & hope Lucene's adoption is growing with time, so the number of new users should increase on each release. For a healthy project that's relatively young compared to its potential user base, that growth should be exponential. And, I'd expect the v

[jira] Created: (LUCENE-1654) Include diagnostics per-segment when writing a new segment

2009-05-22 Thread Michael McCandless (JIRA)
Include diagnostics per-segment when writing a new segment -- Key: LUCENE-1654 URL: https://issues.apache.org/jira/browse/LUCENE-1654 Project: Lucene - Java Issue Type: Improvement

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
Well... I would expect & hope Lucene's adoption is growing with time, so the number of new users should increase on each release. For a healthy project that's relatively young compared to its potential user base, that growth should be exponential. And, I'd expect the vast majority of old users do

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
I'd like to do this for 2.9 :) I'll open an issue... (Yes this would just be for diagnostics). Mike On Fri, May 22, 2009 at 1:48 PM, DM Smith wrote: > Yonik Seeley wrote: >> >> On Fri, May 22, 2009 at 1:22 PM, Michael McCandless >> wrote: >> >>> >>> (That said, unrelated to this discussion, I

Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith
Michael McCandless wrote: On Fri, May 22, 2009 at 2:27 PM, DM Smith wrote: Marvin Humphrey wrote: I feel the opposite: I'd like new users to see improvements by default, and users that require strict back-compat to ask for that. By "strict back-compat", do you mean "people

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
OK, net/net it doesn't look like we're going to reach agreement on some general approach for having users of Lucene always get the best default settings. We started with the *Settings classes, but that's really a very large project (goes far beyond managing defaults for new users). Then we went to t

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Earwin Burrfoot
>> Custom analyzers. > No problem. How are they recorded in the index? >> Several indexes using the same analyzer. > No problem.  Only necessary if the analyzer is costly or has some esoteric > need for shared state.  And possible via subclassing Schema or Analyzer. It is. >> Intentionally differ

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 2:27 PM, DM Smith wrote: > Marvin Humphrey wrote: >>> >>> I feel the opposite: I'd like new users to see improvements by >>> default, and users that require strict back-compat to ask for that. >>> >> >> By "strict back-compat", do you mean "people who would like their sear

Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith
Marvin Humphrey wrote: I feel the opposite: I'd like new users to see improvements by default, and users that require strict back-compat to ask for that. By "strict back-compat", do you mean "people who would like their search app to not fail silently"? ;) A "new user" who follows your a

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 09:06:32PM +0400, Earwin Burrfoot wrote: > > In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter > > have to be passed a Schema, which contains all the Analyzers.  Analyzers > > aren't satellite classes under this model -- they are a fixed property of

Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith
Michael McCandless wrote: On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey wrote: when working on 3.1 if we make some great improvement, I'd like new users in 3.1 to see the improvement by default. Sounds like an argument for more frequent major releases. Yeah. Or "rebrandin

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 01:22:24PM -0400, Michael McCandless wrote: > > Sounds like an argument for more frequent major releases. > > Yeah. Or "rebranding" what we now call minor as major releases, by > changing our policy ;) Not sure how much of that is a jest, but I don't think that's a good

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 12:44 PM, Yonik Seeley wrote: > I'm not a lawyer, so I dislike trying to nail down every detail in > writing and try to solve future problems in the abstract. Agreed, and there's always leeway in what we work out here (LUCENE-1436 is a good recent example), but I think wor

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Marvin Humphrey
> I feel the opposite: I'd like new users to see improvements by > default, and users that require strict back-compat to ask for that. By "strict back-compat", do you mean "people who would like their search app to not fail silently"? ;) A "new user" who follows your advice... // haha stupid

Re: Lucene's default settings & back compatibility

2009-05-22 Thread DM Smith
Yonik Seeley wrote: On Fri, May 22, 2009 at 1:22 PM, Michael McCandless wrote: (That said, unrelated to this discussion, I would actually like to record per-segment which version of Lucene wrote the segment; this would be very helpful when debugging issues like LUCENE-1474 where I need to kn

[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-05-22 Thread Ali Oral (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Oral updated LUCENE-1486: - Comment: was deleted (was: This issue is very interesting. I see that you use query rewrite for wildcar

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 12:37 PM, Marvin Humphrey wrote: > I still like per-class settings classes. For instance, an IndexWriterSettings > class which allows you to hide away all the tweaky stuff that's cluttering up > the IndexWriter API. > > IndexWriterSettings settings = new IndexWriterSett

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Yonik Seeley
On Fri, May 22, 2009 at 1:22 PM, Michael McCandless wrote: > (That said, unrelated to this discussion, I would actually like to > record per-segment which version of Lucene wrote the segment; this > would be very helpful when debugging issues like LUCENE-1474 where I > need to know if the segments

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey wrote: > >> when working on 3.1 if we make some great improvement, I'd like new users in >> 3.1 to see the improvement by default. > > Sounds like an argument for more frequent major releases. Yeah. Or "rebranding" what we now call minor as major

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Earwin Burrfoot
> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter > have to be passed a Schema, which contains all the Analyzers.  Analyzers > aren't satellite classes under this model -- they are a fixed property of a > FullTextType field spec.  Think of them as baked into an SQL field

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 11:33:33AM -0400, Michael McCandless wrote: > when working on 3.1 if we make some great improvement, I'd like new users in > 3.1 to see the improvement by default. Sounds like an argument for more frequent major releases. But I'm not exactly one to talk. ;) > On think

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Yonik Seeley
I'm not a lawyer, so I dislike trying to nail down every detail in writing and try to solve future problems in the abstract. Lucene has never really been 100% back compatible... we've just tried to keep it that way... it's more of a mindset than a reality, and I'm wary of changing that mindset too

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Marvin Humphrey
On Fri, May 22, 2009 at 11:53:02AM -0400, Michael McCandless wrote: > 1. If we deprecate an API in the 2.1 release, we can remove it in > the next minor release (2.2). > > 2. JAR drop-in-ability is only guaranteed on point releases (2.4.1 > is a drop-in replacement to 2.4.0). When

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Earwin Burrfoot
>  1. If we deprecate an API in the 2.1 release, we can remove it in >     the next minor release (2.2). Agree. Maybe also this? 1a. If deprecated functionality is trivially implemented with new one, we reserve the right to delete deprecated things right away with appropriate CHANGES note. Sample I

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
So, iterating on the proposed changes to back-compat policy: 1. If we deprecate an API in the 2.1 release, we can remove it in the next minor release (2.2). 2. JAR drop-in-ability is only guaranteed on point releases (2.4.1 is a drop-in replacement to 2.4.0). When switching to a ne

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
On Thu, May 21, 2009 at 6:53 PM, Marvin Humphrey wrote: > Lastly, I think a major java Lucene release is justified already. > Won't this discussion die down somewhat if you can get 3.0 out? Somewhat, yes, but then when working on 3.1 if we make some great improvement, I'd like new users in 3.1 t

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Michael McCandless
OK it sounds like a single global actsAsVersion is too problematic. So how about, for cases where back compat default settings are important (analyzers, query scoring changes, etc.) we add actsAsVersion as a mandatory ctor argument to those classes (deprecating the other ctors)? We would do this
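Here is a rough Java sketch of the proposal. A version value becomes a required constructor argument, and the old no-argument constructor is deprecated and pinned to the old behavior. `ActsAsVersion`, its constants, and `ExampleFilterConfig` are hypothetical. The position-increment flag is only an illustrative example of a setting whose default might change between releases.

```java
// Hypothetical sketch: ActsAsVersion and ExampleFilterConfig are illustrative names.
enum ActsAsVersion {
    LUCENE_24, LUCENE_29; // declaration order defines "older" vs "newer"

    boolean onOrAfter(ActsAsVersion other) {
        return compareTo(other) >= 0;
    }
}

final class ExampleFilterConfig {
    private final boolean enablePositionIncrements;

    // New constructor: the caller must state which release's behavior it wants.
    ExampleFilterConfig(ActsAsVersion actsAsVersion) {
        // The improved default applies only to callers that opted into 2.9 behavior.
        this.enablePositionIncrements = actsAsVersion.onOrAfter(ActsAsVersion.LUCENE_29);
    }

    /** @deprecated pass an ActsAsVersion explicitly instead. */
    @Deprecated
    ExampleFilterConfig() {
        this(ActsAsVersion.LUCENE_24); // old constructor keeps the old behavior
    }

    boolean enablePositionIncrements() {
        return enablePositionIncrements;
    }
}
```

Existing callers that keep using the deprecated constructor see no behavior change. New code that passes the current version gets the improved default.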

Re: svn commit: r777525 - /lucene/java/trunk/src/java/org/apache/lucene/util/AttributeSource.java

2009-05-22 Thread Michael McCandless
In general I agree, but in this case I think the check is warranted because it used to be "fine" (in 2.4) to pass null -- nothing bad would happen. But as of the new TokenStream API, you'll suddenly hit an NPE, so I think we should throw an informed exception so it's clear to users what used to be
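Mike argues for an explicit check, so that code which passed null under 2.4 gets a clear message rather than an NPE from the new TokenStream API. That might look like the following sketch. `ExampleAttributeHolder` is a hypothetical stand-in for a class like AttributeSource, not its actual implementation. The exception type and message are also assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a class whose copy constructor dereferences its argument.
final class ExampleAttributeHolder {
    private final Map<String, Object> attributes;

    ExampleAttributeHolder() {
        this.attributes = new LinkedHashMap<>();
    }

    // Share the attributes of another holder (e.g. a filter wrapping its input).
    ExampleAttributeHolder(ExampleAttributeHolder input) {
        if (input == null) {
            // Fail fast with an informative message instead of a bare NPE later on.
            throw new IllegalArgumentException("input must not be null");
        }
        this.attributes = input.attributes;
    }

    int size() {
        return attributes.size();
    }
}
```

Yonik objects in the same thread that such checks clutter the code and slightly penalize correct callers. The check here runs once per construction, not once per token.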

Re: svn commit: r777525 - /lucene/java/trunk/src/java/org/apache/lucene/util/AttributeSource.java

2009-05-22 Thread Yonik Seeley
Why do stuff like this? Null params are almost never valid unless documented... I dislike cluttering up code with validity checks, slightly penalizing users who use the APIs correctly. I recognize that I may be in the minority though. But in this specific instance, the caller will get an immedia

[jira] Reopened: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-1636: > TokenFilters with a null value in the constructor fail > ---

[jira] Resolved: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1636. Resolution: Won't Fix > TokenFilters with a null value in the constructor fail > -

[jira] Commented: (LUCENE-1474) Incorrect SegmentInfo.delCount when IndexReader.flush() is used

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712084#action_12712084 ] Michael McCandless commented on LUCENE-1474: Thanks Erik. Can you answer my o

[jira] Commented: (LUCENE-1313) Realtime Search

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712082#action_12712082 ] Michael McCandless commented on LUCENE-1313: I think generally we are close.

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Grant Ingersoll
Perhaps it is wise to take a step back before we play all of these "what if" games... I think the best way forward is to simply ask ourselves, when confronted with an actual issue, what is the cost of back compat. for this issue and then address it on a case by case basis, with a bias

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712076#action_12712076 ] Michael McCandless commented on LUCENE-1636: OK I'll add a null check in Attri

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Matthew Hall
Earwin Burrfoot wrote: As I said, my app uses around ten indexes, which one should I use? :) Even more here, this would be a reasonably painful solution for us. Matt

[jira] Updated: (LUCENE-1542) NearSpansUnordered.getPayload does not always return the correct payloads when terms are located at the same position

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1542: --- Fix Version/s: 2.9 > NearSpansUnordered.getPayload does not always return the correc

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712025#action_12712025 ] Uwe Schindler commented on LUCENE-1636: --- Oh, you already committed this :) > TokenF

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712021#action_12712021 ] Uwe Schindler commented on LUCENE-1636: --- Thanks, I am still in Japan and had no time

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Earwin Burrfoot
> A funny thought: we can give those methods/classes really stupid/nasty names, > to emphasize the beauty of the existing API, to encourage people to stick > with the better API :) I believe I've seen google using internally names like thisisbadbadbadInstanceMap. :) > One thing we didn't address

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712020#action_12712020 ] Uwe Schindler commented on LUCENE-1591: --- Committed revision 777458. > Enable bzip c

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712019#action_12712019 ] Uwe Schindler commented on LUCENE-1591: --- I replaced the dev version by 1.0 and it co

[jira] Commented: (LUCENE-1636) TokenFilters with a null value in the constructor fail

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712008#action_12712008 ] Michael McCandless commented on LUCENE-1636: Good questions Uwe! I tested the

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712002#action_12712002 ] Michael McCandless commented on LUCENE-1591: Excellent! Yes I think so? > En

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-05-22 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711985#action_12711985 ] Uwe Schindler commented on LUCENE-1591: --- Commons-Compress 1.0 is now released, we sh