Re: who clears attributes?

2009-08-12 Thread Michael Busch
+1. We don't use Solr, but have quite a bunch of medium and short-sized documents. Plus heaps of metadata fields. I'm yet to read Uwe's example, but I feel I'm a bit misunderstood by some of you. Did you read it yet? What do you think about it? My gripe with new API is not that it

Re: who clears attributes?

2009-08-11 Thread Michael Busch
I'm not just responding to just you there, but more to the growing pack of those speaking against the new API. I don't see specific issues being brought up - the only issues I have seen brought up have been addressed in JIRA issues that have received no comments indicating the fix was not

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-11 Thread Michael McCandless
On Mon, Aug 10, 2009 at 9:12 PM, Grant Ingersoll <gsing...@apache.org> wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1 or later. Ie, normal software on having so many major changes would release an X.0 release; I

Re: who clears attributes?

2009-08-11 Thread Robert Muir
On Tue, Aug 11, 2009 at 4:28 AM, Michael Busch <busch...@gmail.com> wrote: There was a performance test in Solr that apparently ran much slower after upgrading to the new Lucene jar. This test is testing a rather uncommon scenario: very very short documents. Actually, it's more uncommon than that:

Re: who clears attributes?

2009-08-11 Thread Yonik Seeley
On Tue, Aug 11, 2009 at 6:50 AM, Robert Muir <rcm...@gmail.com> wrote: On Tue, Aug 11, 2009 at 4:28 AM, Michael Busch <busch...@gmail.com> wrote: There was a performance test in Solr that apparently ran much slower after upgrading to the new Lucene jar. This test is testing a rather uncommon

Re: who clears attributes?

2009-08-11 Thread Grant Ingersoll
On Aug 11, 2009, at 4:28 AM, Michael Busch wrote: I'm not just responding to just you there, but more to the growing pack of those speaking against the new API. I don't see specific issues being brought up - the only issues I have seen brought up have been addressed in JIRA issues that

Re: who clears attributes?

2009-08-11 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 15:09, Yonik Seeley <yo...@lucidimagination.com> wrote: On Tue, Aug 11, 2009 at 6:50 AM, Robert Muir <rcm...@gmail.com> wrote: On Tue, Aug 11, 2009 at 4:28 AM, Michael Busch <busch...@gmail.com> wrote: There was a performance test in Solr that apparently ran much slower after

Re: who clears attributes?

2009-08-11 Thread Mark Miller
Earwin Burrfoot wrote: The only person that tried to disprove this claim is Uwe. Others either say the problems are solved, so it's okay to move to the new API, or this will be usable when flexindexing arrives. Others (not me) have spent a lot of time going over this before (more than once I

Re: who clears attributes?

2009-08-11 Thread Earwin Burrfoot
The only person that tried to disprove this claim is Uwe. Others either say the problems are solved, so it's okay to move to the new API, or this will be usable when flexindexing arrives. Others (not me) have spent a lot of time going over this before (more than once I think) - they prob are

Re: who clears attributes?

2009-08-11 Thread Mark Miller
Earwin Burrfoot wrote: The only person that tried to disprove this claim is Uwe. Others either say the problems are solved, so it's okay to move to the new API, or this will be usable when flexindexing arrives. Others (not me) have spent a lot of time going over this before (more than

Re: who clears attributes?

2009-08-11 Thread Michael McCandless
I think extensible analysis (the new TokenStream API) is a net positive: it gives us strongly typed and high performance extensibility to a Token, so apps can now add whatever attrs they want. And, I see it as the first (of 3) big legs that we need to reach flexible indexing. We really have to

Beta (was Re: who clears attributes?)

2009-08-11 Thread DM Smith
On 08/11/2009 08:22 AM, Michael McCandless wrote: I do still think a longish 2.9 beta is warranted, if we can succeed in getting users outside the dev group to kick the tires and uncover stuff. I think a beta would be a great idea. Not sure it needs to be longish. Having not looked at it,

Re: who clears attributes?

2009-08-11 Thread DM Smith
*From:* Shai Erera [mailto:ser...@gmail.com] *Sent:* Monday, August 10, 2009 11:13 PM *To:* java-dev@lucene.apache.org *Subject:* Re: who clears attributes? It sounds like the 'old' API should stay a bit longer than 3.0. We'd like to give more people a chance to experiment w

RE: Beta (was Re: who clears attributes?)

2009-08-11 Thread Steven A Rowe
branch release maintenance would be a new thing.) Steve -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Tuesday, August 11, 2009 12:51 PM To: java-dev@lucene.apache.org Subject: Re: Beta (was Re: who clears attributes?) I thought 2.9 was on track

Re: who clears attributes?

2009-08-11 Thread Michael Busch
On 8/11/09 4:13 AM, Grant Ingersoll wrote: On Aug 11, 2009, at 4:28 AM, Michael Busch wrote: I'm not just responding to just you there, but more to the growing pack of those speaking against the new API. I don't see specific issues being brought up - the only issues I have seen brought up

Re: who clears attributes?

2009-08-11 Thread Grant Ingersoll
On Aug 11, 2009, at 3:21 PM, Michael Busch wrote: On 8/11/09 4:13 AM, Grant Ingersoll wrote: On Aug 11, 2009, at 4:28 AM, Michael Busch wrote: I'm not just responding to just you there, but more to the growing pack of those speaking against the new API. I don't see specific issues

RE: who clears attributes?

2009-08-11 Thread Uwe Schindler
@lucene.apache.org Subject: Re: who clears attributes? Uwe, Is this example available? I think that an example like this would help the user community see the current value in the change. At least, I'd love to see the code for it. -- DM On 08/10/2009 06:49 PM, Uwe Schindler wrote: UIMA The new API

who clears attributes?

2009-08-10 Thread Yonik Seeley
CharTokenizer.incrementToken() clears *all* attributes in the entire tokenizer chain. StandardTokenizer.incrementToken() clears only the term attribute. So... which is right? Seems like the tokenizer should be responsible? On a performance related note, CharTokenizer.clearAttributes() could be

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
...@thetaphi.de -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, August 10, 2009 6:01 PM To: java-dev@lucene.apache.org Subject: who clears attributes? CharTokenizer.incrementToken() clears *all* attributes in the entire

Re: who clears attributes?

2009-08-10 Thread Yonik Seeley
, or each tokenizer should read or each Tokenizer or TokenFilter On Mon, Aug 10, 2009 at 12:55 PM, Yonik Seeley <yo...@lucidimagination.com> wrote: On Mon, Aug 10, 2009 at 12:44 PM, Uwe Schindler <u...@thetaphi.de> wrote: the CharTokenizer should only clear the TermAttribute, as it is only using

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
On Mon, Aug 10, 2009 at 12:44 PM, Uwe Schindler <u...@thetaphi.de> wrote: the CharTokenizer should only clear the TermAttribute, as it is only using this attribute. I changed this in the latest patch for https://issues.apache.org/jira/browse/LUCENE-1796 It's certainly not clear to me - is there

Re: who clears attributes?

2009-08-10 Thread Yonik Seeley
Thinking through this a little more, I don't see an alternative to the tokenizer clearing all attributes at the start of incrementToken(). Consider a DefaultPayloadTokenFilter that only sets a payload if one isn't already set - it's clear that this filter can't clear the payload attribute, so it
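
The scenario described in this message can be illustrated with a minimal sketch. It deliberately does not use Lucene's real TokenStream or AttributeSource classes: every name below (ToyAttributes, ToyTokenizer, ToyMarkingFilter, ToyDefaultPayloadFilter, ClearAttributesDemo) is hypothetical, and the shared-state model is an assumption made only to show why a filter that sets a payload "if one isn't already set" depends on the tokenizer resetting everything first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy shared attribute state; a hypothetical stand-in, not Lucene's AttributeSource. */
class ToyAttributes {
    String term = "";
    String payload = null;

    /** Resets every attribute, analogous to clearing all attributes. */
    void clearAll() {
        term = "";
        payload = null;
    }
}

/** Minimal stream contract for this sketch. */
interface ToyStream {
    boolean incrementToken();
}

/** Emits whitespace-separated words; optionally clears all attributes first. */
class ToyTokenizer implements ToyStream {
    private final ToyAttributes atts;
    private final String[] words;
    private final boolean clearAll;
    private int pos = 0;

    ToyTokenizer(ToyAttributes atts, String text, boolean clearAll) {
        this.atts = atts;
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        this.clearAll = clearAll;
    }

    public boolean incrementToken() {
        if (pos >= words.length) return false;
        if (clearAll) atts.clearAll();   // the behaviour argued for in the message
        atts.term = words[pos++];        // otherwise only the term is overwritten
        return true;
    }
}

/** Sets a payload only on the token "foo" (arbitrary rule for the demo). */
class ToyMarkingFilter implements ToyStream {
    private final ToyAttributes atts;
    private final ToyStream input;

    ToyMarkingFilter(ToyAttributes atts, ToyStream input) {
        this.atts = atts;
        this.input = input;
    }

    public boolean incrementToken() {
        if (!input.incrementToken()) return false;
        if (atts.term.equals("foo")) atts.payload = "special";
        return true;
    }
}

/** Sets a default payload only if none is set; it cannot safely clear the payload itself. */
class ToyDefaultPayloadFilter implements ToyStream {
    private final ToyAttributes atts;
    private final ToyStream input;

    ToyDefaultPayloadFilter(ToyAttributes atts, ToyStream input) {
        this.atts = atts;
        this.input = input;
    }

    public boolean incrementToken() {
        if (!input.incrementToken()) return false;
        if (atts.payload == null) atts.payload = "default";
        return true;
    }
}

class ClearAttributesDemo {
    /** Runs tokenizer -> marking filter -> default payload filter; returns "term/payload" pairs. */
    static List<String> run(String text, boolean clearAll) {
        ToyAttributes atts = new ToyAttributes();
        ToyStream chain = new ToyDefaultPayloadFilter(atts,
                new ToyMarkingFilter(atts, new ToyTokenizer(atts, text, clearAll)));
        List<String> out = new ArrayList<>();
        while (chain.incrementToken()) {
            out.add(atts.term + "/" + atts.payload);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("foo bar", true));
        System.out.println(run("foo bar", false));
    }
}
```

In this toy model, clearing everything in the tokenizer gives "bar" the default payload, while clearing only the term leaves "bar" carrying the stale "special" payload from "foo". Whether the real API should behave this way is exactly what the thread is debating.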

Re: who clears attributes?

2009-08-10 Thread Michael Busch
Clearing the attributes should be required in those places where we cleared (or reinit'ed) Token previously, right? Michael On 8/10/09 10:42 AM, Yonik Seeley wrote: Thinking through this a little more, I don't see an alternative to the tokenizer clearing all attributes at the start of

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, August 10, 2009 7:42 PM To: java-dev@lucene.apache.org Subject: Re: who clears attributes? Thinking

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, August 10, 2009 7:42 PM To: java-dev@lucene.apache.org Subject: Re: who clears

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: I'll deviate from the topic somewhat. What are exact benefits that new tokenstream API yields? Are we sure we want it released with 2.9? By now I only see various elaborate problems, but haven't seen a single piece of code becoming simpler.

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Grant Ingersoll wrote: On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: 2.9 was _SUPPOSED_ to be a deprecation release, What's a deprecation release? We deprecate stuff in every release ... does it make sense to do a release just to deprecate anything we might not have yet? And if you add

Re: who clears attributes?

2009-08-10 Thread Michael Busch
I think we should change the backwards-compatibility policy as proposed in LUCENE-1698 and remove some deprecated things (including the old TokenStream API, maybe query parser) in 3.1, not 3.0. I don't think we should have a 2.5 release - this clearly shows the disadvantages of our current

2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 3:06 PM, Michael Busch wrote: I think we should change the backwards-compatibility policy as proposed in LUCENE-1698 and remove some deprecated things (including the old TokenStream API, maybe query parser) in 3.1, not 3.0. Maybe. I'm not convinced yet that the

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Michael Busch wrote: I think we should change the backwards-compatibility policy as proposed in LUCENE-1698 and remove some deprecated things (including the old TokenStream API, maybe query parser) in 3.1, not 3.0. I don't think we should have a 2.5 release - this clearly shows the

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael Busch
You didn't really comment on my proposal: I suggested to not remove the old Token API and old queryparser in 3.0. Instead with 3.0 change the bw-policy, so that we can remove deprecated things in minor releases (e.g. 3.1 in this case). I think your 2.5 proposal has drawbacks: if we release

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Mon, Aug 10, 2009 at 22:50, Grant Ingersoll <gsing...@apache.org> wrote: On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: I'll deviate from the topic somewhat. What are exact benefits that new tokenstream API yields? Are we sure we want it released with 2.9? By now I only see various

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
Hi Grant, I have serious doubts about releasing this new API until these performance issues are resolved and better proven out from a usability standpoint. I think LUCENE-1796 has fixed the performance problems, which was caused by a missing reflection-cache needed for bw compatibility. I

Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 12:52 PM, Uwe Schindler wrote: Michael: The TokenWrapper added cost was there in 2.9 before the TokenStream overhaul, too, as the TokenWrapper-like code was there implemented similarly inside DocInverter. You're right. It will only be more costly in case you mix multiple old

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
...@gmail.com] Sent: Monday, August 10, 2009 9:58 PM To: java-dev@lucene.apache.org Subject: Re: who clears attributes? On 8/10/09 12:52 PM, Uwe Schindler wrote: Michael: The TokenWrapper added cost was there in 2.9 before the TokenStream overhaul, too, as the TokenWrapper-like code

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
Busch [mailto:busch...@gmail.com] Sent: Monday, August 10, 2009 10:09 PM To: java-dev@lucene.apache.org Subject: Re: who clears attributes? On 8/10/09 1:02 PM, Uwe Schindler wrote: If both filters would only implement new API there would be direct calls from the filter to the input

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 3:36 PM, Michael Busch wrote: You didn't really comment on my proposal: I suggested to not remove the old Token API and old queryparser in 3.0. Instead with 3.0 change the bw-policy, so that we can remove deprecated things in minor releases (e.g. 3.1 in this case).

Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 1:02 PM, Uwe Schindler wrote: If both filters would only implement new API there would be direct calls from the filter to the input TokenStream. If all streams/filters would implement only the old API, the bw-delegation would only be used for the incrementToken() calls from

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 1:30 PM, Grant Ingersoll wrote: I think your 2.5 proposal has drawbacks: if we release 2.5 now to test the new major features in the field, then do you want to stop adding new features to trunk until we release 2.9 to not have the same situation then again? How long should this

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 3:52 PM, Uwe Schindler wrote: Hi Grant, I have serious doubts about releasing this new API until these performance issues are resolved and better proven out from a usability standpoint. I think LUCENE-1796 has fixed the performance problems, which was caused by a

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
I have serious doubts about releasing this new API until these performance issues are resolved and better proven out from a usability standpoint. I think LUCENE-1796 has fixed the performance problems, which was caused by a missing reflection-cache needed for bw compatibility. I

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 00:37, Michael Busch <busch...@gmail.com> wrote: On 8/10/09 1:30 PM, Grant Ingersoll wrote: I think your 2.5 proposal has drawbacks: if we release 2.5 now to test the new major features in the field, then do you want to stop adding new features to trunk until we release

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael McCandless
I do agree 2.9 has tons of changes: new analysis API, segment-based searching/collection/sorting, new QP, etc. One option might be to have a looong beta period for 2.9, and focus on testing/docs? Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 00:54, Uwe Schindler <u...@thetaphi.de> wrote: I have serious doubts about releasing this new API until these performance issues are resolved and better proven out from a usability standpoint. I think LUCENE-1796 has fixed the performance problems, which was

Re: who clears attributes?

2009-08-10 Thread Shai Erera
It sounds like the 'old' API should stay a bit longer than 3.0. We'd like to give more people a chance to experiment w/ the new API before we claim it is the new Analysis API in Lucene. And that means that more users will have to live w/ the bit of slowness more than what is believed in this

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Shai Erera
Does this mean we still move to Java 5 in 3.0? If so, +1 from me too. On Tue, Aug 11, 2009 at 12:06 AM, Mark Miller <markrmil...@gmail.com> wrote: Michael McCandless wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Mark Miller
You'll sell your vote for pork? :) If by some miracle we went with this, with so many back compat issues with this update, I don't see why we wouldn't throw Java 1.5 in as well. That just complicates things here though. I'd save that discussion. Shai Erera wrote: Does this mean we still move

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it's impossible, discuss how we can make the current API more adaptive. If at the end of this

Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 3:19 PM, Grant Ingersoll wrote: Oh, and now it seems the new QP is dependent on it all. The new QP uses Attributes for config settings, but doesn't require the TokenStream to be an AttributeSource. - To

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Grant Ingersoll wrote: On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it's impossible, discuss how we can make the current API more adaptive. If

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
Well, I have real use cases for it, but all of it is still missing the biggest piece: search side support. It's the 900 lb. elephant in the room. The 500 lb. elephant is the fact that all these attributes, AIUI, require you to hook in your own indexing chain, etc. in order to even be

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 2:05 PM, Michael McCandless wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1 or later. Ie, normal software on having so many major changes would release an X.0 release; I agree the deprecation release is

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
Erera [mailto:ser...@gmail.com] Sent: Monday, August 10, 2009 11:13 PM To: java-dev@lucene.apache.org Subject: Re: who clears attributes? It sounds like the 'old' API should stay a bit longer than 3.0. We'd like to give more people a chance to experiment w/ the new API before we claim

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 18:48, Michael Busch <busch...@gmail.com> wrote: On 8/10/09 2:05 PM, Michael McCandless wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1 or later. Ie, normal software on having so many major

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 6:28 PM, Mark Miller wrote: Grant Ingersoll wrote: On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it's impossible, discuss