[jira] Updated: (LUCENE-1749) FieldCache introspection API

2009-08-10 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1749: - Attachment: LUCENE-1749.patch slightly revised patch based on java-...@lucene discussion... the sortFie

Re: [jira] Commented: (LUCENE-1771) Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.

2009-08-10 Thread Chris Hostetter
: Hoss Man uses Chris Hostetter in Changes? Weak. I'll update it before committing. Blame Hatcher, he started it... http://svn.apache.org/viewvc/lucene/java/trunk/CHANGES.txt?r1=150654&r2=150658 Once I became a committer, I just followed the only rule of CHANGES.txt: "Maintain Convention" (A

[jira] Created: (LUCENE-1799) Unicode compression

2009-08-10 Thread DM Smith (JIRA)
Unicode compression --- Key: LUCENE-1799 URL: https://issues.apache.org/jira/browse/LUCENE-1799 Project: Lucene - Java Issue Type: New Feature Components: Store Affects Versions: 2.4.1 Reporter: DM S

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Grant Ingersoll wrote: On Aug 10, 2009, at 6:28 PM, Mark Miller wrote: Grant Ingersoll wrote: On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it'

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 6:28 PM, Mark Miller wrote: Grant Ingersoll wrote: On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it's impossible, discuss

[jira] Issue Comment Edited: (LUCENE-1771) Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741657#action_12741657 ] Mark Miller edited comment on LUCENE-1771 at 8/10/09 6:29 PM: --

[jira] Commented: (LUCENE-1771) Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741657#action_12741657 ] Mark Miller commented on LUCENE-1771: - I'm going to give the one another day or two an

[jira] Commented: (LUCENE-1771) Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741656#action_12741656 ] Mark Miller commented on LUCENE-1771: - Hoss Man uses Chris Hostetter in Changes? Weak.

[jira] Updated: (LUCENE-1771) Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1771: Attachment: LUCENE-1771.patch updates to apply to trunk and adds a stab at reconciling Changes >

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 18:48, Michael Busch wrote: On 8/10/09 2:05 PM, Michael McCandless wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1 or later. Ie, "normal" software on having so many major changes would rel

[jira] Commented: (LUCENE-1760) TokenStream API javadoc improvements

2009-08-10 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741648#action_12741648 ] Michael Busch commented on LUCENE-1760: --- Thanks, Mark. > TokenStream API javadoc im

[jira] Closed: (LUCENE-1308) Remove String.intern() from Field.java to increase performance and lower contention

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller closed LUCENE-1308. --- Resolution: Duplicate > Remove String.intern() from Field.java to increase performance and lower >

[jira] Closed: (LUCENE-1439) Inconsistent API

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller closed LUCENE-1439. --- Resolution: Won't Fix > Inconsistent API > Key: LUCENE-1439 >

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Yonik Seeley
On Mon, Aug 10, 2009 at 6:56 PM, Uwe Schindler wrote: >> Then how do you notify the other filters that they should reset their >> state? >> TokenStream.reset()?  The javadoc specifies that it's actually used >> for something else - but perhaps it can be reused for this purpose? > > TokenStream.rese

[jira] Commented: (LUCENE-1087) MultiSearcher.explain returns incorrect score/explanation relating to docFreq

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741636#action_12741636 ] Mark Miller commented on LUCENE-1087: - Not sure how to fix this, but I think LUCENE-17

[jira] Updated: (LUCENE-1359) FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1359: Priority: Minor (was: Major) Depends on how you read things - it must be able to handle null for

[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1224: Priority: Minor (was: Critical) Fix Version/s: 3.1 Sounds like this should really be add

[jira] Resolved: (LUCENE-1760) TokenStream API javadoc improvements

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1760. - Resolution: Fixed I took care of it. > TokenStream API javadoc improvements > -

[jira] Resolved: (LUCENE-1628) Persian Analyzer

2009-08-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-1628. - Resolution: Fixed Committed revision 802955. > Persian Analyzer

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Right - this API is not required, even for Flexible indexing as it now appears it will emerge. I think it's just there to help. Originally, I think the idea was to reduce how much casting would be needed. Also, a given chain will more easily be able to deal with just the attributes th

RE: pieces missing in reusable analyzers?

2009-08-10 Thread Uwe Schindler
> Then how do you notify the other filters that they should reset their > state? > TokenStream.reset()? The javadoc specifies that it's actually used > for something else - but perhaps it can be reused for this purpose? TokenStream.reset() is always called before the first incrementToken call by
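The question Uwe quotes here (how the other filters learn that they should reset their state) can be modeled in a few lines of self-contained Java. This is a hypothetical sketch, not Lucene code: MiniStream, MiniTokenizer, DedupFilter and ResetDemo are invented names. It assumes the convention discussed in the thread, where each stateful filter overrides reset() to clear its own state and then forwards the call to its input, so one reset() on the outermost stream reaches the whole chain, while only the tokenizer takes a new Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of an analysis chain; NOT Lucene's classes.
abstract class MiniStream {
    /** Returns the next token, or null when the stream is exhausted. */
    abstract String next();

    /** Clears per-document state; a stage without state does nothing. */
    void reset() {}
}

/** Whitespace tokenizer: the only stage that accepts a new Reader. */
class MiniTokenizer extends MiniStream {
    private Reader input;

    MiniTokenizer(Reader input) { this.input = input; }

    /** Swaps in the next document's text. */
    void reset(Reader newInput) { this.input = newInput; }

    @Override
    String next() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) return sb.toString();
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

/** Stateful filter: drops tokens already seen in the current document. */
class DedupFilter extends MiniStream {
    private final MiniStream input;
    private final Set<String> seen = new HashSet<>();

    DedupFilter(MiniStream input) { this.input = input; }

    @Override
    String next() {
        String tok;
        while ((tok = input.next()) != null) {
            if (seen.add(tok)) return tok; // first occurrence only
        }
        return null;
    }

    @Override
    void reset() {
        seen.clear();  // clear this stage's own state...
        input.reset(); // ...then forward the reset down the chain
    }
}

class ResetDemo {
    /** Collects all remaining tokens from a stream. */
    static List<String> drain(MiniStream s) {
        List<String> out = new ArrayList<>();
        String t;
        while ((t = s.next()) != null) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        MiniTokenizer tok = new MiniTokenizer(new StringReader("a b a"));
        MiniStream chain = new DedupFilter(tok);
        System.out.println(drain(chain)); // [a, b]

        tok.reset(new StringReader("a c")); // new input goes to the tokenizer
        chain.reset();                      // state reset reaches every filter
        System.out.println(drain(chain));   // [a, c]
    }
}
```

Without the chain.reset() call, the second document would come out as [c] only, because DedupFilter would still remember "a" from the first document.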

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Earwin Burrfoot
>> I'm just keeping a reference to Tokenizer, so I can reset it with a >> new reader. Though this situation is awkward, TS definitely does not >> need a reset(Reader). > > Then how do you notify the other filters that they should reset their state? > TokenStream.reset()?  The javadoc specifies that

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
> UIMA The new API looks like UIMA: you have streams that are attributed with various attributes that can be exchanged between TokenStreams/TokenFilters, just like the current FlagsAttribute or TypeAttribute, which can easily be misused for such things. About a real use case for the new API:

[jira] Issue Comment Edited: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers

2009-08-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741601#action_12741601 ] Robert Muir edited comment on LUCENE-1794 at 8/10/09 3:46 PM: --

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 2:05 PM, Michael McCandless wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1 or later. Ie, "normal" software on having so many major changes would release an X.0 release; I agree the "deprecation release" is

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
> Well, I have real use cases for it, but all of it is still missing the > biggest piece:  search side support.  It's the 900 lb. elephant in the room. >   The 500 lb. elephant is the fact that all these attributes, AIUI, require > you to hook in your own indexing chain, etc. in order to even be in

[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers

2009-08-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741601#action_12741601 ] Robert Muir commented on LUCENE-1794: - I am thinking of expanding this patch to includ

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Robert Muir
> Then how do you notify the other filters that they should reset their state? > TokenStream.reset()? The javadoc specifies that it's actually used > for something else - but perhaps it can be reused for this purpose? Yonik, I did exactly this with several in Lucene contrib. For these I had to ex

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Yonik Seeley
On Mon, Aug 10, 2009 at 6:21 PM, Earwin Burrfoot wrote: > I'm just keeping a reference to Tokenizer, so I can reset it with a > new reader. Though this situation is awkward, TS definitely does not > need a reset(Reader). Then how do you notify the other filters that they should reset their state? T

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Grant Ingersoll wrote: On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it's impossible, discuss how we can make the current API more adaptive. If at

Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 3:19 PM, Grant Ingersoll wrote: Oh, and now it seems the new QP is dependent on it all. The new QP uses Attributes for config settings, but doesn't require the TokenStream to be an AttributeSource.

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Robert Muir
Also, FYI, if you are testing this with Solr or whatever, I want to warn you that LUCENE-1794 also contains impls of reset(Reader) and reset() for tokenizers and filters that did not have them before (e.g. CJK). So it is not enough to reuse in the analyzer; its streams that keep state really need t

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Earwin Burrfoot
> I had thought that implementing reusable analyzers in Solr was going > to be cake... but either I'm missing something, or Lucene is missing > something. > > Here's the way that one used to create custom analyzers: > > class CustomAnalyzer extends Analyzer { >  public TokenStream tokenStream(Strin

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741595#action_12741595 ] Michael Busch commented on LUCENE-1796: --- I think Token.reset() wasn't called before

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 5:12 PM, Shai Erera wrote: Maybe we should follow what I seem to read from Earwin and Grant - come up w/ real use cases, try to implement them w/ the current API, then if it's impossible, discuss how we can make the current API more adaptive. If at the end of this we'l

RE: pieces missing in reusable analyzers?

2009-08-10 Thread Uwe Schindler
You have to reuse the TokenStream and also its root Tokenizer to get access to the Reader. This is what Robert's latest patch does with this helper class. Implementing reset(Reader) in TokenStream is somehow wrong: there may be TokenStreams that have no Reader at all (NumericTokenStream).

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741594#action_12741594 ] Uwe Schindler commented on LUCENE-1796: --- We had no conclusion on this. I think we sh

Re: pieces missing in reusable analyzers?

2009-08-10 Thread Robert Muir
You can only call reset(Reader) on a Tokenizer, not on an arbitrary TokenStream. This is why there is the SavedStreams mess in the Standard/Stop core analyzers and in every analyzer in LUCENE-1794... On Mon, Aug 10, 2009 at 6:10 PM, Yonik Seeley wrote: > I had thought that implementing reusable analyzers in Solr
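Robert's point, that only a Tokenizer accepts a new Reader while the rest of the chain needs a separate reset, is why a reusing analyzer has to keep two handles per thread. The Java sketch below is a hypothetical model of that pattern, not code from Lucene or from LUCENE-1794: SimpleStream, WhitespaceTok, LowerFilter, ReusingAnalyzer and its nested SavedStreams holder are invented for illustration, and a plain ThreadLocal stands in for whatever per-thread storage the real analyzers use.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical stand-ins, NOT Lucene's API. Only the tokenizer can take a new
// Reader, so a reusing analyzer must keep a separate handle on it.
abstract class SimpleStream {
    abstract String next(); // next token, or null at end of input
    void reset() {}         // clear per-document state
}

class WhitespaceTok extends SimpleStream {
    private Reader in;

    WhitespaceTok(Reader in) { this.in = in; }

    void reset(Reader newIn) { this.in = newIn; } // tokenizer-only operation

    @Override
    String next() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) return sb.toString();
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

class LowerFilter extends SimpleStream {
    private final SimpleStream input;

    LowerFilter(SimpleStream input) { this.input = input; }

    @Override
    String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }

    @Override
    void reset() { input.reset(); } // forward to the wrapped stage
}

/** Caches one chain per thread, in the spirit of the "SavedStreams" holders. */
class ReusingAnalyzer {
    private static final class SavedStreams {
        final WhitespaceTok source; // kept so reset(Reader) can be called
        final SimpleStream result;  // outermost stage handed to callers

        SavedStreams(WhitespaceTok source, SimpleStream result) {
            this.source = source;
            this.result = result;
        }
    }

    private final ThreadLocal<SavedStreams> saved = new ThreadLocal<>();

    SimpleStream reusableTokenStream(Reader reader) {
        SavedStreams s = saved.get();
        if (s == null) {
            // First call on this thread: build the chain, remember both ends.
            WhitespaceTok tok = new WhitespaceTok(reader);
            s = new SavedStreams(tok, new LowerFilter(tok));
            saved.set(s);
        } else {
            s.source.reset(reader); // new text goes to the tokenizer...
            s.result.reset();       // ...state reset goes through the top
        }
        return s.result;
    }

    public static void main(String[] args) {
        ReusingAnalyzer a = new ReusingAnalyzer();
        SimpleStream s1 = a.reusableTokenStream(new StringReader("Foo Bar"));
        System.out.println(s1.next() + " " + s1.next()); // foo bar
        SimpleStream s2 = a.reusableTokenStream(new StringReader("Baz"));
        System.out.println((s1 == s2) + " " + s2.next()); // true baz
    }
}
```

Calling reusableTokenStream twice on the same thread returns the same outer stream; the second call only points the tokenizer at the new Reader and resets the chain. Even in this toy, the analyzer has to know which stage is the tokenizer, which is the bookkeeping Robert is describing.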

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741590#action_12741590 ] Yonik Seeley commented on LUCENE-1796: -- bq. (I only removed the clearAttributes() cal

pieces missing in reusable analyzers?

2009-08-10 Thread Yonik Seeley
I had thought that implementing reusable analyzers in Solr was going to be cake... but either I'm missing something, or Lucene is missing something. Here's the way that one used to create custom analyzers: class CustomAnalyzer extends Analyzer { public TokenStream tokenStream(String fieldName,

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741571#action_12741571 ] Michael Busch commented on LUCENE-1796: --- {quote} I think I commit this now and leave

[jira] Closed: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-1796. - Resolution: Fixed Committed revision: 802930 (I only removed the clearAttributes() call again, w

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741567#action_12741567 ] Uwe Schindler commented on LUCENE-1796: --- The shorter the text, the more the construc

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741564#action_12741564 ] Robert Muir commented on LUCENE-1796: - I just want to say I think that 10% test case m

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Mark Miller
You'll sell your vote for pork? :) If by some miracle we went with this, with so many back compat issues with this update, I don't see why we wouldn't throw Java 1.5 in as well. That just complicates things here though. I'd save that discussion. Shai Erera wrote: Does this mean we still move

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Shai Erera
Does this mean we still move to Java 5 in 3.0? If so, +1 from me too. On Tue, Aug 11, 2009 at 12:06 AM, Mark Miller wrote: > Michael McCandless wrote: > >> Or... and this is one crazy idea... maybe we should simply release 3.0 >> next, not removing any deprecated APIs until 3.1 or later. Ie, >>

Re: who clears attributes?

2009-08-10 Thread Shai Erera
It sounds like the 'old' API should stay a bit longer than 3.0. We'd like to give more people a chance to experiment w/ the new API before we claim it is the new Analysis API in Lucene. And that means that more users will have to live w/ the "bit of slowness" more than what is believed in this thre

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 00:54, Uwe Schindler wrote: >> >> I have serious doubts about releasing this new API until these >> >> performance issues are resolved and better proven out from a >> >> usability >> >> standpoint. >> > >> > I think LUCENE-1796 has fixed the performance problems, which were >

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Mark Miller
Michael McCandless wrote: Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any deprecated APIs until 3.1 or later. Ie, "normal" software on having so many major changes would release an X.0 release; I agree the "deprecation release" is unusual. This way

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael McCandless
I do agree 2.9 has tons of changes: new analysis API, segment-based searching/collection/sorting, new QP, etc. One option might be to have a looong beta period for 2.9, and focus on testing/docs? Or... and this is one crazy idea... maybe we should simply release 3.0 next, not removing any depreca

[jira] Issue Comment Edited: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741545#action_12741545 ] Mark Miller edited comment on LUCENE-1796 at 8/10/09 2:02 PM: --

[jira] Updated: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1796: Attachment: afterAndLucene1796.png after.png before.png > Speed up

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741545#action_12741545 ] Mark Miller commented on LUCENE-1796: - Just to complete my report: The tests I report

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Tue, Aug 11, 2009 at 00:37, Michael Busch wrote: > On 8/10/09 1:30 PM, Grant Ingersoll wrote: >> >>> >>> I think your 2.5 proposal has drawbacks: if we release 2.5 now to test >>> the new major features in the field, then do you want to stop adding new >>> features to trunk until we release 2.9

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
> >> I have serious doubts about releasing this new API until these > >> performance issues are resolved and better proven out from a > >> usability > >> standpoint. > > > > I think LUCENE-1796 has fixed the performance problems, which were > > caused by > a missing reflection-cache needed for bw

[jira] Updated: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1796: -- Attachment: LUCENE-1796.patch OK. Last patch, I only added a test in TestAttributeSource that

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 3:52 PM, Uwe Schindler wrote: Hi Grant, I have serious doubts about releasing this new API until these performance issues are resolved and better proven out from a usability standpoint. I think LUCENE-1796 has fixed the performance problems, which were caused by a mi

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 1:30 PM, Grant Ingersoll wrote: I think your 2.5 proposal has drawbacks: if we release 2.5 now to test the new major features in the field, then do you want to stop adding new features to trunk until we release 2.9 to not have the same situation then again? How long should this t

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741532#action_12741532 ] Mark Miller commented on LUCENE-1796: - And mine was a misreport - sorry - a wine progr

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 3:36 PM, Michael Busch wrote: You didn't really comment on my proposal: I suggested to not remove the old Token API and old queryparser in 3.0. Instead with 3.0 change the bw-policy, so that we can remove deprecated things in minor releases (e.g. 3.1 in this case). Wh

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
I think they are optimized away by the JRE... The figure from Mark does not have TokenWrapper hot spots in it; only TokenWrapper.termLength() is mentioned, but this is because Token.termLength() is called often and takes the same time (so the TokenWrapper time equals that of the inner Token call).

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741525#action_12741525 ] Robert Muir commented on LUCENE-1796: - Uwe, in my case the latest patch performs approx

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741524#action_12741524 ] Mark Miller commented on LUCENE-1796: - I was getting 46-47 with both of the first two

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741523#action_12741523 ] Uwe Schindler commented on LUCENE-1796: --- Hm, and with the termAtt.clear() instead of

Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 1:02 PM, Uwe Schindler wrote: If both filters would only implement new API there would be direct calls from the filter to the input TokenStream. If all streams/filters would implement only the old API, the bw-delegation would only be used for the incrementToken() calls from DocInverter

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
But TokenWrapper is used there every time; it is not used for delegating, only for exchanging the inner Token instance. The delegation costs are there because a Filter implementing the old API in front of a new-API Tokenizer would need to be wrapped twice: DocInverter -> oldAPIFilter.incrementTok

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741520#action_12741520 ] Mark Miller commented on LUCENE-1796: - The latest patch appears to hurt the Solr use c

Re: who clears attributes?

2009-08-10 Thread Michael Busch
On 8/10/09 12:52 PM, Uwe Schindler wrote: Michael: The TokenWrapper added cost was there in 2.9 before the TokenStream overhaul, too, as the TokenWrapper-like code was implemented similarly inside DocInverter. You're right. It will only be more costly if you mix multiple old a

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741514#action_12741514 ] Mark Miller commented on LUCENE-1796: - Nice work Uwe! > Speed up repeated TokenStream

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
Hi Grant, > I have serious doubts about releasing this new API until these > performance issues are resolved and better proven out from a usability > standpoint. I think LUCENE-1796 has fixed the performance problems, which were caused by a missing reflection-cache needed for bw compatibility. I h

[jira] Updated: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1796: -- Attachment: LUCENE-1796.patch New patch that optimizes the iteration over the AttributeImpls u

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
On Mon, Aug 10, 2009 at 22:50, Grant Ingersoll wrote: > > On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: > >> I'll deviate from the topic somewhat. >> What are exact benefits that new tokenstream API yields? Are we sure >> we want it released with 2.9? >> By now I only see various elaborate pr

Re: 2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Michael Busch
You didn't really comment on my proposal: I suggested to not remove the old Token API and old queryparser in 3.0. Instead with 3.0 change the bw-policy, so that we can remove deprecated things in minor releases (e.g. 3.1 in this case). I think your 2.5 proposal has drawbacks: if we release 2.5

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Michael Busch wrote: I think we should change the backwards-compatibility policy as proposed in LUCENE-1698 and remove some deprecated things (including the old TokenStream API, maybe query parser) in 3.1, not 3.0. I don't think we should have a 2.5 release - this clearly shows the disadvantage

2.5 versus 2.9, was Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 3:06 PM, Michael Busch wrote: I think we should change the backwards-compatibility policy as proposed in LUCENE-1698 and remove some deprecated things (including the old TokenStream API, maybe query parser) in 3.1, not 3.0. Maybe. I'm not convinced yet that the current

Re: who clears attributes?

2009-08-10 Thread Michael Busch
I think we should change the backwards-compatibility policy as proposed in LUCENE-1698 and remove some deprecated things (including the old TokenStream API, maybe query parser) in 3.1, not 3.0. I don't think we should have a 2.5 release - this clearly shows the disadvantages of our current bw-po

Re: who clears attributes?

2009-08-10 Thread Mark Miller
Grant Ingersoll wrote: On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: 2.9 was _SUPPOSED_ to be a deprecation release, What's a deprecation release? We deprecate stuff in every release ... does it make sense to do a release just to deprecate anything we might not have yet? And if you add

Re: who clears attributes?

2009-08-10 Thread Grant Ingersoll
On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote: I'll deviate from the topic somewhat. What are exact benefits that new tokenstream API yields? Are we sure we want it released with 2.9? By now I only see various elaborate problems, but haven't seen a single piece of code becoming simpler.

[jira] Resolved: (LUCENE-1797) new QueryParser over-increment position for MultiPhraseQuery

2009-08-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1797. Resolution: Fixed > new QueryParser over-increment position for MultiPhraseQuery >

[jira] Commented: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction

2009-08-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741484#action_12741484 ] Michael McCandless commented on LUCENE-1789: bq. What if we just added a new M

[jira] Commented: (LUCENE-1798) FieldCacheSanityChecker called directly by FieldCache.get*

2009-08-10 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741480#action_12741480 ] Hoss Man commented on LUCENE-1798: -- https://issues.apache.org/jira/browse/LUCENE-1749?foc

[jira] Commented: (LUCENE-1749) FieldCache introspection API

2009-08-10 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741479#action_12741479 ] Hoss Man commented on LUCENE-1749: -- bq. Maybe we should simply print a warning, eg to Sys

Re: Sorting cleanup and FieldCacheImpl.Entry confusion

2009-08-10 Thread Michael McCandless
On Mon, Aug 10, 2009 at 1:57 PM, Chris Hostetter wrote: > : I don't know why Entry has "int type" and "String locale", either.  I > : agree it'd be cleaner for FieldSortedHitQueue to store these on its > : own, privately. > : > : Note that FieldSortedHitQueue is deprecated in favor of > : FieldValu

[jira] Resolved: (LUCENE-1784) Make BooleanWeight and DisjunctionMaxWeight protected

2009-08-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1784. Resolution: Fixed Fix Version/s: 2.9 Thanks Tim! > Make BooleanWeight and

[jira] Created: (LUCENE-1798) FieldCacheSanityChecker called directly by FieldCache.get*

2009-08-10 Thread Hoss Man (JIRA)
FieldCacheSanityChecker called directly by FieldCache.get* -- Key: LUCENE-1798 URL: https://issues.apache.org/jira/browse/LUCENE-1798 Project: Lucene - Java Issue Type: Improvement

[jira] Updated: (LUCENE-1797) new QueryParser over-increment position for MultiPhraseQuery

2009-08-10 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1797: --- Attachment: LUCENE-1797.patch Attached patch. I plan to commit soon. > new QueryPa

[jira] Created: (LUCENE-1797) new QueryParser over-increment position for MultiPhraseQuery

2009-08-10 Thread Michael McCandless (JIRA)
new QueryParser over-increment position for MultiPhraseQuery Key: LUCENE-1797 URL: https://issues.apache.org/jira/browse/LUCENE-1797 Project: Lucene - Java Issue Type: Bug

[jira] Commented: (LUCENE-1789) getDocValues should provide a MultiReader DocValues abstraction

2009-08-10 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741464#action_12741464 ] Hoss Man commented on LUCENE-1789: -- bq. How about this: we add a new param to the ctors o

[jira] Commented: (LUCENE-1793) remove custom encoding support in Greek/Russian Analyzers

2009-08-10 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741462#action_12741462 ] Robert Muir commented on LUCENE-1793: - it seems no one is against this, I will clean t

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741459#action_12741459 ] Uwe Schindler commented on LUCENE-1796: --- Ah, you are right! I will try this out. The

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741457#action_12741457 ] Michael Busch commented on LUCENE-1796: --- You don't have to call captureState and clo

[jira] Issue Comment Edited: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741447#action_12741447 ] Uwe Schindler edited comment on LUCENE-1796 at 8/10/09 11:04 AM: ---

Re: who clears attributes?

2009-08-10 Thread Earwin Burrfoot
I'll deviate from the topic somewhat. What are exact benefits that new tokenstream API yields? Are we sure we want it released with 2.9? By now I only see various elaborate problems, but haven't seen a single piece of code becoming simpler. On Mon, Aug 10, 2009 at 21:50, Uwe Schindler wrote: > Yes

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741447#action_12741447 ] Uwe Schindler commented on LUCENE-1796: --- I have another idea: Why not make the Attri

Re: Sorting cleanup and FieldCacheImpl.Entry confusion

2009-08-10 Thread Chris Hostetter
: I don't know why Entry has "int type" and "String locale", either. I : agree it'd be cleaner for FieldSortedHitQueue to store these on its : own, privately. : : Note that FieldSortedHitQueue is deprecated in favor of : FieldValueHitQueue, and that FieldValueHitQueue doesn't cache : comparators

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741445#action_12741445 ] Uwe Schindler commented on LUCENE-1796: --- But if you use the State and there is no st

[jira] Commented: (LUCENE-1796) Speed up repeated TokenStream init

2009-08-10 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741444#action_12741444 ] Michael Busch commented on LUCENE-1796: --- Another good cache, Uwe! :) AttributeSourc

RE: who clears attributes?

2009-08-10 Thread Uwe Schindler
Yes. Is there a way to enforce this for all Tokenizers automatically? As incrementToken() will be abstract in 3.0, there cannot be a default impl. So all Tokenizers should call clearAttributes() as first call in incrementToken(). Then we have still the problem of the slow iterator creation (which

Re: who clears attributes?

2009-08-10 Thread Michael Busch
Clearing the attributes should be required in those places where we cleared (or reinit'ed) Token previously, right? Michael On 8/10/09 10:42 AM, Yonik Seeley wrote: Thinking through this a little more, I don't see an alternative to the tokenizer clearing all attributes at the start of increme

Re: who clears attributes?

2009-08-10 Thread Yonik Seeley
Thinking through this a little more, I don't see an alternative to the tokenizer clearing all attributes at the start of incrementToken(). Consider a DefaultPayloadTokenFilter that only sets a payload if one isn't already set - it's clear that this filter can't clear the payload attribute, so it m
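The argument in the last snippet can be illustrated with a small model. Below is a minimal sketch, assuming a deliberately simplified, hypothetical stand-in for the attribute-sharing contract. `Attrs`, `WordTokenizer` and `DefaultPayloadFilter` are illustrative names, not Lucene classes, and the real 2.9 API (`TokenStream.incrementToken()`, `AttributeSource.clearAttributes()`) is not used. A filter that only sets a payload when none is present cannot reset the payload itself. If the tokenizer does not clear shared state at the start of `incrementToken()`, a stale payload from the previous token leaks through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the "who clears attributes?" question; not Lucene's API.
public class ClearAttributesSketch {

    // Shared mutable per-stream state, standing in for term and payload attributes.
    static final class Attrs {
        String term;
        String payload;

        void clear() {
            term = null;
            payload = null;
        }
    }

    // Hypothetical tokenizer: words prefixed with '*' carry their own (explicit) payload.
    static final class WordTokenizer {
        final Attrs attrs;
        private final String[] words;
        private final boolean clearFirst;
        private int pos = 0;

        WordTokenizer(Attrs attrs, String text, boolean clearFirst) {
            this.attrs = attrs;
            this.words = text.split(" ");
            this.clearFirst = clearFirst;
        }

        boolean incrementToken() {
            if (pos >= words.length) {
                return false;
            }
            if (clearFirst) {
                attrs.clear(); // reset all shared state before producing the next token
            }
            String word = words[pos++];
            if (word.startsWith("*")) {
                attrs.term = word.substring(1);
                attrs.payload = "explicit";
            } else {
                attrs.term = word; // payload is left untouched here
            }
            return true;
        }
    }

    // Hypothetical filter: assigns a default payload only when none is set.
    // It cannot clear the payload itself without wiping explicit payloads.
    static final class DefaultPayloadFilter {
        private final WordTokenizer input;

        DefaultPayloadFilter(WordTokenizer input) {
            this.input = input;
        }

        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            if (input.attrs.payload == null) {
                input.attrs.payload = "default";
            }
            return true;
        }
    }

    // Runs the two-stage chain and collects "term:payload" for each token.
    static List<String> run(String text, boolean clearFirst) {
        Attrs attrs = new Attrs();
        DefaultPayloadFilter filter =
            new DefaultPayloadFilter(new WordTokenizer(attrs, text, clearFirst));
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(attrs.term + ":" + attrs.payload);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("*a b", true));  // [a:explicit, b:default]
        System.out.println(run("*a b", false)); // [a:explicit, b:explicit] (stale payload leaks)
    }
}
```

With clearing, the second token gets the default payload. Without it, the explicit payload set for the first token is silently reused. This is why the thread leans toward making the tokenizer, as the first stage of the chain, responsible for clearing.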
