[jira] Closed: (LUCENE-1697) MoreLikeThis should use the new Token API

2009-07-24 Thread Michael Busch (JIRA)
the new Token API > - > > Key: LUCENE-1697 > URL: https://issues.apache.org/jira/browse/LUCENE-1697 > Project: Lucene - Java > Issue Type: Improvement >R

[jira] Closed: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-24 Thread Uwe Schindler (JIRA)
New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira/browse/LUCENE-1696 > Project: Lucene - Java > Issue Type: Improvement

[jira] Assigned: (LUCENE-1697) MoreLikeThis should use the new Token API

2009-07-14 Thread Mark Miller (JIRA)
> MoreLikeThis should use the new Token API > - > > Key: LUCENE-1697 > URL: https://issues.apache.org/jira/browse/LUCENE-1697 > Project: Lucene - Java > Issue Type: Improvement >

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-14 Thread Mark Miller (JIRA)
. It feels like I committed this so long ago that it couldn't possibly be new ;) > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-14 Thread Uwe Schindler (JIRA)
2.9 :-) ASCIIFoldingFilter is not in 2.4.1 > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira/browse/LUCENE-1696 > Project: Lucene - Java

[jira] Issue Comment Edited: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-14 Thread Mark Miller (JIRA)
6 AM: -- Heh - hate to sound like a broken record, but: making this class final breaks back compat? was (Author: markrmil...@gmail.com): Heh - hate to sound like a broken record, but: making this class finally breaks back compat? > Added New Token API impl for ASCIIFoldin

[jira] Assigned: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-14 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-1696: - Assignee: Uwe Schindler (was: Mark Miller) > Added New Token API impl

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-14 Thread Mark Miller (JIRA)
but: making this class finally breaks back compat? > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira/browse/LUCENE-1696 >

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-07-14 Thread Uwe Schindler (JIRA)
this filter for LUCENE-1693. Patch will come shortly together with this issue. The old API can be removed, the filter is now final and so next() and nextToken() can be left unimplemented. > Added New Token API impl for ASCIIFoldin

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-18 Thread Simon Willnauer (JIRA)
t it if it needs some changes. If I do not react please send me a ping on this issue. Thanks > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-17 Thread Mark Miller (JIRA)
till the token api improvement patch is finished, just in case we need to make an adjustment here. > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apach

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-17 Thread Grant Ingersoll
On Jun 15, 2009, at 2:11 PM, Grant Ingersoll wrote: More questions: 1. What about Highlighter and MoreLikeThis? They have not been converted. Also, what are they going to do if the attributes they need are not available? Caveat emptor? 2. Same for TermVectors. What if the user specif

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-17 Thread Michael Busch
On 6/15/09 10:10 AM, Grant Ingersoll wrote: But, as Michael M reminded me, it is complex, so please accept my apologies. No worries, Grant! I was not really offended, but rather confused... Thanks for clarifying. Michael

[jira] Commented: (LUCENE-1697) MoreLikeThis should use the new Token API

2009-06-16 Thread Mark Miller (JIRA)
d. If you don't want this one Grant, we should assign to Michael as this is a part of LUCENE-1460. > MoreLikeThis should use the new Token API > - > > Key: LUCENE-1697 > URL: https://issues.apach

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Robert Muir (JIRA)
alent' means different things to different people in different languages... > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira/browse

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Simon Willnauer (JIRA)
its documented you can use ENGLISH collator and it will behave like asciifolding filter (simply remove all diacritics). you could then apply the tailorings like the example and get the behavior you want, versus maintaining a custom asciifoldingfilter... will try, thanks! > Added New Token API i

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Robert Muir (JIRA)
can use ENGLISH collator and it will behave like asciifolding filter (simply remove all diacritics). you could then apply the tailorings like the example and get the behavior you want, versus maintaining a custom asciifoldingfilter... > Added New Token API impl for ASCIIFoldin
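
The collation suggestion above can be illustrated with the plain JDK collator, no Lucene required. This is only a sketch: the class and method names are hypothetical, and the choice of PRIMARY strength with canonical decomposition is an assumption about how "ignore diacritics" would be configured, not something stated in the thread. It compares strings rather than rewriting tokens, so it shows the equivalence idea, not a replacement filter.

```java
import java.text.Collator;
import java.util.Locale;

class AccentInsensitiveCompare {
    /** Builds an English JDK collator that only looks at base letters (assumed settings). */
    static Collator primaryEnglishCollator() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength treats accent and case differences as insignificant.
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters (e.g. U+00E9) into base letter + combining mark.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    /** Returns true when the two strings are equal at primary strength. */
    static boolean sameIgnoringDiacritics(String a, String b) {
        return primaryEnglishCollator().compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Unicode escapes avoid any dependence on source-file encoding.
        System.out.println(sameIgnoringDiacritics("r\u00e9sum\u00e9", "resume"));
        System.out.println(sameIgnoringDiacritics("resume", "result"));
    }
}
```

Language-specific tailorings, as mentioned in the comment, would be layered on top of a collator like this one instead of being hand-coded in a custom folding filter.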

[jira] Created: (LUCENE-1697) MoreLikeThis should use the new Token API

2009-06-16 Thread Grant Ingersoll (JIRA)
MoreLikeThis should use the new Token API - Key: LUCENE-1697 URL: https://issues.apache.org/jira/browse/LUCENE-1697 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Simon Willnauer (JIRA)
bout collation before and I validated it for the use case - I do not know what language / locale my docs are in, so I cannot set the correct one. Never mind. :) > Added New Token API impl for ASCIIFoldingFilter > --- > >

[jira] Updated: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Robert Muir (JIRA)
since it's non-standard collation behavior, but not too difficult. You can do this with the JDK version too; I always show the ICU implementation because of its performance. Both are available in contrib/collation > Added New Token API impl for ASCIIFoldingFil

[jira] Assigned: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned LUCENE-1696: --- Assignee: Mark Miller > Added New Token API impl for ASCIIFoldingFil

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Simon Willnauer (JIRA)
y changes for umlauts at least. :) > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira/browse/LUCENE-1696 > Project: Lucene -

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Robert Muir (JIRA)
1581 showing how this works with contrib/collation. > Added New Token API impl for ASCIIFoldingFilter > --- > > Key: LUCENE-1696 > URL: https://issues.apache.org/jira/browse/LUCENE-1696 >

[jira] Commented: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Robert Muir (JIRA)
ents in a language-dependent/correct way, you can use contrib/collation for this purpose. i don't see an alternative, otherwise you will end out with 50-100 sets of language-dependent rules [essentially duplicating the logic collation already knows about] > Added New Token

[jira] Created: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Simon Willnauer (JIRA)
Added New Token API impl for ASCIIFoldingFilter --- Key: LUCENE-1696 URL: https://issues.apache.org/jira/browse/LUCENE-1696 Project: Lucene - Java Issue Type: Improvement Components

[jira] Updated: (LUCENE-1696) Added New Token API impl for ASCIIFoldingFilter

2009-06-16 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1696: Attachment: ASCIIFoldingFilter._newTokenAPI.patch all tests pass > Added New Token

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Grant Ingersoll wrote: 1. What about Highlighter I would guess Highlighter has not been updated because it's kind of a royal * :) -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Mark Miller wrote: Grant Ingersoll wrote: On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'd be happy to discuss other API proposals that anybody brings up here, that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example about

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Grant Ingersoll wrote: On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'd be happy to discuss other API proposals that anybody brings up here, that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example about how to convert a st

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
*Sent:* Monday, June 15, 2009 10:39 PM *To:* java-dev@lucene.apache.org *Subject:* Re: New Token API was Re: Payloads and TrieRangeQuery I have implemented most of that actually (the interface part and Token implementing all of them). The problem is a paradigm change with the new API: the assum

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
yeah about 5 seconds in I saw that and decided to stick with what I know :) On Mon, Jun 15, 2009 at 5:10 PM, Mark Miller wrote: > I may do the Highlighter. It's annoying though - I'll have to break back > compat because Token is part of the public API (Fragmenter, etc). > > Robert Muir wrote: >> >>

Some SVN cleanup, was: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
rg > Subject: Re: New Token API was Re: Payloads and TrieRangeQuery > > On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote: > > > And, in tests: test/o/a/l/index/store is somehow wrongly placed. The class > > inside should be in test/o/a/l/store. Shou

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
I may do the Highlighter. It's annoying though - I'll have to break back compat because Token is part of the public API (Fragmenter, etc). Robert Muir wrote: Michael OK, I plan on adding some tests for the analyzers that don't have any. I didn't try to migrate things such as highlighter, which

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
@lucene.apache.org Subject: Re: New Token API was Re: Payloads and TrieRangeQuery I have implemented most of that actually (the interface part and Token implementing all of them). The problem is a paradigm change with the new API: the assumption is that there is always only one single instance of an Attribute

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Michael OK, I plan on adding some tests for the analyzers that don't have any. I didn't try to migrate things such as highlighter, which are definitely just as important, only because I'm not familiar with that territory. But I think I can figure out what the various language analyzers are trying

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
I agree. It's my fault, the task of changing the contribs (LUCENE-1460) is assigned to me for a while now - I just haven't found the time to do it yet. It's great that you started the work on that! I'll try to review the patch in the next couple of days and help with fixing the remaining ones

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael McCandless
On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote: > And, in tests: test/o/a/l/index/store is somehow wrongly placed. The class > inside should be in test/o/a/l/store. Should I move? Please do! Mike

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
I have implemented most of that actually (the interface part and Token implementing all of them). The problem is a paradigm change with the new API: the assumption is that there is always only one single instance of an Attribute. With the old API, it is recommended to reuse the passed-in token
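
The "only one single instance of an Attribute" assumption can be sketched without Lucene at all. Everything below is a hypothetical stand-in (`TermAttr`, `Stream`, `WhitespaceSource`, `LowerCaseStage` are invented for illustration and are not Lucene classes); only the method name `incrementToken()` is borrowed from the discussion. The point is that every stage in a chain holds a reference to the same mutable attribute object and overwrites it in place, instead of handing Token objects from stage to stage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class SharedAttributeSketch {
    /** Hypothetical term attribute: exactly one mutable instance per chain. */
    static final class TermAttr {
        private final StringBuilder term = new StringBuilder();
        void set(String s) { term.setLength(0); term.append(s); }
        String get() { return term.toString(); }
    }

    /** Hypothetical stream: incrementToken() overwrites the shared attribute in place. */
    abstract static class Stream {
        final TermAttr termAttr;
        Stream(TermAttr termAttr) { this.termAttr = termAttr; }
        abstract boolean incrementToken();
    }

    /** Source stage: creates the attribute and fills it with whitespace-separated words. */
    static final class WhitespaceSource extends Stream {
        private final Iterator<String> words;
        WhitespaceSource(String text) {
            super(new TermAttr());
            String trimmed = text.trim();
            List<String> parts = trimmed.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(trimmed.split("\\s+"));
            this.words = parts.iterator();
        }
        @Override boolean incrementToken() {
            if (!words.hasNext()) return false;
            termAttr.set(words.next());
            return true;
        }
    }

    /** Filter stage: reuses the upstream attribute instance rather than creating its own. */
    static final class LowerCaseStage extends Stream {
        private final Stream input;
        LowerCaseStage(Stream input) { super(input.termAttr); this.input = input; }
        @Override boolean incrementToken() {
            if (!input.incrementToken()) return false;
            termAttr.set(termAttr.get().toLowerCase(Locale.ROOT));
            return true;
        }
    }

    /** Consumer: copies the value out, because the attribute is overwritten on each call. */
    static List<String> collect(Stream stream) {
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) out.add(stream.termAttr.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new LowerCaseStage(new WhitespaceSource("Hello New API"))));
    }
}
```

A consumer that wants to keep a value across calls has to copy it, as `collect` does; holding on to the attribute reference itself would only ever show the most recent token.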

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Michael, again I am terrible with such things myself... Personally I am impressed that you have the back compat, even if you don't change any code at all I think some reformatting of javadocs might make the situation a lot friendlier. I just listed everything that came to my mind immediately. I g

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> And I don't like the *useNewAPI*() methods either. I spent a lot of time > thinking about backwards compatibility for this API. It's tricky to do > without sacrificing performance. In API patches I find myself spending > more time for backwards-compatibility than for the actual new feature! :(

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
hetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Monday, June 15, 2009 10:18 PM > To: java-dev@lucene.apache.org > Subject: RE: New Token API was Re: Payloads and TrieRangeQuery > > > there's also

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael Busch
This is excellent feedback, Robert! I agree this is confusing; especially having a deprecated API and only an experimental one that replaces the old one. We need to change that. And I don't like the *useNewAPI*() methods either. I spent a lot of time thinking about backwards compatibility for th

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Some great points - especially the decision between a deprecated API, and a new experimental one subject to change. Bit of a rock and a hard place for a new user. Perhaps we should add a little note with some guidance. - Mark Robert Muir wrote: let me try some slightly more constructive fee

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> there's also a stray bold tag gone haywire somewhere, possibly > .incrementToken() I fixed this. This was going me on my nerves the whole day when I wrote javadocs for NumericTokenStream... Uwe - To unsubscribe, e-mail: java-

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
let me try some slightly more constructive feedback: new user looks at TokenStream javadocs: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html immediately they see deprecated, text in red with the words "experimental", warnings in bold, the

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Mark, I'll see if I can get tests produced for some of those analyzers. as a new user of the new api myself, I think I can safely say the most confusing thing about it is having the old deprecated API mixed in the javadocs with it :) On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller wrote: > Robert Mu

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> If you understood that, you'd be able to look > at the actual token value if you were interested in what shift was > used. So it's redundant, has a runtime cost, it's not currently used > anywhere, and it's not useful to fields other than Trie. Perhaps it > shouldn't exist (yet)? You are right

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote: > > There is a new Attribute called ShiftAttribute (or > NumericShiftAttribute), > > when trie range is moved to core. This attribute contains the shifted- > away > > bits from the prefix encoded value during trie indexing. > > I was wonderin

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Yonik Seeley
On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote: > There is a new Attribute called ShiftAttribute (or NumericShiftAttribute), > when trie range is moved to core. This attribute contains the shifted-away > bits from the prefix encoded value during trie indexing. I was wondering about this

RE: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Uwe Schindler
> Also, what about the case where one might have attributes that are meant > for downstream TokenFilters, but not necessarily for indexing? Offsets > and type come to mind. Is it the case now that those attributes are not > automatically added to the index? If they are ignored now, what if I

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Robert Muir wrote: Mark, I created an issue for this. Thanks Robert, great idea. I just think you know, converting an analyzer to the new api is really not that bad. I don't either. I'm really just complaining about the initial readability. Once you know what's up, it's not too much differ

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
Mark, I created an issue for this. I just think you know, converting an analyzer to the new api is really not that bad. reverse engineering what one of them does is not necessarily obvious, and is completely unrelated but necessary if they are to be migrated. I'd be willing to assist with some o

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Robert Muir wrote: As Lucene's contrib hasn't been fully converted either (and it's been quite some time now), someone has probably heard that groan before. hope this doesn't sound like a complaint, Complaints are fine in any case. Every now and then, it might cause a little rant from me o

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Grant Ingersoll
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'd be happy to discuss other API proposals that anybody brings up here, that have the same advantages and are more intuitive. We could also beef up the documentation and give a better example about how to convert a stream/filter from the

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Robert Muir
> > As Lucene's contrib hasn't been fully converted either (and its been quite > some time now), someone has probably heard that groan before. hope this doesn't sound like a complaint, but in my opinion this is because many do not have any tests. I converted a few of these and its just grunt work

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Mark Miller
Yonik Seeley wrote: The high-level description of the new API looks good (being able to add arbitrary properties to tokens), unfortunately, I've never had the time to try and use it and give any constructive feedback. As far as difficulty of use, I assume this only applies to implementing your o

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Yonik Seeley
The high-level description of the new API looks good (being able to add arbitrary properties to tokens), unfortunately, I've never had the time to try and use it and give any constructive feedback. As far as difficulty of use, I assume this only applies to implementing your own TokenFilter? It see

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Grant Ingersoll
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote: I'm not sure why this (currently having to implement next() too) is such an issue for you. You brought it up at the Lucene meetup too. No user will ever have to implement both (the new API and the old) in their streams/filters. The only reas

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Grant Ingersoll
On Jun 15, 2009, at 12:19 PM, Michael McCandless wrote: I don't think anything was "held back" in this effort. Grant, are you referring to LUCENE-1458? That's "held back" simply because the only person working on it (me) got distracted by other things to work on. I'm sorry, I didn't mean to

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-15 Thread Michael McCandless
ote), but I have to admit it's only > because I've got quite comfortable with the existing API, and did not have > the time to try the new one yet. > > Shai > > On Mon, Jun 15, 2009 at 3:49 AM, Mark Miller wrote: >> >> Mark Miller wrote: >>> >>>

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Shai Erera
uite comfortable with the existing API, and did not have the time to try the new one yet. Shai On Mon, Jun 15, 2009 at 3:49 AM, Mark Miller wrote: > Mark Miller wrote: > >> I don't know how I feel about rolling the new token api back. >> >> I will say that I originally h

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Mark Miller
Mark Miller wrote: I don't know how I feel about rolling the new token api back. I will say that I originally had no issue with it because I am very excited about Lucene-1458. At the same time though, I'm thinking Lucene-1458 is a very advanced issue that will likely be for rea

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Mark Miller
I don't know how I feel about rolling the new token api back. I will say that I originally had no issue with it because I am very excited about Lucene-1458. At the same time though, I'm thinking Lucene-1458 is a very advanced issue that will likely be for really expert usage (th

Re: New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Michael Busch
On 6/14/09 5:17 AM, Grant Ingersoll wrote: Agreed. I've been bringing it up for a while now and made the same comments when it was first introduced, but felt like the lone voice in the wilderness on it and gave way [1], [2], [3]. Now that others are writing/converting, I think it is worth rev

New Token API was Re: Payloads and TrieRangeQuery

2009-06-14 Thread Grant Ingersoll
and debug. It's a net win since the indexing performance improvements were so fantastic. I agree - very hard to follow, worth the improvements. Just to throw something out, the new Token API is not very consumable in my experience. The old one was very intuitive and very easy to follow t

Re: new Token API

2007-11-21 Thread Endre Stølsvik
Yonik Seeley wrote: On Nov 19, 2007 7:02 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: 1) If we are deprecating some methods like String termText(), how about at the same time deprecating "String type"? If we want lightweight per-token metadata for communication between filte

Re: new Token API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 7:02 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: > > 1) If we are deprecating some methods like String termText(), how > > about at the same time deprecating "String type"? If we want > > lightweight per-token metadata for communication between filters, an > >

Re: new Token API

2007-11-19 Thread Doug Cutting
Yonik Seeley wrote: 1) If we are deprecating some methods like String termText(), how about at the same time deprecating "String type"? If we want lightweight per-token metadata for communication between filters, an int or a long used as a bitvector (32 or 64 independent boolean vars per token)

Re: new Token API

2007-11-19 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> > wrote: > > a quick test tokenizing all of Wikipedia w/ > > SimpleAnalyzer showed 6-8% overall slowdown if I call token.clear() in > > ReadTokensTask.java. > > We could slim down clear() a

Re: new Token API

2007-11-19 Thread Yonik Seeley
On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > a quick test tokenizing all of Wikipedia w/ > SimpleAnalyzer showed 6-8% overall slowdown if I call token.clear() in > ReadTokensTask.java. We could slim down clear() a little by only resetting certain things... startOffset a

Re: new Token API

2007-11-18 Thread Yonik Seeley
On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > > 1) If we are deprecating some methods like String termText(), how > > about at the same time deprecating "String type"? If we want > > lightweight per-token metadata for commu

Re: new Token API

2007-11-18 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> > wrote: > > How about: if you are re-using your token, then whoever set the > > payload, positionIncrement, etc, should always clear/reset it on the > > next token? > > I considered this,

Re: new Token API

2007-11-18 Thread Yonik Seeley
On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > How about: if you are re-using your token, then whoever set the > payload, positionIncrement, etc, should always clear/reset it on the > next token? I considered this, but it doesn't really seem practical since a filter doesn

Re: new Token API

2007-11-18 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > 1) If we are deprecating some methods like String termText(), how > about at the same time deprecating "String type"? If we want > lightweight per-token metadata for communication between filters, an > int or a long used as a bitvector (32 or 64 indepe

new Token API

2007-11-17 Thread Yonik Seeley
Regarding the recent changes in Token (reusability and use char[] instead of Token) 1) If we are deprecating some methods like String termText(), how about at the same time deprecating "String type"? If we want lightweight per-token metadata for communication between filters, an int or a long use
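
The bit-vector idea in point 1 (a `long` carrying up to 64 independent boolean flags per token) can be sketched as follows. The class name and the flag names are hypothetical and chosen only for illustration; nothing here is taken from Lucene's Token class.

```java
class TokenFlags {
    // Hypothetical flag names; each constant occupies exactly one bit of a long.
    static final long KEYWORD = 1L << 0;
    static final long ACRONYM = 1L << 1;
    static final long SENTENCE_START = 1L << 63; // the 64th and last available bit

    /** Returns flags with every bit in mask switched on. */
    static long set(long flags, long mask) { return flags | mask; }

    /** Returns flags with every bit in mask switched off. */
    static long clear(long flags, long mask) { return flags & ~mask; }

    /** True if at least one bit in mask is on (compare with 0, since bit 63 makes the value negative). */
    static boolean isSet(long flags, long mask) { return (flags & mask) != 0L; }

    public static void main(String[] args) {
        long flags = 0L;
        flags = set(flags, KEYWORD | SENTENCE_START);
        System.out.println(isSet(flags, KEYWORD));
        System.out.println(isSet(flags, ACRONYM));
        flags = clear(flags, KEYWORD);
        System.out.println(Long.toBinaryString(flags));
    }
}
```

Compared with a `String type`, a primitive flag word needs no allocation and resetting it is a single assignment, which matters when tokens are reused.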