On Jun 15, 2009, at 2:11 PM, Grant Ingersoll wrote:
More questions:
1. What about Highlighter and MoreLikeThis? They have not been
converted. Also, what are they going to do if the attributes they
need are not available? Caveat emptor?
2. Same for TermVectors. What if the user specif
On 6/15/09 10:10 AM, Grant Ingersoll wrote:
But, as Michael M reminded me, it is complex, so please accept my
apologies.
No worries, Grant! I was not really offended, but rather confused...
Thanks for clarifying.
Michael
Grant Ingersoll wrote:
1. What about Highlighter
I would guess Highlighter has not been updated because it's kind of a
royal * :)
--
- Mark
http://www.lucidimagination.com
-
To unsubscribe, e-mail: java-dev-unsubscr
Mark Miller wrote:
Grant Ingersoll wrote:
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about
Grant Ingersoll wrote:
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about how to
convert a st
*Sent:* Monday, June 15, 2009 10:39 PM
*To:* java-dev@lucene.apache.org
*Subject:* Re: New Token API was Re: Payloads and TrieRangeQuery
I have implemented most of that actually (the interface part and Token
implementing all of them).
The problem is a paradigm change with the new API: the assum
yeah about 5 seconds in I saw that and decided to stick with what I know :)
On Mon, Jun 15, 2009 at 5:10 PM, Mark Miller wrote:
> I may do the Highlighter. It's annoying though - I'll have to break back
> compat because Token is part of the public API (Fragmenter, etc).
>
> Robert Muir wrote:
>>
>>
I may do the Highlighter. It's annoying though - I'll have to break back
compat because Token is part of the public API (Fragmenter, etc).
Robert Muir wrote:
Michael OK, I plan on adding some tests for the analyzers that don't have any.
I didn't try to migrate things such as highlighter, which
@lucene.apache.org
Subject: Re: New Token API was Re: Payloads and TrieRangeQuery
I have implemented most of that actually (the interface part and Token
implementing all of them).
The problem is a paradigm change with the new API: the assumption is that
there is always only one single instance of an Attribute
Michael OK, I plan on adding some tests for the analyzers that don't have any.
I didn't try to migrate things such as highlighter, which are
definitely just as important, only because I'm not familiar with that
territory.
But I think I can figure out what the various language analyzers are
trying
I agree. It's my fault, the task of changing the contribs (LUCENE-1460)
is assigned to me for a while now - I just haven't found the time to do
it yet.
It's great that you started the work on that! I'll try to review the
patch in the next couple of days and help with fixing the remaining
ones
On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote:
> And, in tests: test/o/a/l/index/store is somehow misplaced. The class
> inside should be in test/o/a/l/store. Should I move?
Please do!
Mike
I have implemented most of that actually (the interface part and Token
implementing all of them).
The problem is a paradigm change with the new API: the assumption is
that there is always only one single instance of an Attribute. With the
old API, it is recommended to reuse the passed-in token
Michael, again I am terrible with such things myself...
Personally I am impressed that you have the back compat, even if you
don't change any code at all I think some reformatting of javadocs
might make the situation a lot friendlier. I just listed everything
that came to my mind immediately.
I g
> And I don't like the *useNewAPI*() methods either. I spent a lot of time
> thinking about backwards compatibility for this API. It's tricky to do
> without sacrificing performance. In API patches I find myself spending
> more time for backwards-compatibility than for the actual new feature! :(
hetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Monday, June 15, 2009 10:18 PM
> To: java-dev@lucene.apache.org
> Subject: RE: New Token API was Re: Payloads and TrieRangeQuery
>
> > there's also
This is excellent feedback, Robert!
I agree this is confusing; especially having a deprecated API and only an
experimental one that replaces the old one. We need to change that.
And I don't like the *useNewAPI*() methods either. I spent a lot of time
thinking about backwards compatibility for th
Some great points - especially the decision between a deprecated API,
and a new experimental one subject to change. Bit of a rock and a hard
place for a new user.
Perhaps we should add a little note with some guidance.
- Mark
Robert Muir wrote:
let me try some slightly more constructive fee
> there's also a stray bold tag gone haywire somewhere, possibly
> .incrementToken()
I fixed this. This was getting on my nerves the whole day when I wrote
javadocs for NumericTokenStream...
Uwe
let me try some slightly more constructive feedback:
new user looks at TokenStream javadocs:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
immediately they see deprecated, text in red with the words
"experimental", warnings in bold, the
Mark, I'll see if I can get tests produced for some of those analyzers.
as a new user of the new api myself, I think I can safely say the most
confusing thing about it is having the old deprecated API mixed in the
javadocs with it :)
On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller wrote:
> Robert Mu
> If you understood that, you'd be able to look
> at the actual token value if you were interested in what shift was
> used. So it's redundant, has a runtime cost, it's not currently used
> anywhere, and it's not useful to fields other than Trie. Perhaps it
> shouldn't exist (yet)?
You are right
> On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote:
> > There is a new Attribute called ShiftAttribute (or
> NumericShiftAttribute),
> > when trie range is moved to core. This attribute contains the shifted-
> away
> > bits from the prefix encoded value during trie indexing.
>
> I was wonderin
On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote:
> There is a new Attribute called ShiftAttribute (or NumericShiftAttribute),
> when trie range is moved to core. This attribute contains the shifted-away
> bits from the prefix encoded value during trie indexing.
I was wondering about this
> Also, what about the case where one might have attributes that are meant
> for downstream TokenFilters, but not necessarily for indexing? Offsets
> and type come to mind. Is it the case now that those attributes are not
> automatically added to the index? If they are ignored now, what if I
Robert Muir wrote:
Mark, I created an issue for this.
Thanks Robert, great idea.
I just think you know, converting an analyzer to the new api is really
not that bad.
I don't either. I'm really just complaining about the initial
readability. Once you know what's up, it's not too much differ
Mark, I created an issue for this.
I just think you know, converting an analyzer to the new api is really
not that bad.
reverse engineering what one of them does is not necessarily obvious,
and is completely unrelated but necessary if they are to be migrated.
I'd be willing to assist with some o
Robert Muir wrote:
As Lucene's contrib hasn't been fully converted either (and it's been quite
some time now), someone has probably heard that groan before.
hope this doesn't sound like a complaint,
Complaints are fine in any case. Every now and then, it might cause a
little rant from me o
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about how
to convert a stream/filter from the
>
> As Lucene's contrib hasn't been fully converted either (and it's been quite
> some time now), someone has probably heard that groan before.
hope this doesn't sound like a complaint, but in my opinion this is
because many do not have any tests.
I converted a few of these and it's just grunt work
Yonik Seeley wrote:
The high-level description of the new API looks good (being able to
add arbitrary properties to tokens), unfortunately, I've never had the
time to try and use it and give any constructive feedback.
As far as difficulty of use, I assume this only applies to
implementing your o
The high-level description of the new API looks good (being able to
add arbitrary properties to tokens), unfortunately, I've never had the
time to try and use it and give any constructive feedback.
As far as difficulty of use, I assume this only applies to
implementing your own TokenFilter? It see
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'm not sure why this (currently having to implement next() too) is
such an issue for you. You brought it up at the Lucene meetup too.
No user will ever have to implement both (the new API and the old)
in their streams/filters. The only reas
On Jun 15, 2009, at 12:19 PM, Michael McCandless wrote:
I don't think anything was "held back" in this effort. Grant, are you
referring to LUCENE-1458? That's "held back" simply because the only
person working on it (me) got distracted by other things to work on.
I'm sorry, I didn't mean to
I thought the primary goal of switching to AttributeSource (yes, the
name is very generic...) was to allow extensibility to what's created
per-Token, so that an app could add their own attrs without costly
subclassing/casting per Token, independent of other "things"
adding their tokens, etc.
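The extensibility goal described above can be illustrated with a small, self-contained sketch. This is not Lucene's AttributeSource; ToyAttributeSource, TermText and PartOfSpeech are hypothetical names invented for this example. It assumes one shared instance per attribute type, which each consumer looks up once and then reuses for every token, so no per-token subclassing or casting is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical toy registry, NOT the Lucene class: holds exactly one
// instance per attribute type, shared by every consumer of the stream.
public final class ToyAttributeSource {
    private final Map<Class<?>, Object> attributes = new LinkedHashMap<>();

    // Returns the single shared instance for this type, creating it on first use.
    public <T> T addAttribute(Class<T> type, Supplier<T> factory) {
        Object existing = attributes.get(type);
        if (existing == null) {
            existing = factory.get();
            attributes.put(type, existing);
        }
        return type.cast(existing);
    }

    public boolean hasAttribute(Class<?> type) {
        return attributes.containsKey(type);
    }

    // Hypothetical built-in attribute: the term text of the current token.
    public static final class TermText {
        public final StringBuilder buffer = new StringBuilder();
    }

    // Hypothetical application-defined attribute, added without touching TermText.
    public static final class PartOfSpeech {
        public String tag = "";
    }

    public static void main(String[] args) {
        ToyAttributeSource source = new ToyAttributeSource();

        // Each consumer resolves its attributes once, up front.
        TermText term = source.addAttribute(TermText.class, TermText::new);
        PartOfSpeech pos = source.addAttribute(PartOfSpeech.class, PartOfSpeech::new);

        // Per token, the producer overwrites the shared instances in place.
        for (String word : new String[] {"quick", "fox"}) {
            term.buffer.setLength(0);
            term.buffer.append(word);
            pos.tag = word.equals("fox") ? "NOUN" : "ADJ";
            System.out.println(term.buffer + "/" + pos.tag);
        }
    }
}
```

The trade-off in this sketch is that the shared instances are mutable: a consumer that wants to keep a token's state past the current iteration has to copy it.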
The "old" API is deprecated, and therefore when we release 2.9 there might
be some people who'd think they should move away from it, to better prepare
for 3.0 (while in fact this may not be the case). Also, we should make sure
that when we remove all the deprecations, this will still exist (and
th
Mark Miller wrote:
I don't know how I feel about rolling the new token api back.
I will say that I originally had no issue with it because I am very
excited about Lucene-1458.
At the same time though, I'm thinking Lucene-1458 is a very advanced
issue that will likely be for really expert usa
I don't know how I feel about rolling the new token api back.
I will say that I originally had no issue with it because I am very
excited about Lucene-1458.
At the same time though, I'm thinking Lucene-1458 is a very advanced
issue that will likely be for really expert usage (though I can see
On 6/14/09 5:17 AM, Grant Ingersoll wrote:
Agreed. I've been bringing it up for a while now and made the same
comments when it was first introduced, but felt like the lone voice in
the wilderness on it and gave way [1], [2], [3]. Now that others are
writing/converting, I think it is worth rev
Yonik Seeley wrote:
On Nov 19, 2007 7:02 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
Yonik Seeley wrote:
1) If we are deprecating some methods like String termText(), how
about at the same time deprecating "String type"? If we want
lightweight per-token metadata for communication between filte
On Nov 19, 2007 7:02 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> > 1) If we are deprecating some methods like String termText(), how
> > about at the same time deprecating "String type"? If we want
> > lightweight per-token metadata for communication between filters, an
> >
Yonik Seeley wrote:
1) If we are deprecating some methods like String termText(), how
about at the same time deprecating "String type"? If we want
lightweight per-token metadata for communication between filters, an
int or a long used as a bitvector (32 or 64 independent boolean vars
per token)
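A minimal sketch of the bitvector idea mentioned above, in plain Java. The class name TokenFlags and the flag names KEYWORD, STEMMED and SYNONYM are hypothetical, chosen only for illustration; they are not taken from Lucene. An int gives 32 independent boolean flags per token; a long would give 64.

```java
// Hypothetical helper, for illustration only: per-token boolean
// metadata packed into a single int, one bit per flag.
public final class TokenFlags {
    public static final int KEYWORD = 1 << 0;
    public static final int STEMMED = 1 << 1;
    public static final int SYNONYM = 1 << 2;

    private TokenFlags() {}

    // Turns on every bit in mask.
    public static int set(int flags, int mask) {
        return flags | mask;
    }

    // Turns off every bit in mask.
    public static int clear(int flags, int mask) {
        return flags & ~mask;
    }

    // True only if every bit in mask is on.
    public static boolean isSet(int flags, int mask) {
        return (flags & mask) == mask;
    }

    public static void main(String[] args) {
        int flags = 0;
        flags = set(flags, KEYWORD);            // an upstream filter marks the token
        flags = set(flags, SYNONYM);
        flags = clear(flags, SYNONYM);          // a later filter withdraws a mark
        System.out.println(isSet(flags, KEYWORD));
        System.out.println(isSet(flags, SYNONYM));
    }
}
```

Compared with a String type, such flags are cheap to copy and compare, but filters must agree on which bit means what.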
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > a quick test tokenizing all of Wikipedia w/
> > SimpleAnalyzer showed 6-8% overall slowdown if I call token.clear() in
> > ReadTokensTask.java.
>
> We could slim down clear() a
On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> a quick test tokenizing all of Wikipedia w/
> SimpleAnalyzer showed 6-8% overall slowdown if I call token.clear() in
> ReadTokensTask.java.
We could slim down clear() a little by only resetting certain things...
startOffset a
On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>
> > 1) If we are deprecating some methods like String termText(), how
> > about at the same time deprecating "String type"? If we want
> > lightweight per-token metadata for commu
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > How about: if you are re-using your token, then whoever set the
> > payload, positionIncrement, etc, should always clear/reset it on the
> > next token?
>
> I considered this,
On Nov 18, 2007 6:07 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> How about: if you are re-using your token, then whoever set the
> payload, positionIncrement, etc, should always clear/reset it on the
> next token?
I considered this, but it doesn't really seem practical since a filter
doesn
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> 1) If we are deprecating some methods like String termText(), how
> about at the same time deprecating "String type"? If we want
> lightweight per-token metadata for communication between filters, an
> int or a long used as a bitvector (32 or 64 indepe
48 matches