Re: Payload API

2007-11-20 Thread Michael Busch
Grant Ingersoll wrote: > Scratch my last comment. I was thinking it only pertained to payloads. > > In that light, I think we should modify the scorePayload method for the > time being, then we can deprecate it when we go to per field sim. > > -Grant > OK sounds good. Will make the change with

Re: Payload API

2007-11-20 Thread Grant Ingersoll
Scratch my last comment. I was thinking it only pertained to payloads. In that light, I think we should modify the scorePayload method for the time being, then we can deprecate it when we go to per field sim. -Grant On Nov 20, 2007, at 2:34 PM, Michael Busch wrote: Yonik Seeley wrote: P

Re: Payload API

2007-11-20 Thread Grant Ingersoll
Well, we are making an awful lot of improvements for Payloads, I think we should try to get it in now and make 2.3 wait a bit more, since we all have more or less agreed that 2.9 (next after 2.3) is going to be a deprecation release before moving to 3.0 -Grant On Nov 20, 2007, at 2:34 PM,

Re: Payload API

2007-11-20 Thread Michael Busch
Yonik Seeley wrote: > > Per field similarity would certainly be more efficient since it moves > the field->similarity lookup from the inner loop to the outer loop. > I agree. Then I'll leave the scorePayload() API as is for now. And I don't think the per-field similarity should block 2.3, so let

Re: Payload API

2007-11-20 Thread Yonik Seeley
On Nov 20, 2007 2:17 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Grant Ingersoll wrote: > > +1 for adding the field name. > > > > The question is whether we should add the field name to the > Similarity#scorePayload() method or if we should support a per-field > similarity in the future? Per fi

Re: Payload API

2007-11-20 Thread Michael Busch
Grant Ingersoll wrote: > +1 for adding the field name. > > The question is whether we should add the field name to the Similarity#scorePayload() method or if we should support a per-field similarity in the future? -Michael - T

Re: Payload API

2007-11-20 Thread Yonik Seeley
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > If we used a Payload object, it would save 8 bytes per Token for > > fields not using payloads. Of course with Token reuse, saving 8 bytes isn't important any more either since it's only allocated once per field. -Yonik --

Re: Payload API

2007-11-20 Thread Michael Busch
Michael McCandless wrote: > "Yonik Seeley" <[EMAIL PROTECTED]> wrote: >> On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote: >>> Yonik Seeley wrote: So I think we all agree to do payloads by reference (do not make a copy of byte[] like termBuffer does), and to allow payload

Re: Payload API

2007-11-20 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > > Yonik Seeley wrote: > > > > > > So I think we all agree to do payloads by reference (do not make a > > > copy of byte[] like termBuffer does), and to allow payload reuse. > > > > > > So

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: > > > > So I think we all agree to do payloads by reference (do not make a > > copy of byte[] like termBuffer does), and to allow payload reuse. > > > > So now we still have 3 viable options still on the table I

Re: Payload API

2007-11-19 Thread Michael Busch
Yonik Seeley wrote: > > So I think we all agree to do payloads by reference (do not make a > copy of byte[] like termBuffer does), and to allow payload reuse. > > So now we still have 3 viable options still on the table I think: > Token{ byte[] payload, int payloadLength, ...} > Token{ byte[] pay

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 3:18 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I'm not sure this is good? Don't we want to [efficiently] allow > filters down the line to modify a payload (just like filters can > modify the char[] termBuffer)? Admittedly I would expect it to be > rare but I'm not sure w

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 3:31 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Nov 19, 2007 3:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > The filter that is setting payloads can't use the same byte[] with a > > different value each time... it must allocate a new byte[] so it > > doesn't change the ol

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 3:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > The filter that is setting payloads can't use the same byte[] with a > different value each time... it must allocate a new byte[] so it > doesn't change the old one, right? OK... I think Mike is right... everything should be OK as

Re: Payload API

2007-11-19 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Nov 19, 2007 3:06 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > > Yonik Seeley wrote: > > > > > > That's not immutable if I can change the bytes in the byte[] (and it's > > > legal to do so), since it will result in the value of other payload > > > o

Re: Payload API

2007-11-19 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Nov 19, 2007 2:03 PM, Michael McCandless <[EMAIL PROTECTED]> > wrote: > > Maybe you mean that each Token must be fully independent because there > > are plenty of filters that hold onto each Token long after next() is > > called again, and then serve

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 3:06 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: > > > > That's not immutable if I can change the bytes in the byte[] (and it's > > legal to do so), since it will result in the value of other payload > > objects changing. > > > > -Yonik > > > > True. I think w

Re: Payload API

2007-11-19 Thread Michael Busch
Yonik Seeley wrote: > > That's not immutable if I can change the bytes in the byte[] (and it's > legal to do so), since it will result in the value of other payload > objects changing. > > -Yonik > True. I think what I mean here is that the caller that sets the Payload doesn't have to worry abo

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 2:42 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: > > > > Immutable implies that the user needs to do a new byte[] for every payload, > > yes? > > It seems like this would be slower if payloads were common and faster > > if very rare. > > > > No, Payload has t

Re: Payload API

2007-11-19 Thread Michael Busch
Yonik Seeley wrote:
>
> Immutable implies that the user needs to do a new byte[] for every payload,
> yes?
> It seems like this would be slower if payloads were common and faster
> if very rare.
>
No, Payload has this ctr:

  public Payload(byte[] data, int offset, int length);

So the same byte[]
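As a rough illustration of the (data, offset, length) constructor quoted in this message, the sketch below shows how several lightweight views can share one backing byte[] without copying. `PayloadView` and its accessor methods are hypothetical names invented for this example; only the constructor's parameter list comes from the message, and this is not the Lucene `Payload` class.

```java
import java.util.Arrays;

/** Hypothetical view over a shared byte[]; only the constructor shape is taken from the thread. */
final class PayloadView {
    private final byte[] data;   // shared backing array, held by reference (no copy)
    private final int offset;    // start of this view inside data
    private final int length;    // number of bytes belonging to this view

    PayloadView(byte[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > data.length) {
            throw new IllegalArgumentException("view is outside the backing array");
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    /** Reads one byte relative to the start of the view. */
    byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[offset + index];
    }

    /** Copies the viewed bytes out, for callers that need an independent array. */
    byte[] toByteArray() {
        return Arrays.copyOfRange(data, offset, offset + length);
    }
}
```

Two views built over the same array, for example `new PayloadView(shared, 0, 3)` and `new PayloadView(shared, 3, 3)`, allocate no new byte[]. The trade-off is the one raised elsewhere in the thread: the view's fields are fixed, but anyone who writes into the shared array changes what every view over those bytes returns.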

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 2:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Maybe you mean that each Token must be fully independent because there > are plenty of filters that hold onto each Token long after next() is > called again, and then serve them up again later Yes. > But this is why we have

Re: Payload API

2007-11-19 Thread Michael McCandless
k I'd lean towards leaving payload "by reference"? > > It seems difficult to allow the payload setter to reuse their byte[], > unless we break back compatibility with other token filters. Do you > have a solution in mind? I think I must be missing something. The payloa

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 1:21 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: > > On Nov 19, 2007 11:38 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > >>> If we opt to treat payload like termBuffer and copy the bytes, then we > >>> need no offset member. > > I'd argue that the curren

Re: Payload API

2007-11-19 Thread Michael Busch
Yonik Seeley wrote: > On Nov 19, 2007 11:38 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: >>> If we opt to treat payload like termBuffer and copy the bytes, then we >>> need no offset member. I'd argue that the current approach (creating a very lightweight wrapper object that is immutable for

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 11:38 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > If we opt to treat payload like termBuffer and copy the bytes, then we > > need no offset member. > > I think I'd lean towards leaving payload "by reference"? It seems difficult to allow the payload setter to reuse their b

Re: Payload API

2007-11-19 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Nov 18, 2007 1:19 PM, Michael McCandless <[EMAIL PROTECTED]> > wrote: > > "Michael Busch" <[EMAIL PROTECTED]> wrote: > > > Oh and Yonik, I think in addition we'd also need a payloadOffset member? > > > > Oh yes, we need offset too. > > I was trying t

Re: Payload API

2007-11-19 Thread Yonik Seeley
On Nov 18, 2007 1:19 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > "Michael Busch" <[EMAIL PROTECTED]> wrote: > > Oh and Yonik, I think in addition we'd also need a payloadOffset member? > > Oh yes, we need offset too. I was trying to save another member, and thought that the offset had fewe

Re: Payload API

2007-11-18 Thread Michael Busch
Michael McCandless wrote: > > Exactly: DocumentsWriter writes the bytes immediately into the > proxStream. You should only need to change addPosition: I think it > can access these new [package protected] fields directly from the > Token, instead of using the separate Payload object. > Cool, I'

Re: Payload API

2007-11-18 Thread Michael McCandless
"Michael Busch" <[EMAIL PROTECTED]> wrote: > Michael McCandless wrote: > > >> > >> class Token { > >> byte[] payload; > >> int payloadLength; > >> void setPayload(byte[], int length) > >> byte[] getPayload() > >> int getPayloadLength() > >> ... > >> } > > > > +1 > > > > Mike

Re: Payload API

2007-11-18 Thread Michael Busch
Michael McCandless wrote:
>>
>> class Token {
>>   byte[] payload;
>>   int payloadLength;
>>   void setPayload(byte[], int length)
>>   byte[] getPayload()
>>   int getPayloadLength()
>>   ...
>> }
>
> +1
>
> Mike
>
Mike, just to clarify: I had suggested this at the ApacheCon because
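For readers following the quoted proposal, here is a minimal compilable sketch of a token that holds its payload by reference with an explicit length. `SketchToken` is a hypothetical name, the remaining Token fields (elided as "..." in the quote) are left out, and the bounds check is an addition for the example rather than part of the proposal.

```java
/** Hypothetical token carrying a payload by reference; not the Lucene Token class. */
class SketchToken {
    byte[] payload;      // caller's array, stored without copying
    int payloadLength;   // number of valid bytes at the start of payload

    void setPayload(byte[] payload, int length) {
        // Assumed validation, added for the sketch.
        if (payload != null && (length < 0 || length > payload.length)) {
            throw new IllegalArgumentException("length out of range");
        }
        this.payload = payload;
        this.payloadLength = (payload == null) ? 0 : length;
    }

    byte[] getPayload() {
        return payload;
    }

    int getPayloadLength() {
        return payloadLength;
    }
}
```

Because the array is stored by reference, a filter can keep one buffer larger than any single payload and pass only the valid length on each call. A token that is cached and served again later would still point at that same buffer.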

Re: Payload API

2007-11-18 Thread Grant Ingersoll
+1 for the signature changes and +1 for adding the field name. -Grant On Nov 18, 2007, at 6:07 AM, Michael McCandless wrote: "Yonik Seeley" <[EMAIL PROTECTED]> wrote: So I think we should change + finalize the payload API before Lucene 2.3 comes out. Single biggest drawba

Re: Payload API

2007-11-18 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > So I think we should change + finalize the payload API before Lucene > 2.3 comes out. > > Single biggest drawback about current payloads is that there isn't any > explicit support for adding different types of pay

Re: Payload API

2007-11-17 Thread Yonik Seeley
Thanks for the reminder Mike, that should be in there too I think. -Yonik On Nov 17, 2007 9:07 PM, Mike Klaas <[EMAIL PROTECTED]> wrote: > At some point there was support for my suggestion of changing the > deserialization api in Similarity from > > public float scorePayload(byte [] payload, int o

Re: Payload API

2007-11-17 Thread Mike Klaas
On 17-Nov-07, at 5:49 PM, Yonik Seeley wrote: So I think we should change + finalize the payload API before Lucene 2.3 comes out. Single biggest drawback about current payloads is that there isn't any explicit support for adding different types of payloads to the same token. I don't

Payload API

2007-11-17 Thread Yonik Seeley
So I think we should change + finalize the payload API before Lucene 2.3 comes out. Single biggest drawback about current payloads is that there isn't any explicit support for adding different types of payloads to the same token. I don't really see a good fix to that though, so I

Re: payload api (scorePayload)

2007-09-10 Thread Mike Klaas
On 10-Sep-07, at 3:00 PM, Grant Ingersoll wrote: What I truly pine for is a way to globally override Similarity on a per-field basis. Wishful thinking... Instead of wishful thinking, let's figure out a patch... :-) Someday, I will find the time to delve more deeply into lucene wishful

Re: payload api (scorePayload)

2007-09-10 Thread Grant Ingersoll
On Sep 10, 2007, at 5:33 PM, Mike Klaas wrote: This is the current api for scorePayload: public float scorePayload(byte [] payload, int offset, int length) { ISTM that this function depends greatly on the field--what if the end user wants to store two completely different kinds of values

Re: payload api (scorePayload)

2007-09-10 Thread Yonik Seeley
On 9/10/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > This is the current api for scorePayload: > >public float scorePayload(byte [] payload, int offset, int length) { > > ISTM that this function depends greatly on the field--what if the end > user wants to store two completely different kinds of

payload api (scorePayload)

2007-09-10 Thread Mike Klaas
This is the current api for scorePayload:

  public float scorePayload(byte [] payload, int offset, int length) {

ISTM that this function depends greatly on the field--what if the end user wants to store two completely different kinds of values in different fields? Could fieldName be added?
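To make the request concrete, the sketch below shows what a field-aware variant could look like. Everything in it is an assumption for illustration: the class name `FieldAwareScorer`, the position of the `fieldName` parameter, the field names "boost" and "weight", and both payload encodings. It does not extend or reproduce Lucene's `Similarity`.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a Similarity-like class; not the Lucene API. */
class FieldAwareScorer {

    /**
     * Scores a payload, choosing the decoding by field name.
     * The fieldName parameter is the addition asked about in the message.
     */
    public float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        if (payload == null || length == 0) {
            return 1.0f; // neutral score when there is no payload
        }
        if ("boost".equals(fieldName) && length >= 4) {
            // Assumed encoding: a 4-byte big-endian float stored at offset.
            return ByteBuffer.wrap(payload, offset, 4).getFloat();
        }
        if ("weight".equals(fieldName)) {
            // Assumed encoding: one unsigned byte mapped onto the range [1.0, 2.0].
            return 1.0f + (payload[offset] & 0xFF) / 255.0f;
        }
        return 1.0f; // unknown fields are left unscored
    }
}
```

Without the field name, a single implementation would have to guess which of the two encodings a given payload uses.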

Re: Token/Payload API

2007-05-15 Thread Grant Ingersoll
One thing that I forgot to add that is now possible, via the Payload mechanism is based on a comment during your ApacheCon EU presentation, something to the effect that we can't score binary fields. Now with Payload scoring, a binary Field is essentially a Document level payload. It shoul

Re: Token/Payload API

2007-05-11 Thread Yonik Seeley
On 5/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: On May 11, 2007, at 4:31 PM, Yonik Seeley wrote: > I hadn't kept up with the payload discussion/patch, and just got > around to looking at Token. > > public class Token implements Cloneable { > String termText;

Re: Token/Payload API

2007-05-11 Thread Grant Ingersoll
On May 11, 2007, at 4:31 PM, Yonik Seeley wrote: I hadn't kept up with the payload discussion/patch, and just got around to looking at Token. public class Token implements Cloneable { String termText; // the text of the term int startOffset;

Token/Payload API

2007-05-11 Thread Yonik Seeley
I hadn't kept up with the payload discussion/patch, and just got around to looking at Token. public class Token implements Cloneable { String termText; // the text of the term int startOffset; // start in source text int endOffset