Grant Ingersoll wrote:
> Scratch my last comment. I was thinking it only pertained to payloads.
>
> In that light, I think we should modify the scorePayload method for the
> time being, then we can deprecate it when we go to per field sim.
>
> -Grant
>
OK sounds good. Will make the change with
Scratch my last comment. I was thinking it only pertained to payloads.
In that light, I think we should modify the scorePayload method for
the time being, then we can deprecate it when we go to per field sim.
-Grant
On Nov 20, 2007, at 2:34 PM, Michael Busch wrote:
Yonik Seeley wrote:
P
Well, we are making an awful lot of improvements for Payloads, I think
we should try to get it in now and make 2.3 wait a bit more, since we
all have more or less agreed that 2.9 (next after 2.3) is going to be
a deprecation release before moving to 3.0
-Grant
On Nov 20, 2007, at 2:34 PM,
Yonik Seeley wrote:
>
> Per field similarity would certainly be more efficient since it moves
> the field->similarity lookup from the inner loop to the outer loop.
>
I agree. Then I'll leave the scorePayload() API as is for now. And I
don't think the per-field similarity should block 2.3, so let
On Nov 20, 2007 2:17 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Grant Ingersoll wrote:
> > +1 for adding the field name.
> >
>
> The question is whether we should add the field name to the
> Similarity#scorePayload() method or if we should support a per-field
> similarity in the future?
Per fi
Grant Ingersoll wrote:
> +1 for adding the field name.
>
>
The question is whether we should add the field name to the
Similarity#scorePayload() method or if we should support a per-field
similarity in the future?
-Michael
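
For illustration, here is a minimal sketch of the per-field alternative, assuming a hypothetical `PerFieldSimilaritySketch` class and `PayloadScorer` interface (neither is part of Lucene). The scorer is looked up once per field, outside the loop over payloads, instead of passing the field name into every scoring call:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene code: one payload scorer per field name.
class PerFieldSimilaritySketch {

    // Assumed interface with the same argument shape as scorePayload(byte[], int, int).
    interface PayloadScorer {
        float score(byte[] payload, int offset, int length);
    }

    private final Map<String, PayloadScorer> byField = new HashMap<>();

    // Fields without a registered scorer get a neutral score of 1.0.
    private final PayloadScorer fallback = (payload, offset, length) -> 1.0f;

    void register(String fieldName, PayloadScorer scorer) {
        byField.put(fieldName, scorer);
    }

    PayloadScorer forField(String fieldName) {
        return byField.getOrDefault(fieldName, fallback);
    }

    // The field lookup happens once here (outer loop); the loop over
    // payloads (inner loop) only calls the already resolved scorer.
    float sumPayloadScores(String fieldName, byte[][] payloads) {
        PayloadScorer scorer = forField(fieldName);
        float total = 0f;
        for (byte[] payload : payloads) {
            total += scorer.score(payload, 0, payload.length);
        }
        return total;
    }
}
```

With a field-name parameter the same dispatch would instead run inside the scoring call for every payload.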
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> > If we used a Payload object, it would save 8 bytes per Token for
> > fields not using payloads.
Of course with Token reuse, saving 8 bytes isn't important any more
either since it's only allocated once per field.
-Yonik
--
Michael McCandless wrote:
> "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>> On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
>>> Yonik Seeley wrote:
So I think we all agree to do payloads by reference (do not make a
copy of byte[] like termBuffer does), and to allow payload
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> > Yonik Seeley wrote:
> > >
> > > So I think we all agree to do payloads by reference (do not make a
> > > copy of byte[] like termBuffer does), and to allow payload reuse.
> > >
> > > So
On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> >
> > So I think we all agree to do payloads by reference (do not make a
> > copy of byte[] like termBuffer does), and to allow payload reuse.
> >
> > So now we have 3 viable options still on the table I
Yonik Seeley wrote:
>
> So I think we all agree to do payloads by reference (do not make a
> copy of byte[] like termBuffer does), and to allow payload reuse.
>
> So now we have 3 viable options still on the table, I think:
> Token{ byte[] payload, int payloadLength, ...}
> Token{ byte[] pay
On Nov 19, 2007 3:18 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> I'm not sure this is good? Don't we want to [efficiently] allow
> filters down the line to modify a payload (just like filters can
> modify the char[] termBuffer)? Admittedly I would expect it to be
> rare but I'm not sure w
On Nov 19, 2007 3:31 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 3:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > The filter that is setting payloads can't use the same byte[] with a
> > different value each time... it must allocate a new byte[] so it
> > doesn't change the ol
On Nov 19, 2007 3:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> The filter that is setting payloads can't use the same byte[] with a
> different value each time... it must allocate a new byte[] so it
> doesn't change the old one, right?
OK... I think Mike is right... everything should be OK as
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 3:06 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> > Yonik Seeley wrote:
> > >
> > > That's not immutable if I can change the bytes in the byte[] (and it's
> > > legal to do so), since it will result in the value of other payload
> > > o
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 2:03 PM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > Maybe you mean that each Token must be fully independent because there
> > are plenty of filters that hold onto each Token long after next() is
> > called again, and then serve
On Nov 19, 2007 3:06 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> >
> > That's not immutable if I can change the bytes in the byte[] (and it's
> > legal to do so), since it will result in the value of other payload
> > objects changing.
> >
> > -Yonik
> >
>
> True. I think w
Yonik Seeley wrote:
>
> That's not immutable if I can change the bytes in the byte[] (and it's
> legal to do so), since it will result in the value of other payload
> objects changing.
>
> -Yonik
>
True. I think what I mean here is that the caller that sets the Payload
doesn't have to worry abo
On Nov 19, 2007 2:42 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> >
> > Immutable implies that the user needs to do a new byte[] for every payload,
> > yes?
> > It seems like this would be slower if payloads were common and faster
> > if very rare.
> >
>
> No, Payload has t
Yonik Seeley wrote:
>
> Immutable implies that the user needs to do a new byte[] for every payload,
> yes?
> It seems like this would be slower if payloads were common and faster
> if very rare.
>
No, Payload has this ctr:
public Payload(byte[] data, int offset, int length);
So the same byte[]
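
A minimal sketch of what a by-reference wrapper with that constructor shape implies, assuming a hypothetical `PayloadRefSketch` class (a stand-in, not Lucene's `Payload`). Several wrappers can share one backing array without copying, but a later write to that array is visible through every wrapper, which is the immutability concern raised elsewhere in this thread:

```java
import java.util.Arrays;

// Hypothetical stand-in for a by-reference payload wrapper; not Lucene's Payload.
class PayloadRefSketch {
    private final byte[] data;   // shared with the caller, never copied here
    private final int offset;
    private final int length;

    PayloadRefSketch(byte[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > data.length) {
            throw new IllegalArgumentException("slice out of range");
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    // Reads through to the shared array, so it reflects later writes to it.
    byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[offset + index];
    }

    // Defensive copy: an independent snapshot of the current slice contents.
    byte[] copy() {
        return Arrays.copyOfRange(data, offset, offset + length);
    }
}
```

A consumer that needs a stable value after handing the wrapper on would call `copy()`; one that consumes the bytes immediately can read through the wrapper and skip the allocation.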
On Nov 19, 2007 2:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Maybe you mean that each Token must be fully independent because there
> are plenty of filters that hold onto each Token long after next() is
> called again, and then serve them up again later
Yes.
> But this is why we have
> > I think I'd lean towards leaving payload "by reference"?
>
> It seems difficult to allow the payload setter to reuse their byte[],
> unless we break back compatibility with other token filters. Do you
> have a solution in mind?
I think I must be missing something.
The payloa
On Nov 19, 2007 1:21 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> > On Nov 19, 2007 11:38 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> >>> If we opt to treat payload like termBuffer and copy the bytes, then we
> >>> need no offset member.
>
> I'd argue that the curren
Yonik Seeley wrote:
> On Nov 19, 2007 11:38 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>>> If we opt to treat payload like termBuffer and copy the bytes, then we
>>> need no offset member.
I'd argue that the current approach (creating a very lightweight wrapper
object that is immutable for
On Nov 19, 2007 11:38 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> > If we opt to treat payload like termBuffer and copy the bytes, then we
> > need no offset member.
>
> I think I'd lean towards leaving payload "by reference"?
It seems difficult to allow the payload setter to reuse their b
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 18, 2007 1:19 PM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > "Michael Busch" <[EMAIL PROTECTED]> wrote:
> > > Oh and Yonik, I think in addition we'd also need a payloadOffset member?
> >
> > Oh yes, we need offset too.
>
> I was trying t
On Nov 18, 2007 1:19 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> "Michael Busch" <[EMAIL PROTECTED]> wrote:
> > Oh and Yonik, I think in addition we'd also need a payloadOffset member?
>
> Oh yes, we need offset too.
I was trying to save another member, and thought that the offset had
fewe
Michael McCandless wrote:
>
> Exactly: DocumentsWriter writes the bytes immediately into the
> proxStream. You should only need to change addPosition: I think it
> can access these new [package protected] fields directly from the
> Token, instead of using the separate Payload object.
>
Cool, I'
"Michael Busch" <[EMAIL PROTECTED]> wrote:
> Michael McCandless wrote:
>
> >>
> >> class Token {
> >> byte[] payload;
> >> int payloadLength;
> >> void setPayload(byte[], int length)
> >> byte[] getPayload()
> >> int getPayloadLength()
> >> ...
> >> }
> >
> > +1
> >
> > Mike
Michael McCandless wrote:
>>
>> class Token {
>> byte[] payload;
>> int payloadLength;
>> void setPayload(byte[], int length)
>> byte[] getPayload()
>> int getPayloadLength()
>> ...
>> }
>
> +1
>
> Mike
>
Mike,
just to clarify: I had suggested this at the ApacheCon because
+1 for the signature changes and +1 for adding the field name.
-Grant
On Nov 18, 2007, at 6:07 AM, Michael McCandless wrote:
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
So I think we should change + finalize the payload API before Lucene
2.3 comes out.
Single biggest drawba
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> So I think we should change + finalize the payload API before Lucene
> 2.3 comes out.
>
> Single biggest drawback about current payloads is that there isn't any
> explicit support for adding different types of pay
Thanks for the reminder Mike, that should be in there too I think.
-Yonik
On Nov 17, 2007 9:07 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:
> At some point there was support for my suggestion of changing the
> deserialization api in Similarity from
>
> public float scorePayload(byte [] payload, int o
On 17-Nov-07, at 5:49 PM, Yonik Seeley wrote:
So I think we should change + finalize the payload API before Lucene
2.3 comes out.
Single biggest drawback about current payloads is that there isn't any
explicit support for adding different types of payloads to the same
token.
I don't
So I think we should change + finalize the payload API before Lucene
2.3 comes out.
Single biggest drawback about current payloads is that there isn't any
explicit support for adding different types of payloads to the same
token.
I don't really see a good fix to that though, so I
On 10-Sep-07, at 3:00 PM, Grant Ingersoll wrote:
What I truly pine for is a way to globally override Similarity on
a per-field basis. Wishful thinking...
Instead of wishful thinking, let's figure out a patch... :-)
Someday, I will find the time to delve more deeply into lucene wishful
On Sep 10, 2007, at 5:33 PM, Mike Klaas wrote:
This is the current api for scorePayload:
public float scorePayload(byte [] payload, int offset, int length) {
ISTM that this function depends greatly on the field--what if the
end user wants to store two completely different kinds of values
On 9/10/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> This is the current api for scorePayload:
>
> public float scorePayload(byte [] payload, int offset, int length) {
>
> ISTM that this function depends greatly on the field--what if the end
> user wants to store two completely different kinds of
This is the current api for scorePayload:
public float scorePayload(byte [] payload, int offset, int length) {
ISTM that this function depends greatly on the field--what if the end
user wants to store two completely different kinds of values in
different fields? Could fieldName be added?
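
To make the request concrete, here is a minimal sketch assuming a hypothetical `FieldAwarePayloadScorer` class with a `fieldName` argument added in front of the three existing ones. The field names `boost` and `flag` and their byte encodings are invented for the example:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Lucene's Similarity: scorePayload with a field name added.
class FieldAwarePayloadScorer {

    // Same arguments as the scorePayload quoted above, plus the field name.
    static float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        if ("boost".equals(fieldName) && length >= 4) {
            // Assumed encoding for this field: a 4-byte big-endian float.
            return ByteBuffer.wrap(payload, offset, length).getFloat();
        }
        if ("flag".equals(fieldName) && length >= 1) {
            // Assumed encoding for this field: a single byte, non-zero means "boost".
            return payload[offset] != 0 ? 2.0f : 1.0f;
        }
        // Unknown fields and short payloads leave the score unchanged.
        return 1.0f;
    }

    public static void main(String[] args) {
        byte[] boost = ByteBuffer.allocate(4).putFloat(3.5f).array();
        System.out.println(scorePayload("boost", boost, 0, 4));
        System.out.println(scorePayload("flag", new byte[] {1}, 0, 1));
    }
}
```

Without the field name, both kinds of payload would have to share one decoding, or carry a type tag in the bytes themselves.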
One thing I forgot to add, which is now possible via the Payload
mechanism, is based on a comment during your ApacheCon EU
presentation, something to the effect that we can't score binary
fields. Now with Payload scoring, a binary Field is essentially a
Document level payload. It shoul
On 5/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:
> I hadn't kept up with the payload discussion/patch, and just got
> around to looking at Token.
>
> public class Token implements Cloneable {
> String termText;
On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:
I hadn't kept up with the payload discussion/patch, and just got
around to looking at Token.
public class Token implements Cloneable {
String termText; // the text of the term
int startOffset;
I hadn't kept up with the payload discussion/patch, and just got
around to looking at Token.
public class Token implements Cloneable {
String termText; // the text of the term
int startOffset; // start in source text
int endOffset