Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-14 Thread Michael Sokolov
Yes, that (LUCENE-7854) was what I was referring to, and you are right that
it stores values as integers. This doesn't necessarily have to be a
blocker; you could scale your values by some factor, I guess.
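
A minimal sketch of that scaling idea, assuming non-negative weights and a
fixed scale factor chosen up front. ScaledWeights and its methods are
hypothetical names, not part of Lucene; the only Lucene detail relied on is
that DelimitedTermFrequencyTokenFilter reads a positive integer after its
default '|' delimiter.

```java
// Hypothetical helper, not part of Lucene: maps float weights to the positive
// integer term frequencies that DelimitedTermFrequencyTokenFilter can parse.
final class ScaledWeights {
    private ScaledWeights() {}

    /** Scales a non-negative weight to an integer frequency of at least 1. */
    static int toFrequency(float weight, int scale) {
        if (Float.isNaN(weight) || weight < 0f || scale <= 0) {
            throw new IllegalArgumentException("weight must be >= 0 and scale > 0");
        }
        // Term frequencies must be positive, so very small weights clamp to 1.
        return Math.max(1, Math.round(weight * scale));
    }

    /** Recovers the approximate weight, e.g. inside a custom tf(float freq). */
    static float fromFrequency(float freq, int scale) {
        return freq / scale;
    }

    /** Builds a "term|freq" token using the filter's default '|' delimiter. */
    static String encode(String term, float weight, int scale) {
        return term + "|" + toFrequency(weight, scale);
    }
}
```

With a scale of 1000, a weight of 0.25 would be indexed as the token
"lucene|250", and a tf(float freq) override could divide by the same scale
to get the weight back. Precision is limited to 1/scale, and as far as I
recall the field has to be indexed with frequencies but without positions
for custom term frequencies to be accepted.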



Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Erdan Genc
@Erik: I didn't know that. How can I figure out which query types support
payload scoring? The class I described is wrapped in an Elasticsearch
plugin, so I don't have full control over this. Currently I'm using
SpanTermQuery; maybe another available query type will do, so that I don't
need to implement a custom query parser as well. Thank you!

@Michael: This was my first thought as well, but I couldn't find any
resources when I first searched for it. I just discovered LUCENE-7854, the
DelimitedTermFrequencyTokenFilter, but it can't handle floating-point
values, right? Thanks!



Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Michael Sokolov
Also, if you are no longer using the term frequency at all, you might
consider wiring your score (the one you are currently wiring into payloads)
in there, in place of the term frequency.



Re: Payload TFIDF Similarity in Lucene 7.1.0

2018-03-13 Thread Erik Hatcher
Payloads are only scored from certain query types. What query are you
executing?

> On Mar 13, 2018, at 04:58, Erdan Genc wrote:
> 
> Hej there,
> 
> I want to extend the TFIDFSimilarity class such that the term frequency is
> ignored and the value in the payload is used instead. To do that, I
> basically do this:
> 
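>     // Ignore the raw term frequency; every match contributes the same tf.
>     @Override
>     public float tf(float freq) {
>         return 1f;
>     }
>
>     // Use the float stored in the payload as the score factor (1 if absent).
>     public float scorePayload(int doc, int start, int end, BytesRef payload) {
>         if (payload != null) {
>             return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
>         } else {
>             return 1f;
>         }
>     }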
> 
> Complete class can be found here:
> 
> https://gist.github.com/nadre/66be2a2a32214f2c5ec1ec1f6edcef08
> 
> Unfortunately, scorePayload never gets called and I end up with the
> wrong scoring. I know that scorePayload is deprecated in Lucene 7.2.1, but
> it should work in 7.1.0, or am I missing something?
> 
> I implemented the same thing by directly extending the basic Similarity
> class and iterating through doc terms using the LeafReaderContext, based on
> the code in this repo:
> 
> https://github.com/sdauletau/elasticsearch-position-similarity
> 
> This works but is horribly slow, which is why I would prefer the first idea.
> 
> Any idea why scorePayload doesn't get called? I really couldn't find any
> resources on the net.
> 
> Best, Erdan.
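
For reference, the float that the quoted scorePayload decodes is, to my
knowledge, stored as four big-endian bytes holding the IEEE 754 bits, which
is why payload.offset has to be passed along with payload.bytes. The sketch
below mirrors that layout with plain java.nio and does not call Lucene;
FloatPayloadSketch is a hypothetical name used only for illustration.

```java
import java.nio.ByteBuffer;

// Stand-alone sketch of a 4-byte big-endian float payload; it does not call Lucene.
final class FloatPayloadSketch {
    private FloatPayloadSketch() {}

    /** Encodes a float as 4 bytes; ByteBuffer is big-endian by default. */
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    /** Decodes 4 bytes starting at offset, as a BytesRef-style (bytes, offset) pair. */
    static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }
}
```

The offset matters because a payload may sit in the middle of a larger shared
byte array, so decoding from index 0 could read unrelated bytes.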

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org