Re: Custom Similarity

2018-02-08 Thread Erick Erickson
As of Solr 6.6, payload support is built into Solr; see
SOLR-1485. Before that, it was much more difficult, see:
https://lucidworks.com/2014/06/13/end-to-end-payload-example-in-solr/
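
For a quick taste of what SOLR-1485 gives you (a sketch; the field name and
term below are made up, and the field's index analyzer is assumed to end in
DelimitedPayloadTokenFilter):

  q={!payload_score f=payloads func=max v=searchterm}

where func is one of min, max or average.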

Best,
Erick

On Thu, Feb 8, 2018 at 8:36 AM, Ahmet Arslan  wrote:
>
>
> Hi Roy,
>
>
> In order to activate payloads during scoring, you need to do two separate 
> things at the same time:
> * use a payload-aware query type: org.apache.lucene.queries.payloads.*
> * use a payload-aware similarity
>
> Here is an old post that might inspire you:
> https://lucidworks.com/2009/08/05/getting-started-with-payloads/
>
>
> Ahmet
>
>
>
> On Saturday, January 27, 2018, 5:43:36 PM GMT+3, Dwaipayan Roy 
>  wrote:
>
>
>
>
>
> Thanks for your replies. But I am still not sure how to do this. Can you
> please provide me with an example code snippet or a link to some page
> where I can find one?
>
> Thanks..
>
> On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy 
> wrote:
>
>> I want to make a scoring function that will score the documents by the
>> following function:
>> given Q = {q1, q2, ... }
>> score(D,Q) =
>>for all qi:
>>  SUM of {
>>  LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }
>>  }
>>
>> I have stored weight_1, weight_2 and weight_3 for all terms of all
>> documents as payloads, with payload delimiter = | (pipe), during indexing.
>>
>> However, I am not sure how to integrate all the weights during
>> retrieval. I am sure that I have to @Override some score() but am not sure
>> about the exact class.
>>
>> Please help me here.
>>
>> Best,
>> Dwaipayan..
>
>>
>>
>
>
> --
> Dwaipayan Roy.




Re: Custom Similarity

2018-02-08 Thread Ahmet Arslan


Hi Roy,


In order to activate payloads during scoring, you need to do two separate 
things at the same time:
* use a payload-aware query type: org.apache.lucene.queries.payloads.*
* use a payload-aware similarity

Here is an old post that might inspire you:
https://lucidworks.com/2009/08/05/getting-started-with-payloads/
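
To make the second bullet concrete, here is a minimal sketch of a
payload-aware similarity (assuming Lucene 6.x-era APIs, where TFIDFSimilarity
still exposes scorePayload; the field handling is illustrative, adjust to your
version):

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.util.BytesRef;

public class PayloadSimilarity extends ClassicSimilarity {
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        // decode the float that was encoded into the payload at index time
        return payload == null ? 1f : PayloadHelper.decodeFloat(payload.bytes, payload.offset);
    }
}

// install it on the searcher (and on the IndexWriterConfig at index time)
searcher.setSimilarity(new PayloadSimilarity());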


Ahmet



On Saturday, January 27, 2018, 5:43:36 PM GMT+3, Dwaipayan Roy 
 wrote: 





Thanks for your replies. But I am still not sure how to do this. Can you
please provide me with an example code snippet or a link to some page
where I can find one?

Thanks..

On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy 
wrote:

> I want to make a scoring function that will score the documents by the
> following function:
> given Q = {q1, q2, ... }
> score(D,Q) =
>    for all qi:
>      SUM of {
>          LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }
>      }
>
> I have stored weight_1, weight_2 and weight_3 for all terms of all
> documents as payloads, with payload delimiter = | (pipe), during indexing.
>
> However, I am not sure how to integrate all the weights during
> retrieval. I am sure that I have to @Override some score() but am not sure
> about the exact class.
>
> Please help me here.
>
> Best,
> Dwaipayan..

>
>


-- 
Dwaipayan Roy.


Re: Custom Similarity

2018-01-27 Thread Dwaipayan Roy
Thanks for your replies. But I am still not sure how to do this. Can you
please provide me with an example code snippet or a link to some page
where I can find one?

Thanks..

On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy 
wrote:

> I want to make a scoring function that will score the documents by the
> following function:
> given Q = {q1, q2, ... }
> score(D,Q) =
>for all qi:
>   SUM of {
>  LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }
>   }
>
> I have stored weight_1, weight_2 and weight_3 for all terms of all
> documents as payloads, with payload delimiter = | (pipe), during indexing.
>
> However, I am not sure how to integrate all the weights during
> retrieval. I am sure that I have to @Override some score() but am not sure
> about the exact class.
>
> Please help me here.
>
> Best,
> Dwaipayan..
>
>


-- 
Dwaipayan Roy.


Re: Custom Similarity

2018-01-16 Thread Adrien Grand
If you are working with payloads, you will also want to have a look at
PayloadScoreQuery.
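
A minimal usage sketch (the constructor shown is the 6.x-era one, and the
field/term names are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.payloads.PayloadScoreQuery;
import org.apache.lucene.queries.payloads.SumPayloadFunction;
import org.apache.lucene.search.spans.SpanTermQuery;

SpanTermQuery term = new SpanTermQuery(new Term("body", "q1"));
// sum the decoded payload values over matches; false = ignore the span score itself
PayloadScoreQuery q = new PayloadScoreQuery(term, new SumPayloadFunction(), false);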

On Tue, Jan 16, 2018 at 12:26 PM, Michael Sokolov  wrote:

> Have a look at Expressions class. It compiles JavaScript that can reference
> other values and can be used for ranking.
>
> On Jan 16, 2018 4:58 AM, "Dwaipayan Roy"  wrote:
>
> > I want to make a scoring function that will score the documents by the
> > following function:
> > given Q = {q1, q2, ... }
> > score(D,Q) =
> >for all qi:
> >   SUM of {
> >  LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }
> >   }
> >
> > I have stored weight_1, weight_2 and weight_3 for all terms of all
> > documents as payloads, with payload delimiter = | (pipe), during indexing.
> >
> > However, I am not sure how to integrate all the weights during
> > retrieval. I am sure that I have to @Override some score() but am not sure
> > about the exact class.
> >
> > Please help me here.
> >
> > Best,
> > Dwaipayan..
> >
>


Re: Custom Similarity

2018-01-16 Thread Michael Sokolov
Have a look at Expressions class. It compiles JavaScript that can reference
other values and can be used for ranking.
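
A minimal sketch of that approach (the expressions live in the
lucene-expressions module; the "popularity" field here is hypothetical):

import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

// combine the relevance score with another per-document value
// (compile() throws ParseException on a bad expression)
Expression expr = JavascriptCompiler.compile("_score + ln(1 + popularity)");
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("popularity", SortField.Type.INT));
TopDocs hits = searcher.search(query, 10, new Sort(expr.getSortField(bindings, true)));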

On Jan 16, 2018 4:58 AM, "Dwaipayan Roy"  wrote:

> I want to make a scoring function that will score the documents by the
> following function:
> given Q = {q1, q2, ... }
> score(D,Q) =
>for all qi:
>   SUM of {
>  LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }
>   }
>
> I have stored weight_1, weight_2 and weight_3 for all terms of all documents
> as payloads, with payload delimiter = | (pipe), during indexing.
>
> However, I am not sure how to integrate all the weights during
> retrieval. I am sure that I have to @Override some score() but am not sure
> about the exact class.
>
> Please help me here.
>
> Best,
> Dwaipayan..
>


Re: Custom Similarity

2011-10-08 Thread ppp c
That's what PhraseQuery does.
Try PhraseQuery to match the overlap, I think

On Sat, Oct 8, 2011 at 3:37 PM, Joel Halbert j...@su3analytics.com wrote:

 Hi,

 Does anyone have a modified scoring (Similarity) function they would
 care to share?

 I'm searching web page documents and find the default Similarity seems
 to assign too much weight to documents with frequent occurrence of a
 single term from the query and not enough weight to documents that
 contain a greater overlap of the search query terms.

 I've been playing around with overriding the default but wondering if
 anyone has an implementation they have found to work well that they
 would care to share.

 Thanks in advance,
 Joel






Re: Custom Similarity

2011-10-08 Thread Robert Muir
On Sat, Oct 8, 2011 at 3:37 AM, Joel Halbert j...@su3analytics.com wrote:
 Hi,

 Does anyone have a modified scoring (Similarity) function they would
 care to share?

 I'm searching web page documents and find the default Similarity seems
 to assign too much weight to documents with frequent occurrence of a
 single term from the query and not enough weight to documents that
 contain a greater overlap of the search query terms.

 I've been playing around with overriding the default but wondering if
 anyone has an implementation they have found to work well that they
 would care to share.


Have a look at coord(); you might want to further punish documents
that don't contain all the query terms.

something like:

@Override
public float coord(int overlap, int maxOverlap) {
  // give full credit only when every query term matches; otherwise halve the usual coord
  return (overlap == maxOverlap)
      ? 1f
      : 0.5f * super.coord(overlap, maxOverlap);
}
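
To wire that in, something like this (a sketch against the 3.x-era API;
"reader" is assumed to be your open IndexReader):

IndexSearcher searcher = new IndexSearcher(reader);
searcher.setSimilarity(new DefaultSimilarity() {
    @Override
    public float coord(int overlap, int maxOverlap) {
        return (overlap == maxOverlap) ? 1f : 0.5f * super.coord(overlap, maxOverlap);
    }
});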


-- 
lucidimagination.com




Re: custom similarity based on tf but greater than 1.0

2007-01-23 Thread Vagelis Kotsonis

Even if I get what I want using the coord method, I would still have the
same problem because the similarity would return a number > 1 and
afterwards the scoring mechanism would normalize these numbers to something
<= 1.0.

Thank you!
Vagelis


Otis Gospodnetic wrote:
 
 Jumping in at this point and not having read other responses, I think the
 function that Vangelis is looking for is the coord method in Similarity -
 that's for document terms/query terms overlap, I believe.
 
 Otis
 
 - Original Message 
 From: Mark Miller [EMAIL PROTECTED]
 To: java-user@lucene.apache.org
 Sent: Thursday, January 18, 2007 5:36:21 PM
 Subject: Re: custom similarity based on tf but greater than 1.0
 
 I just did the same thing. If you search the list you'll find the thread 
 where Hoss gave me the info you need. It really comes down to making a 
 FakeNormsIndexReader. The problem you are having is a result of the 
 field size normalization.
 
 - mark
 
 Vagelis Kotsonis wrote:
 Hi all.
 I am trying to make some experiments in an algorithm that scores results by
 counting how many words of the query submitted are in a document.

 For example if I enter the query

 A B D A

 The similarities I want to get for the documents follow:

 A A C F D (2 - found A and D)
 A B D S S A (3 - found A, B and D)
 D D D (1 - only found D)

 I built a Similarity that actually sets everything to 1.0f except tf

 The tf function returns 1.0f if freq > 0 and 0.0f otherwise.

 I think that this change does count what I want, but when it comes to showing
 the score, all are normalized. So the greatest similarity is equal to 1.0f
 and the others are lower than 1.0f.

 How can I deactivate the score normalization?

 Thank you!

 I want to 
   
 
 
 
 
 
 
 
 
 






Re: custom similarity based on tf but greater than 1.0

2007-01-23 Thread Vagelis Kotsonis

So the normalization was done by Hits. That was something I didn't
understand.
If I had been on my own I would have searched the Scorer and Query classes.

Thank you for this.

Finally I used the following:

final HitQueue hq = new HitQueue(results.length());
searcher.search(qr, new HitCollector() {
    public void collect(int doc, float score) {
        hq.insert(new ScoreDoc(doc, score)); // raw score, no normalization
    }
});
ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
for (int i = hq.size() - 1; i >= 0; i--)   // put docs in array, best first
    scoreDocs[i] = (ScoreDoc) hq.pop();

(HitQueue extends Lucene's PriorityQueue)

I didn't understand how to use TopDocs, so I followed the above example
I found at the following link:
http://www.devdaily.com/java/jwarehouse/lucene-1.3-final/src/java/org/apache/lucene/search/IndexSearcher.java.shtml

The only problem I have is that the HitQueue size must be
predefined...I don't know what to do then.
Currently I submit the same query twice, once to get the size of the
results and once to use with the above code.
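
If it helps, one way to avoid running the query twice is to collect into a
growable list and sort it afterwards, so no size has to be known up front
(a sketch against the same HitCollector-era API):

final List docs = new ArrayList();
searcher.search(qr, new HitCollector() {
    public void collect(int doc, float score) {
        docs.add(new ScoreDoc(doc, score)); // raw, un-normalized score
    }
});
// sort by score, highest first
Collections.sort(docs, new Comparator() {
    public int compare(Object a, Object b) {
        return Float.compare(((ScoreDoc) b).score, ((ScoreDoc) a).score);
    }
});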

Thank you very much for your help!
Vagelis


markrmiller wrote:
 
 Ah...I brushed over your example too fast...looked like normal 
 counting to me...I see now what you mean. So OMIT_NORMS probably did 
 work. Are you getting the results through Hits? Hits will normalize. Use 
 TopDocs or a HitCollector.
 
 - Mark
 
 Vagelis Kotsonis wrote:
 But I don't want to get the frequency of each term in the doc.

 What I want is 1 if the term exists in the doc and 0 if it doesn't. After
 this, I want all these 1s and 0s to be summed to give me a number to use as
 a score.

 If I set the TF value as 1 or 0, as I described above, I get the right
 number, but this number is normalized to 1.0 and smaller numbers.

 It is the normalization that I want to avoid.

 Thanks again!
 Vagelis


 markrmiller wrote:
   
 Don't return 1 for tf...just return the tf straight with no 
 changes...return freq. For everything else return 1. After that 
 OMIT_NORMS should work. If you want to try a custom reader:

 public class FakeNormsIndexReader extends FilterIndexReader {
     byte[] ones = SegmentReader.createFakeNorms(maxDoc());

     public FakeNormsIndexReader(IndexReader in) {
         super(in);
     }

     public synchronized byte[] norms(String field) throws IOException {
         System.out.println("returning fake norms...");
         return ones;
     }

     public synchronized void norms(String field, byte[] result, int offset) {
         System.out.println("writing fake norms...");
         System.arraycopy(ones, 0, result, offset, maxDoc());
     }
 }

 The beauty of this reader is that you can flip between it plus your 
 custom similarity and Lucene's default implementations on the same 
 index.

 - Mark


 

   
 
 
 
 






Re: custom similarity based on tf but greater than 1.0

2007-01-19 Thread Otis Gospodnetic
Jumping in at this point and not having read other responses, I think the 
function that Vangelis is looking for is the coord method in Similarity - that's 
for document terms/query terms overlap, I believe.

Otis

- Original Message 
From: Mark Miller [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, January 18, 2007 5:36:21 PM
Subject: Re: custom similarity based on tf but greater than 1.0

I just did the same thing. If you search the list you'll find the thread 
where Hoss gave me the info you need. It really comes down to making a 
FakeNormsIndexReader. The problem you are having is a result of the 
field size normalization.

- mark

Vagelis Kotsonis wrote:
 Hi all.
 I am trying to make some experiments in an algorithm that scores results by
 counting how many words of the query submitted are in a document.

 For example if I enter the query

 A B D A

 The similarities I want to get for the documents follow:

 A A C F D (2 - found A and D)
 A B D S S A (3 - found A, B and D)
 D D D (1 - only found D)

 I built a Similarity that actually sets everything to 1.0f except tf

 The tf function returns 1.0f if freq > 0 and 0.0f otherwise.

 I think that this change does count what I want, but when it comes to showing
 the score, all are normalized. So the greatest similarity is equal to 1.0f
 and the others are lower than 1.0f.

 How can I deactivate the score normalization?

 Thank you!

 I want to 
   









Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
I just did the same thing. If you search the list you'll find the thread 
where Hoss gave me the info you need. It really comes down to making a 
FakeNormsIndexReader. The problem you are having is a result of the 
field size normalization.


- mark

Vagelis Kotsonis wrote:

Hi all.
I am trying to make some experiments in an algorithm that scores results by
counting how many words of the query submitted are in a document.

For example if I enter the query

A B D A

The similarities I want to get for the documents follow:

A A C F D (2 - found A and D)
A B D S S A (3 - found A, B and D)
D D D (1 - only found D)

I built a Similarity that actually sets everything to 1.0f except tf

The tf function returns 1.0f if freq > 0 and 0.0f otherwise.

I think that this change does count what I want, but when it comes to showing
the score, all are normalized. So the greatest similarity is equal to 1.0f
and the others are lower than 1.0f.

How can I deactivate the score normalization?

Thank you!

I want to 
  





Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis

Before I asked this question I had been looking through the list for over 2 hours
and I didn't find anything to make me understand how to do what I want.

After you sent the message I made a quick pass through all your messages,
but I didn't find anything. I also searched for FakeNormsIndexReader and
still didn't find anything.

Can you please tell me, if you remember, how you did it?

Thank you!
Vagelis


markrmiller wrote:
 
 I just did the same thing. If you search the list you'll find the thread 
 where Hoss gave me the info you need. It really comes down to making a 
 FakeNormsIndexReader. The problem you are having is a result of the 
 field size normalization.
 
 - mark
 





Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
Sorry you're having trouble finding it! Allow me...bingo: 
http://www.gossamer-threads.com/lists/lucene/java-user/43251?search_string=sorting%20by%20per%20doc%20hit;#43251


Prob doesn't have great keywords for finding it. That should get you 
going though. Let me know if you have any questions.


- Mark

Vagelis Kotsonis wrote:

Before I asked this question I had been looking through the list for over 2 hours
and I didn't find anything to make me understand how to do what I want.

After you sent the message I made a quick pass through all your messages,
but I didn't find anything. I also searched for FakeNormsIndexReader and
still didn't find anything.

Can you please tell me, if you remember, how you did it?


Thank you!
Vagelis


markrmiller wrote:
  
I just did the same thing. If you search the list you'll find the thread 
where Hoss gave me the info you need. It really comes down to making a 
FakeNormsIndexReader. The problem you are having is a result of the 
field size normalization.


- mark







Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis

I feel kind of stupid...I don't get what hossman says in his post.

I got the thing about OMIT_NORMS and I tried to do it by calling
Field.setOmitNorms(true) before adding a field to the index. After that I
re-indexed my collection but it still made no difference.
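
For reference, the omit-norms setup described above would look something like
this (a sketch against the 2.x-era API; field name and content are
placeholders), and every document has to be re-indexed this way before the
norms actually disappear:

Document doc = new Document();
Field body = new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED);
body.setOmitNorms(true); // don't store length norms for this field
doc.add(body);
writer.addDocument(doc);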

Tell me if I got it right. The second solution that you followed is building
a custom FilterIndexReader and implementing these 2 functions:

byte[] norms(String field)
void norms(String field, byte[] result, int offset)

Did I get it right?

Thank you and excuse me for continuously asking the same thing.
Vagelis


markrmiller wrote:
 
 Sorry you're having trouble finding it! Allow me...bingo: 
 http://www.gossamer-threads.com/lists/lucene/java-user/43251?search_string=sorting%20by%20per%20doc%20hit;#43251
 
 Prob doesn't have great keywords for finding it. That should get you 
 going though. Let me know if you have any questions.
 
 - Mark
 
 






Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Mark Miller
Don't return 1 for tf...just return the tf straight with no 
changes...return freq. For everything else return 1. After that 
OMIT_NORMS should work. If you want to try a custom reader:


public class FakeNormsIndexReader extends FilterIndexReader {
    byte[] ones = SegmentReader.createFakeNorms(maxDoc());

    public FakeNormsIndexReader(IndexReader in) {
        super(in);
    }

    public synchronized byte[] norms(String field) throws IOException {
        System.out.println("returning fake norms...");
        return ones;
    }

    public synchronized void norms(String field, byte[] result, int offset) {
        System.out.println("writing fake norms...");
        System.arraycopy(ones, 0, result, offset, maxDoc());
    }
}

The beauty of this reader is that you can flip between it plus your 
custom similarity and Lucene's default implementations on the same 
index.
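
Opening a searcher over it might look like this (a sketch; "directory" and
"customSimilarity" are whatever you already have):

IndexReader reader = new FakeNormsIndexReader(IndexReader.open(directory));
IndexSearcher searcher = new IndexSearcher(reader);
searcher.setSimilarity(customSimilarity); // e.g. the 1/0-tf similarity discussed above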


- Mark


Vagelis Kotsonis wrote:

I feel kind of stupid...I don't get what hossman says in his post.

I got the thing about OMIT_NORMS and I tried to do it by calling
Field.setOmitNorms(true) before adding a field to the index. After that I
re-indexed my collection but it still made no difference.

Tell me if I got it right. The second solution that you followed is building
a custom FilterIndexReader and implementing these 2 functions:

byte[] norms(String field)
void norms(String field, byte[] result, int offset)

Did I get it right?

Thank you and excuse me for continuously asking the same thing.
Vagelis


markrmiller wrote:
  
Sorry you're having trouble finding it! Allow me...bingo: 
http://www.gossamer-threads.com/lists/lucene/java-user/43251?search_string=sorting%20by%20per%20doc%20hit;#43251


Prob doesn't have great keywords for finding it. That should get you 
going though. Let me know if you have any questions.


- Mark





  





Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis

But I don't want to get the frequency of each term in the doc.

What I want is 1 if the term exists in the doc and 0 if it doesn't. After
this, I want all these 1s and 0s to be summed to give me a number to use as
a score.

If I set the TF value as 1 or 0, as I described above, I get the right
number, but this number is normalized to 1.0 and smaller numbers.

It is the normalization that I want to avoid.

Thanks again!
Vagelis


markrmiller wrote:
 
 Don't return 1 for tf...just return the tf straight with no 
 changes...return freq. For everything else return 1. After that 
 OMIT_NORMS should work. If you want to try a custom reader:
 
 public class FakeNormsIndexReader extends FilterIndexReader {
     byte[] ones = SegmentReader.createFakeNorms(maxDoc());

     public FakeNormsIndexReader(IndexReader in) {
         super(in);
     }

     public synchronized byte[] norms(String field) throws IOException {
         System.out.println("returning fake norms...");
         return ones;
     }

     public synchronized void norms(String field, byte[] result, int offset) {
         System.out.println("writing fake norms...");
         System.arraycopy(ones, 0, result, offset, maxDoc());
     }
 }
 
 The beauty of this reader is that you can flip between it plus your 
 custom similarity and Lucene's default implementations on the same 
 index.
 
 - Mark
 
 






Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis

It is 4 in the morning here in Greece, so I will try it tomorrow...sometime I
must sleep!
I will come up with the results tomorrow.

Thanks!
Vagelis


markrmiller wrote:
 
 Ah...I brushed over your example too fast...looked like normal 
 counting to me...I see now what you mean. So OMIT_NORMS probably did 
 work. Are you getting the results through Hits? Hits will normalize. Use 
 TopDocs or a HitCollector.
 
 - Mark
 
 


