Document metadata in ranking?

2021-02-25 Thread Philip Warner
I am sorry if this has been covered elsewhere, but I wanted to know if it’s 
possible to inject and use document metadata in ranking search results.

As an example, suppose I have a pool of documents that fit into 3 broad categories: 
“Peer Reviewed”, “Professional Journalism”, and “Blog Post”. I would like the 
documents from the first category to rank higher (all else being equal) than 
those in the second category, and so on.

I can imagine wanting to add more metadata, e.g. for highly regarded authors, 
poorly regarded authors, highly rated journals, etc.

Is this even possible? Is this what payloads are for? 

Any insights would be appreciated!

Re: Document metadata in ranking?

2021-02-25 Thread Paul Libbrecht

Hello Philip,

I’ll answer with a possibility that might be outdated and that predates the 
existence of payloads (which I think are non-analysed data, so probably not 
appropriate here).


Lucene has fields, and you can store the metadata in a field in the form 
of particular tokens (for example, one token per category). Then you can 
enrich every query: let it be parsed as usual, then combine it (maybe only 
for plain queries?) in an AND clause with weighted OR'd TermQueries on that 
field, so that documents in the higher categories receive a larger boost.
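
A minimal sketch of the scoring arithmetic behind this idea, in plain Java with no Lucene dependency. It assumes a simplified additive model in which each matching metadata clause adds a fixed amount to the text score; in Lucene itself this would roughly correspond to a BooleanQuery whose optional clauses are boosted TermQuery instances, where the actual contribution also depends on the similarity unless the clause is wrapped in a constant-score query. The class name, category tokens, and boost values are all hypothetical.

```java
import java.util.Map;

final class CategoryBoostSketch {
    // Hypothetical boost weights per category token; illustrative, not tuned.
    static final Map<String, Double> BOOSTS = Map.of(
            "peer_reviewed", 2.0,
            "journalism", 1.0,
            "blog", 0.0);

    // Simplified additive model: the category clause adds its weight
    // to whatever score the user's text query produced.
    static double score(double textScore, String category) {
        return textScore + BOOSTS.getOrDefault(category, 0.0);
    }

    public static void main(String[] args) {
        // Same text score: the peer-reviewed document ranks higher.
        System.out.println(score(3.0, "peer_reviewed") > score(3.0, "journalism"));
        // A much stronger text match can still outrank a higher category.
        System.out.println(score(6.0, "blog") > score(3.0, "peer_reviewed"));
    }
}
```

With additive boosts like these, the category only breaks near-ties; a sufficiently strong text match in a lower category still wins.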


As for an absolute ordering (where the highest category always comes 
first), you would certainly need to put some limits on the scores so that the 
influence of a higher category always takes precedence over the ordering 
by relevance score (BM25 by default in recent Lucene versions, TF-IDF in 
older ones).
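
One way to picture the "limits on the scores" is the following plain-Java sketch (again no Lucene dependency). It assumes raw relevance scores are non-negative and of moderate size, squashes them into the range [0, 1), and adds an integer tier per category, so a higher tier always wins and the relevance score only orders documents within a tier. The class name, the tier values, and the squashing function s / (1 + s) are assumptions chosen for illustration, not something Lucene does by default.

```java
final class TieredRankingSketch {
    // Hypothetical tiers: a higher number means a more trusted category.
    static int tier(String category) {
        switch (category) {
            case "peer_reviewed": return 2;
            case "journalism":    return 1;
            default:              return 0; // e.g. blog posts, unknown
        }
    }

    // Maps a non-negative raw score into [0, 1), preserving order.
    // Assumption: scores are moderate; for astronomically large values
    // floating-point rounding could yield exactly 1.0.
    static double bounded(double rawScore) {
        return rawScore / (1.0 + rawScore);
    }

    // The tier dominates; the bounded score breaks ties within a tier.
    static double combined(String category, double rawScore) {
        return tier(category) + bounded(rawScore);
    }

    public static void main(String[] args) {
        // A weak journalism match still outranks a very strong blog match.
        System.out.println(combined("journalism", 0.1) > combined("blog", 50.0));
        // Within one tier, the better text match ranks higher.
        System.out.println(combined("blog", 5.0) > combined("blog", 1.0));
    }
}
```

An alternative with the same effect is to sort on the category first and use the relevance score only as the secondary sort key.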


As a last resort you can write a custom scoring engine, but I can only 
imagine that ruining the performance...


paul

