Re: Dense Vector - Similarity Function

Charlie Hull Tue, 02 Jan 2024 06:51:37 -0800

I'm really glad Alessandro has been explicit about the challenges around developing open source software such as Solr.

Simply put, there is lots of work and not enough people able or willing to do it. At my previous company Flax we funded a lot of Lucene/Solr development over the years, and OpenSource Connections also employs Solr committers. Sometimes we're lucky enough to find clients willing to fund the development of specific open source code, but it's sadly rare. Flax, OSC and Sease were/are all small consulting companies who make a living from client work, therefore we have limited time to do 'unpaid' open source work. This is also generally true for the many other individual committers.

If you work for a company who need Lucene or Solr to have a specific feature or bugfix, *consider funding this development* - you'll get a credit in the code, a chance to get public kudos in other ways (in conference presentations or joint blogs for example), you'll get the feature built by those who truly understand the code rather than having to figure it out from scratch, it will happen now rather than at some unspecified time in the future, and it may not cost as much as you think (at OSC we discount our rates for open source work). If it gets committed, you then get the benefit of this feature being maintained alongside the rest of Lucene or Solr, rather than being in some horrible patch you have to retrofit to new versions as they appear. It also benefits future users of the code of course, but it benefits you more, and now.

Sease, OSC or any of the others who could do this work are happy to help you develop a plan for your boss/management/funders, explain the benefits and how the process works. You just have to ask!


Best

Charlie

On 02/01/2024 12:20, Alessandro Benedetti wrote:

Are you a Lucene or a Solr user?
I assumed the latter.

In Apache Solr it's not only not released yet, it's not implemented yet.

There's no official roadmap in Apache Solr, so there's no release date for
that functionality.
That feature is among the ones I and my company want to contribute but we
lack funding at the moment (
https://sease.io/2023/10/apache-lucene-solr-ai-roadmap-do-you-want-to-make-it-happen.html
).
Of course, you are also welcome to contribute it yourself, as a community
we welcome new contributors.

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail:a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io<http://sease.io/>
LinkedIn<https://linkedin.com/company/sease-ltd>  | Twitter
<https://twitter.com/seaseltd>  | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ>  | Github
<https://github.com/seaseltd>


On Tue, 2 Jan 2024 at 11:20, Iram Tariq<iram.ta...@northbaysolutions.net>
wrote:

Hi,

What Mikhail mentioned is the answer to all my questions but sadly it is
not released yet. Is there any way I can use this unreleased version for
now, or when is it going to be released?

And yes, I want to rescore all topK results based on recency.

Regards,


Iram Tariq | Software Architect

NorthBay

Direct:  +1 (902) 329-7329

iram.ta...@northbaysolutions.net

www.northbaysolutions.com




On Tue, Jan 2, 2024 at 5:50 AM Alessandro Benedetti<a.benede...@sease.io>
wrote:

Hi Iram, following up on Mikhail's answer:

1) K Nearest Neighbour is a retrieval approach intended to look for the
closest (approximate) K vectors to a query one.
"override the existing function or write a
custom method to give a high score to the latest documents"
  It seems suspicious to me.
It's like you want to combine two features for the final score:

    - Vector Similarity
    - Recency

As you can imagine, a first question arises:
How do you want to combine these features?
Linearly? Non linearly? Do you want to re-score the top-k calculating this
recency?
Learning To Rank
(https://solr.apache.org/guide/solr/latest/query-guide/learning-to-rank.html)
or general reranking
(https://solr.apache.org/guide/solr/latest/query-guide/query-re-ranking.html)
could be what you want.
Please make sure you are familiar with function queries as well
(https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html).

2) The pull request mentioned by Mikhail is spot on, but it has not
been ported to Apache Solr yet.
Technically, it is going to be useful, but from a pragmatic perspective
I am much more sceptical:
With current models, finding a threshold won't be easy at all.
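
To make point 1 a bit more concrete, here is a rough sketch (not tested against a real collection) of how the general reranking route could look: a KNN query retrieves the top-k documents, and a re-rank query then adds a recency function on top of the vector similarity score. The field names `vector` and `published_date`, the helper name `knn_recency_params` and the weight are assumptions made up for illustration; the `recip(ms(NOW,...),3.16e-11,1,1)` form is the recency example from the function queries guide.

```python
def knn_recency_params(query_vector, top_k=10, vector_field="vector",
                       date_field="published_date", recency_weight=1.0):
    """Build Solr request parameters: KNN retrieval, then re-rank by recency.

    vector_field, date_field and recency_weight are hypothetical values;
    adapt them to your own schema and tune the weight on your data.
    """
    vector = "[" + ", ".join(str(float(x)) for x in query_vector) + "]"
    return {
        # First pass: (approximate) K nearest neighbours on the dense vector field.
        "q": f"{{!knn f={vector_field} topK={top_k}}}{vector}",
        # Second pass: re-score the top_k hits. By default the re-rank score,
        # multiplied by reRankWeight, is added to the original score,
        # i.e. a linear combination of similarity and recency.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={top_k} "
              f"reRankWeight={recency_weight}}}",
        # Recency feature: close to 1 for brand-new documents,
        # roughly 0.5 for documents that are one year old.
        "rqq": f"{{!func}}recip(ms(NOW,{date_field}),3.16e-11,1,1)",
        "fl": "id,score",
    }
```

The resulting dictionary can be sent as request parameters to the collection's /select handler with any HTTP client. A non-linear combination, or one learned from data, would need Learning To Rank instead.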

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*



On Mon, 25 Dec 2023 at 22:55, Iram Tariq <iram.ta...@northbaysolutions.net>
wrote:

Hi All,

Right now I am using the cosine similarity function for dense vectors.
Is there any way I can override the existing function or write a
custom method to give a high score to the latest documents?


Also, the KNN Query Parser returns the topK results matched with the input, but
is there any way we can get all documents for which the similarity score
is greater than a specific number?

Any sort of answer will be helpful. Looking forward to the feedback.

Regards,


Iram Tariq | Software Architect


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II
