I'm really glad Alessandro has been explicit about the challenges around
developing open source software such as Solr.
Simply put, there is lots of work and not enough people able or willing
to do it. At my previous company Flax we funded a lot of Lucene/Solr
development over the years, and OpenSource Connections also employs Solr
committers. Sometimes we're lucky enough to find clients willing to fund
the development of specific open source code, but it's sadly rare. Flax,
OSC and Sease were/are all small consulting companies that make a living
from client work, so we have limited time to do 'unpaid' open
source work. This is also generally true for the many other individual
committers.
If you work for a company that needs Lucene or Solr to have a specific
feature or bugfix, *consider funding this development*. You'll get a
credit in the code and a chance to get public kudos in other ways (in
conference presentations or joint blogs, for example). The feature will
be built by those who truly understand the code, so you won't have to
figure it out from scratch; it will happen now rather than at some
unspecified time in the future; and it may not cost as much as you think
(at OSC we discount our rates for open source work). If it gets
committed, you then get the benefit of this feature being maintained
alongside the rest of Lucene or Solr, rather than being in some horrible
patch you have to retrofit to new versions as they appear. It also
benefits future users of the code of course, but it benefits you more,
and now.
Sease, OSC or any of the others who could do this work are happy to help
you develop a plan for your boss/management/funders, and to explain the
benefits and how the process works. You just have to ask!
Best
Charlie
On 02/01/2024 12:20, Alessandro Benedetti wrote:
Are you a Lucene or a Solr user?
I assumed the latter.
In Apache Solr it's not only not released yet, it's not implemented yet.
There's no official roadmap in Apache Solr, so there's no release date for
that functionality.
That feature is among the ones I and my company want to contribute but we
lack funding at the moment (
https://sease.io/2023/10/apache-lucene-solr-ai-roadmap-do-you-want-to-make-it-happen.html
).
Of course, you are also welcome to contribute it yourself, as a community
we welcome new contributors.
Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*
e-mail:a.benede...@sease.io
*Sease* - Information Retrieval Applied
Consulting | Training | Open Source
Website: Sease.io<http://sease.io/>
LinkedIn<https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>
On Tue, 2 Jan 2024 at 11:20, Iram Tariq<iram.ta...@northbaysolutions.net>
wrote:
Hi,
What Mikhail mentioned is the answer to all my questions but sadly it is
not released yet. Is there any way I can use this unreleased version for
now, or is it known when it is going to be released?
And yes, I want to rescore all topK results based on recency.
Regards,
Iram Tariq | Software Architect
NorthBay
Direct: +1 (902) 329-7329
iram.ta...@northbaysolutions.net
www.northbaysolutions.com
On Tue, Jan 2, 2024 at 5:50 AM Alessandro Benedetti<a.benede...@sease.io>
wrote:
Hi Iram, following up on Mikhail's answer:
1) K Nearest Neighbour is a retrieval approach intended to look for the
K closest (approximate) vectors to a query vector.
"override the existing function or write a
custom method to give a high score to the latest documents"
It seems suspicious to me.
It's like you want to combine two features for the final score:
- Vector Similarity
- Recency
As you can imagine a first question arises:
How do you want to combine these features?
Linearly? Non linearly? Do you want to re-score the top-k calculating this
recency?
Learning To Rank (
https://solr.apache.org/guide/solr/latest/query-guide/learning-to-rank.html
)
or general reranking (
https://solr.apache.org/guide/solr/latest/query-guide/query-re-ranking.html
)
could be what you want.
Please make sure you are familiar with function queries as well (
https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html
)
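
To make the "how do you want to combine these features?" question concrete, here is a minimal sketch in plain Python of a linear blend applied to hits that KNN has already returned. It is not Solr code: the function names, the exponential half-life decay and the `weight` parameter are all assumptions chosen for illustration, and a non-linear combination would be just as legitimate.

```python
from datetime import datetime, timedelta, timezone

def recency(doc_time, now, half_life_days=30.0):
    """Hypothetical recency feature: 1.0 for a new document, 0.5 after one half-life."""
    age_days = max((now - doc_time).total_seconds(), 0.0) / 86400.0
    return 0.5 ** (age_days / half_life_days)

def rescore_top_k(hits, now, weight=0.3, half_life_days=30.0):
    """Re-order KNN hits by a linear blend of vector similarity and recency.

    hits: list of (doc_id, similarity, doc_time) tuples, i.e. the top-K
    results that were already retrieved; nothing new is retrieved here.
    """
    rescored = [
        (doc_id, (1.0 - weight) * sim + weight * recency(ts, now, half_life_days))
        for doc_id, sim, ts in hits
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
hits = [
    ("old", 0.90, now - timedelta(days=365)),  # slightly more similar, a year old
    ("new", 0.85, now - timedelta(days=1)),    # slightly less similar, a day old
]
ranked = rescore_top_k(hits, now)
```

With these made-up numbers the newer document overtakes the older one, even though its vector similarity is lower. How strongly recency should count (`weight`, `half_life_days`) is exactly the design decision raised above.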
2) The pull request mentioned by Mikhail is spot on, but it has not
been ported to Apache Solr yet.
Technically, it is going to be useful, but from a pragmatic perspective, I
am much more sceptical:
With current models, finding a threshold won't be easy at all.
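
To illustrate the difference between the two retrieval modes, here is a brute-force sketch in plain Python (not how Lucene or Solr implement vector search; the function names and toy vectors are assumptions). Top-K always returns K documents, whereas a similarity threshold returns however many documents clear it, so its usefulness depends entirely on how the model's scores are distributed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_all(query, docs):
    """Score every document against the query, best first (exhaustive scan)."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def top_k(query, docs, k):
    """Always returns k results (or fewer if there are fewer documents)."""
    return score_all(query, docs)[:k]

def above_threshold(query, docs, min_sim):
    """Returns every document whose similarity is at least min_sim."""
    return [(d, s) for d, s in score_all(query, docs) if s >= min_sim]

# Toy two-dimensional vectors, purely for illustration.
docs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
query = [1.0, 0.0]

by_k = top_k(query, docs, 2)
strict = above_threshold(query, docs, 0.9)
loose = above_threshold(query, docs, 0.5)
```

Here `by_k` always has two entries, while `strict` and `loose` differ in size depending on the cut-off. A value that works for one embedding model may return nothing, or nearly everything, with another.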
Cheers
On Mon, 25 Dec 2023 at 22:55, Iram Tariq <iram.ta...@northbaysolutions.net>
wrote:
Hi All,
Right now I am using the cosine similarity function for dense vectors.
Is there any way I can override the existing function or write a
custom method to give a high score to the latest documents?
Also, the KNN Query Parser returns the topK results matched with the input,
but is there any way we can get all documents for which the similarity
score is greater than a specific number?
Any sort of answer will be helpful. Looking forward to the feedback.
Regards,
--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II