Re: [JENKINS] Lucene » Lucene-Coverage-main - Build # 924 - Still Failing!

2023-10-15 Thread Dawid Weiss
Filed a security policy correction to allow jacoco to use a custom class
loader, here:
https://github.com/apache/lucene/pull/12684
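For anyone curious what such a policy correction looks like: very roughly
(this is a hypothetical sketch, not the actual diff in the PR), the test
security policy needs a grant along these lines so that the JaCoCo agent
can define classes through its own class loader while tests run under the
SecurityManager:

    // Hypothetical sketch only; see the PR above for the real change.
    // JaCoCo's agent instruments classes via its own class loader.
    grant {
      permission java.lang.RuntimePermission "createClassLoader";
    };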

On Sun, Oct 15, 2023 at 5:05 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/924/
>
> All tests passed
>
> Build Log:
> [...truncated 1221 lines...]
> BUILD FAILED in 31m 5s
> 320 actionable tasks: 320 executed
>
> Publishing build scan...
> https://ge.apache.org/s/fbhdgk7liizem
>
> Build step 'Invoke Gradle script' changed build result to FAILURE
> Build step 'Invoke Gradle script' marked build as failure
> Archiving artifacts
> Recording test results
> [Checks API] No suitable checks publisher found.
> Email was triggered for: Failure - Any
> Sending email for trigger: Failure - Any
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org


Re: Multimodal search

2023-10-15 Thread Michael Wechner

Hi Navneet

I also observe that various "vector search DBs" are implementing hybrid 
search, because accuracy with embeddings alone is often not good enough.
Vectors are often too "mushy"; hybrid search can help to improve 
accuracy, just as re-ranking does, but I think there is a better way.
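
As a rough illustration of that hybrid approach in plain Lucene (a sketch
only: the field names, index path and query vector below are made up, and
the raw BM25 and vector scores are simply summed per document, whereas
real systems usually normalize the scores or re-rank):

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.KnnFloatVectorQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.nio.file.Paths;

    public class HybridSearchSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical index with a "body" text field and a "body_vector" kNN field.
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/hybrid-index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
          IndexSearcher searcher = new IndexSearcher(reader);

          // Lexical side: a plain term query against the text field.
          Query lexical = new TermQuery(new Term("body", "jazz"));

          // Vector side: an illustrative query embedding; in practice it
          // comes from the same model that produced the indexed vectors.
          float[] queryEmbedding = new float[] {0.12f, -0.03f, 0.44f, 0.08f};
          Query vector = new KnnFloatVectorQuery("body_vector", queryEmbedding, 10);

          // Naive hybrid: both clauses are optional, so BM25 and vector
          // similarity scores are summed for documents matching both.
          Query hybrid = new BooleanQuery.Builder()
              .add(lexical, BooleanClause.Occur.SHOULD)
              .add(vector, BooleanClause.Occur.SHOULD)
              .build();

          TopDocs hits = searcher.search(hybrid, 10);
          System.out.println("hybrid hits: " + hits.totalHits);
        }
      }
    }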


Depending on the dataset and a person's expertise, answers given by 
humans are much more accurate, I think because humans extract "features" 
from the input and then operate on those "features". See for example


https://medium.com/aleph-alpha-blog/multimodality-attention-is-all-you-need-is-all-we-needed-526c45abdf0

and see the principles behind DALL-E and CLIP.

I think the same or similar principles could be re-used to implement a 
more accurate search.


I have built a very simple PoC and it looks promising: this approach 
provides much higher accuracy, because the similarity scores are much 
more distinct.


Of course there are various challenges, but I think it is worth exploring.

I also understand that within an existing "ecosystem", change, or rather 
trying something new, can be difficult, but I guess I am not the only one 
who sees low accuracy as a fundamental problem, right?


Thanks

Michael





Am 14.10.23 um 09:38 schrieb Navneet Verma:

Hi Michael,
Please correct me if I am wrong, but I think what you are trying to say 
with multimodal search is that combining text search and vector search 
improves the accuracy of search results. As per my understanding of the 
search space, people are calling this hybrid search. We recently launched 
a query clause in OpenSearch called "hybrid" which takes this approach 
and combines the scores of text and vector search globally 
(https://opensearch.org/blog/hybrid-search/). In our experiments we saw 
better accuracy than with text search or vector search alone. Just 
curious whether you are thinking of something like this or have a 
completely different idea.


I agree that techniques like re-ranking are currently very popular for 
improving the accuracy of search results.



Thanks
Navneet

On Fri, Oct 13, 2023 at 8:53 AM Michael Wechner 
 wrote:


Thanks for your feedback and the link to the OpenSearch
implementation!

I think the embedding approach as it exists today does not, and will
not, provide good enough accuracy.
Many people try to fix this with re-ranking, which helps, but does
not really address the underlying problem.

I think we focus too much on text, because text/language is
actually just a representation of the "models" we create in our
minds from the reality we perceive via our senses.

When you take multimodality into account from the very beginning,
you are forced to approach search differently,
and I would argue that this leads to a much more powerful
search implementation, one that provides better accuracy
and also knows much better what it does not know.

I do not mean to sound philosophical; I actually have a quite
clear implementation in mind, or rather on paper. I would be
interested
to know whether the Lucene community is open to reconsidering
search from the ground up.

I think the Lucene community has fantastic knowledge and
expertise, but I also think it is time to evolve quite radically, and
not just do another vector search implementation.

WDYT?

Thanks

Michael







Am 13.10.23 um 00:49 schrieb Michael Froh:

We recently added multimodal search in OpenSearch:
https://github.com/opensearch-project/neural-search/pull/359

Since Lucene ultimately just cares about embeddings, does Lucene
itself really need to be multimodal? Wherever the embeddings come
from, Lucene can index the vectors and combine them with textual
queries, right?
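
As a minimal sketch of that (field names, the example vector and the
index path are made up; the embedding could come from a CLIP-style
model applied to an image, a caption, or both, Lucene only sees the
floats):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.KnnFloatVectorField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.VectorSimilarityFunction;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import java.nio.file.Paths;

    public class MultimodalIndexingSketch {
      public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/hybrid-index"));
             IndexWriter writer =
                 new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

          Document doc = new Document();
          // Caption or body text, searchable with ordinary text queries.
          doc.add(new TextField("body", "a trumpet player on a rainy street",
              Field.Store.YES));
          // Embedding produced outside Lucene by whatever (multimodal)
          // model you like; Lucene just indexes the vector.
          float[] embedding = new float[] {0.12f, -0.03f, 0.44f, 0.08f};
          doc.add(new KnnFloatVectorField("body_vector", embedding,
              VectorSimilarityFunction.COSINE));

          writer.addDocument(doc);
        }
      }
    }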

Thanks,
Froh

On Thu, Oct 12, 2023 at 12:59 PM Michael Wechner
 wrote:

Hi

Did any of the Lucene committers consider making Lucene
multimodal?

With a quick Google search I found for example

https://dl.acm.org/doi/abs/10.1145/3503161.3548768

https://sigir-ecom.github.io/ecom2018/ecom18Papers/paper7.pdf

Thanks

Michael



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org