Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-29 Thread Michael Wechner

thanks for your feedback and pointers!

To play with binary vectors the following project might be useful

https://github.com/cohere-ai/BinaryVectorDB

Re Lucene, I will try to better understand what you suggest below.

Thanks

Michael

On 29.03.24 at 07:35, Shubham Chaudhary wrote:

btw, what about native binary embedding quantization support by Lucene?


This sounds like a good idea to have in Lucene.

Would this require another VectorField / VectorsFormat?


Based on the current implementation, one way would be to use another KNN
format. Alternatively, maybe a better approach would be to make
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>
configurable to accept the type of quantization, like this new work-in-progress
PR for int4 quantization <https://github.com/apache/lucene/pull/13197>, which
takes the number of bits to use for quantizing as input. Since this change
allows passing 1 for the number of bits, it looks to me like an enabler for
binary quantization.
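To make the "1 bit per dimension" idea concrete, here is a small,
self-contained Java sketch (illustrative only, not Lucene code; the class and
method names are made up): each float dimension is reduced to its sign, and
similarity between packed vectors becomes a Hamming distance.

```java
/**
 * Illustrative sketch of sign-based binary quantization (not Lucene code):
 * each float dimension becomes one bit, packed 8 dimensions per byte.
 */
public class BinaryQuantizer {

    /** Bit i is set if v[i] > 0, i.e. only the sign of each dimension survives. */
    public static byte[] quantize(float[] v) {
        byte[] packed = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return packed;
    }

    /** Hamming distance between two packed vectors; lower means more similar. */
    public static int hammingDistance(byte[] a, byte[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return d;
    }

    public static void main(String[] args) {
        float[] v1 = {0.3f, -0.2f, 0.8f, -0.1f, 0.5f, 0.9f, -0.7f, 0.2f};
        float[] v2 = {0.1f, -0.4f, 0.6f, 0.3f, 0.7f, 0.8f, -0.9f, 0.1f};
        // The two vectors differ in sign only in dimension 3.
        System.out.println(hammingDistance(quantize(v1), quantize(v2))); // prints 1
    }
}
```

This is also why binary quantization is so attractive space-wise: a
768-dimensional float32 embedding (3072 bytes) shrinks to 96 bytes.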

- Shubham

On Sun, Mar 24, 2024 at 4:34 AM Michael Wechner 
wrote:


btw, what about native binary embedding quantization support by Lucene?


https://www.linkedin.com/posts/tomaarsen_binary-and-scalar-embedding-quantization-activity-7176966403332132864-lJzH?utm_source=share_medium=member_desktop

Would this require another VectorField / VectorsFormat?

Thanks

Michael

On 19.03.24 at 21:57, Shubham Chaudhary wrote:

Hi Michael,

Lucene has had int8 vector support since 9.5 (#1054
<https://github.com/apache/lucene/pull/1054>), but it was left to the user
to get those quantized vectors and index them using KnnByteVectorField
<https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/KnnByteVectorField.html>.
With Lucene 9.9 out now there is native support for int8 scalar
quantization (#12582 <https://github.com/apache/lucene/pull/12582>) using
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>,
which expects a confidence interval between 0.9 and 1.0. Here are some nice
blogs that explain how it works in Lucene.

- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene
- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-101

Some other references:
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html



On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner <michael.wech...@wyona.com> wrote:


Hi

Cohere recently announced their "compressed" embeddings

https://twitter.com/Nils_Reimers/status/1769809006762037368



https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share_medium=member_desktop

Does Lucene Vector Search support this already, or is somebody working
on this?

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org







Re: Support of RRF (Reciprocal Rank Fusion) by Lucene?

2024-03-26 Thread Michael Wechner

thanks, will try to get started asap :-)

On 26.03.24 at 15:37, Adrien Grand wrote:

GitHub issue or PR directly, whatever works best for you is going to work
for us.

On Tue, Mar 26, 2024 at 3:12 PM Michael Wechner 
wrote:


Hi Adrien

Cool, thanks for your quick feedback!

Yes, IIUC it should not be too difficult.

Should I create a GitHub issue to discuss this in more detail?

https://github.com/apache/lucene/issues

Thanks

Michael

On 26.03.24 at 14:56, Adrien Grand wrote:

Hey Michael,

I agree that it would be a nice addition. Plus it should be pretty easy to
implement. This sounds like a good fit for a utility method on the TopDocs
class?

On Tue, Mar 26, 2024 at 2:54 PM Michael Wechner <michael.wech...@wyona.com> wrote:


Hi

IIUC Lucene does not contain an RRF implementation, for example to merge
keyword/BM25 and vector search results, right?

I think it would be nice to have within Lucene, WDYT?

Thanks

Michael




Re: Support of RRF (Reciprocal Rank Fusion) by Lucene?

2024-03-26 Thread Michael Wechner

Hi Adrien

Cool, thanks for your quick feedback!

Yes, IIUC it should not be too difficult.

Should I create a GitHub issue to discuss this in more detail?

https://github.com/apache/lucene/issues

Thanks

Michael

On 26.03.24 at 14:56, Adrien Grand wrote:

Hey Michael,

I agree that it would be a nice addition. Plus it should be pretty easy to
implement. This sounds like a good fit for a utility method on the TopDocs
class?

On Tue, Mar 26, 2024 at 2:54 PM Michael Wechner 
wrote:


Hi

IIUC Lucene does not contain an RRF implementation, for example to merge
keyword/BM25 and vector search results, right?

I think it would be nice to have within Lucene, WDYT?

Thanks

Michael




Support of RRF (Reciprocal Rank Fusion) by Lucene?

2024-03-26 Thread Michael Wechner

Hi

IIUC Lucene does not contain an RRF implementation, for example to merge
keyword/BM25 and vector search results, right?


I think it would be nice to have within Lucene, WDYT?
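For context, RRF itself is just the formula score(d) = sum over result lists of
1 / (k + rank(d)), with k typically set to 60. A minimal, self-contained Java
sketch of fusing two ranked lists (a hypothetical helper, not a Lucene API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Reciprocal Rank Fusion; not a Lucene API. */
public class RrfSketch {

    /** Fuse ranked doc-id lists: score(d) = sum of 1 / (k + rank), rank 1-based. */
    public static List<String> fuse(int k, List<List<String>> rankings) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // i is 0-based, the RRF rank is i + 1
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused;
    }

    public static void main(String[] args) {
        List<String> bm25Hits = List.of("doc1", "doc2", "doc3");   // keyword results
        List<String> vectorHits = List.of("doc3", "doc1", "doc4"); // vector results
        // doc1 (ranks 1 and 2) narrowly beats doc3 (ranks 3 and 1)
        System.out.println(fuse(60, List.of(bm25Hits, vectorHits)));
    }
}
```

A TopDocs-based utility in Lucene would work the same way, just keyed on doc
ids and returning fused scores instead of strings.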

Thanks

Michael




Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-23 Thread Michael Wechner

btw, what about native binary embedding quantization support by Lucene?

https://www.linkedin.com/posts/tomaarsen_binary-and-scalar-embedding-quantization-activity-7176966403332132864-lJzH?utm_source=share_medium=member_desktop

Would this require another VectorField / VectorsFormat?

Thanks

Michael

On 19.03.24 at 21:57, Shubham Chaudhary wrote:

Hi Michael,

Lucene has had int8 vector support since 9.5 (#1054
<https://github.com/apache/lucene/pull/1054>), but it was left to the user
to get those quantized vectors and index them using KnnByteVectorField
<https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/KnnByteVectorField.html>.
With Lucene 9.9 out now there is native support for int8 scalar
quantization (#12582 <https://github.com/apache/lucene/pull/12582>) using
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>,
which expects a confidence interval between 0.9 and 1.0. Here are some nice
blogs that explain how it works in Lucene.

- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene
- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-101

Some other references:
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html
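As a rough illustration of the idea (a sketch of the math only, not Lucene's
actual implementation — Lucene derives the quantization bounds from the
configured confidence interval over the observed value distribution): floats
are mapped linearly onto a small signed-byte range.

```java
/**
 * Illustrative sketch of int8 scalar quantization (not Lucene's code):
 * map floats linearly from [min, max] onto 0..127. In Lucene the min/max
 * bounds come from the configured confidence interval, e.g. with 0.9
 * roughly the middle 90% of observed values.
 */
public class ScalarQuantizationSketch {

    public static byte quantize(float value, float min, float max) {
        float clamped = Math.max(min, Math.min(max, value));
        return (byte) Math.round((clamped - min) / (max - min) * 127f);
    }

    public static float dequantize(byte q, float min, float max) {
        return min + (q / 127f) * (max - min);
    }

    public static void main(String[] args) {
        float min = -1f, max = 1f; // bounds estimated from the corpus
        byte q = quantize(0.5f, min, max);
        float restored = dequantize(q, min, max);
        // The round trip loses a little precision: 0.5 -> 95 -> ~0.496
        System.out.println(q + " -> " + restored);
    }
}
```

The narrower the confidence interval, the fewer outliers stretch the [min, max]
range, which is what keeps the per-step quantization error small.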



On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner 
wrote:


Hi

Cohere recently announced their "compressed" embeddings

https://twitter.com/Nils_Reimers/status/1769809006762037368

https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share_medium=member_desktop

Does Lucene Vector Search support this already, or is somebody working
on this?

Thanks

Michael




Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-22 Thread Michael Wechner

Hi Shubham

Thanks again for your pointers!

I have now implemented it in Katie https://github.com/wyona/katie-backend

https://github.com/wyona/katie-backend/blob/8e73368ff2da1375471605d568f0dafb2c82e47d/src/main/java/com/wyona/katie/handlers/LuceneVectorSearchQuestionAnswerImpl.java#L224C20-L224C41

and it works fine so far; I have tested it with the Cohere int8 embeddings.


Thanks

Michael



On 20.03.24 at 06:56, Michael Wechner wrote:

Hi Shubham

Thanks very much for your feedback!

I will try it asap :-)

Michael

On 19.03.24 at 21:57, Shubham Chaudhary wrote:

Hi Michael,

Lucene has had int8 vector support since 9.5 (#1054
<https://github.com/apache/lucene/pull/1054>), but it was left to the user
to get those quantized vectors and index them using KnnByteVectorField
<https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/KnnByteVectorField.html>.
With Lucene 9.9 out now there is native support for int8 scalar
quantization (#12582 <https://github.com/apache/lucene/pull/12582>) using
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>,
which expects a confidence interval between 0.9 and 1.0. Here are some nice
blogs that explain how it works in Lucene.

- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene
- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-101

Some other references:
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html





On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner wrote:


Hi

Cohere recently announced their "compressed" embeddings

https://twitter.com/Nils_Reimers/status/1769809006762037368

https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share_medium=member_desktop 



Does Lucene Vector Search support this already, or is somebody working
on this?

Thanks

Michael




Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Michael Wechner

Hi Shubham

Thanks very much for your feedback!

I will try it asap :-)

Michael

On 19.03.24 at 21:57, Shubham Chaudhary wrote:

Hi Michael,

Lucene has had int8 vector support since 9.5 (#1054
<https://github.com/apache/lucene/pull/1054>), but it was left to the user
to get those quantized vectors and index them using KnnByteVectorField
<https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/KnnByteVectorField.html>.
With Lucene 9.9 out now there is native support for int8 scalar
quantization (#12582 <https://github.com/apache/lucene/pull/12582>) using
Lucene99ScalarQuantizedVectorsFormat
<https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html>,
which expects a confidence interval between 0.9 and 1.0. Here are some nice
blogs that explain how it works in Lucene.

- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene
- https://www.elastic.co/search-labs/blog/articles/scalar-quantization-101

Some other references:
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html
- https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html



On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner 
wrote:


Hi

Cohere recently announced their "compressed" embeddings

https://twitter.com/Nils_Reimers/status/1769809006762037368

https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share_medium=member_desktop

Does Lucene Vector Search support this already, or is somebody working
on this?

Thanks

Michael




Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Michael Wechner

Hi

Cohere recently announced their "compressed" embeddings

https://twitter.com/Nils_Reimers/status/1769809006762037368
https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share_medium=member_desktop

Does Lucene Vector Search support this already, or is somebody working 
on this?


Thanks

Michael




Re: Right Way to Read vectors from Index

2024-02-12 Thread Michael Wechner
It is good that you asked this question, because it made me realize
that there is some room for improvement in our own application :-)


Thanks

Michael

On 12.02.24 at 11:35, Uthra wrote:

@Gautam - Thanks for your response. Our leaf reader API approach is the one you
mentioned. I wanted to make sure we use the best way to read vectors for our case.

@Michael - Yes Michael that’s the case here.

Regards,
Uthra


On 12-Feb-2024, at 1:23 PM, Michael Wechner  wrote:

Thanks for explaining, Uthra!

IIUC the text / data for which the vector was originally generated was not
changed; only some other data (e.g. metadata), which is also part of the Lucene
document, was, right?
So, if you want to update the other data within the Lucene document, you first
retrieve the Lucene document, create a new Lucene document, update the changed
data, but keep the unchanged vector, which means you don't need to re-generate
the vector, right?

Thanks

Michael



On 11.02.24 at 13:39, Uthra wrote:

Hi Michael,
The use case is to handle index updates along with its vector field
without resending the vector in the change data every time. The change data will
consist of only “updated_field(s):value(s)”, wherein I will read the vector
value from the index to update the document.

Thanks,
Uthra


On 09-Feb-2024, at 7:13 PM, Michael Wechner  wrote:

Can you describe your use case in more detail (beyond having to read the 
vectors)?

Thanks

Michael

On 09.02.24 at 12:28, Uthra wrote:

Hi,
Our project uses Lucene 9.7.0 and we have a requirement of frequent
vector read operations from the index for a set of documents. We tried two
approaches:
1. Index the vector as a stored field and retrieve it whenever needed using the
StoredFields APIs.
2. Use the LeafReader API to read vectors. Here the random accessing of
documents is very slow.
Which one is the right approach, and can you suggest a better one? Also, why
isn't there a straightforward API like the StoredFields API to read vectors?

Regards,
Uthra




Re: Right Way to Read vectors from Index

2024-02-11 Thread Michael Wechner

Thanks for explaining, Uthra!

IIUC the text / data for which the vector was originally generated was not
changed; only some other data (e.g. metadata), which is also part of the Lucene
document, was, right?
So, if you want to update the other data within the Lucene document, you first
retrieve the Lucene document, create a new Lucene document, update the changed
data, but keep the unchanged vector, which means you don't need to re-generate
the vector, right?


Thanks

Michael



On 11.02.24 at 13:39, Uthra wrote:

Hi Michael,
The use case is to handle index updates along with its vector field
without resending the vector in the change data every time. The change data will
consist of only “updated_field(s):value(s)”, wherein I will read the vector
value from the index to update the document.

Thanks,
Uthra


On 09-Feb-2024, at 7:13 PM, Michael Wechner  wrote:

Can you describe your use case in more detail (beyond having to read the 
vectors)?

Thanks

Michael

On 09.02.24 at 12:28, Uthra wrote:

Hi,
Our project uses Lucene 9.7.0 and we have a requirement of frequent
vector read operations from the index for a set of documents. We tried two
approaches:
1. Index the vector as a stored field and retrieve it whenever needed using the
StoredFields APIs.
2. Use the LeafReader API to read vectors. Here the random accessing of
documents is very slow.
Which one is the right approach, and can you suggest a better one? Also, why
isn't there a straightforward API like the StoredFields API to read vectors?

Regards,
Uthra





Re: Right Way to Read vectors from Index

2024-02-09 Thread Michael Wechner
Can you describe your use case in more detail (beyond having to read the 
vectors)?


Thanks

Michael

On 09.02.24 at 12:28, Uthra wrote:

Hi,
Our project uses Lucene 9.7.0 and we have a requirement of frequent
vector read operations from the index for a set of documents. We tried two
approaches:
1. Index the vector as a stored field and retrieve it whenever needed using the
StoredFields APIs.
2. Use the LeafReader API to read vectors. Here the random accessing of
documents is very slow.
Which one is the right approach, and can you suggest a better one? Also, why
isn't there a straightforward API like the StoredFields API to read vectors?

Regards,
Uthra






Re: hnsw parameters for vector search

2024-01-30 Thread Michael Wechner
Re your "second" question about suboptimal results, I think Nils Reimers 
explains quite nicely why this might happen, see for example


https://www.youtube.com/watch?v=Abh3YCahyqU

HTH

Michael



On 30.01.24 at 15:48, Dr. Andreas Moll wrote:

Hi,

The HNSW documentation for the Lucene HnswGraph and the Solr vector search is
not very verbose, especially with regard to the parameters hnswMaxConn and
hnswBeamWidth.
I find it hard to come up with sensible values for these parameters by reading
the paper from 2018.
Does anyone have experience with the influence of these parameters on the
results? As far as I understand the code, the graph is created at indexing time,
so it would be time-intensive to come up with optimal values for a specific use
case by trial and error.

We have a Solr index with roughly 100 million embeddings, and in synthetic
randomized benchmarks around 14% of requests result in a suboptimal answer
(based on the cosine vector similarity).
I expected this "error" rate to be much smaller. I would love to hear your
experiences.
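One common way to quantify such an error rate is recall@k: run the same queries
against an exact brute-force search and measure how much of the exact top-k the
HNSW top-k recovers (higher hnswBeamWidth / hnswMaxConn generally trade indexing
time for higher recall). A minimal Java sketch (a hypothetical helper, not Solr
or Lucene code):

```java
import java.util.Set;

/** Illustrative recall@k computation; not Solr/Lucene code. */
public class RecallSketch {

    /** Fraction of the exact top-k docs that the approximate top-k also found. */
    public static double recallAtK(Set<Integer> exactTopK, Set<Integer> approxTopK) {
        int hits = 0;
        for (int docId : exactTopK) {
            if (approxTopK.contains(docId)) {
                hits++;
            }
        }
        return (double) hits / exactTopK.size();
    }

    public static void main(String[] args) {
        Set<Integer> exact = Set.of(1, 2, 3, 4, 5);   // brute-force top-5
        Set<Integer> approx = Set.of(1, 2, 3, 4, 9);  // HNSW top-5, one miss
        System.out.println(recallAtK(exact, approx)); // prints 0.8
    }
}
```

Averaged over many queries, this turns the observed 14% "suboptimal answer"
rate into a recall figure that can be compared across parameter settings.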

Best regards

Andreas Moll







Katie released as Open Source under the Apache License 2.0 using Lucene for full text and vector search by default

2024-01-24 Thread Michael Wechner

Hi Together

Yesterday, Katie was released as Open Source under the Apache License
2.0, using Lucene for full text and vector search by default.


You can find the code on GitHub https://github.com/wyona/katie-backend

A very big thank you to everyone working on Lucene for making this great
search software available and usable for everyone!


I have set up a small Katie demo using the Lucene FAQs

https://lucene-faq.katie.qa/

for which the embedding model "all-mpnet-base-v2" is used.


Thanks

Michael




Re: Azure AI Search uses Apache Lucene for full text search

2024-01-24 Thread Michael Wechner

You can find a working, very simple implementation at

https://github.com/wyona/katie-backend/blob/main/src/main/java/com/wyona/katie/handlers/AzureAISearchImpl.java

To test it, you might want to download / install Katie locally

https://github.com/wyona/katie-backend

I will spend some more time on it during the next couple of days and
keep you posted once I have gained more experience.


Thanks

Michael




On 22.01.24 at 09:06, Ali Akhtar wrote:

Sure, please share

On Mon, Jan 22, 2024 at 1:33 AM Michael Wechner 
wrote:


Hi

I recently noticed that Azure AI Search uses Apache Lucene
<https://lucene.apache.org/> for full text search


https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture

which I did not know until now, but I think it is very cool that Microsoft
is using Lucene.

The documentation says that the integration is not yet exhaustive,
but that it is being extended.

I am currently playing with its Java SDK


https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/search/azure-search-documents/src/samples

and would be happy to share my findings / experience in case somebody is
interested.

Thanks

Michael






Azure AI Search uses Apache Lucene for full text search

2024-01-21 Thread Michael Wechner

Hi

I recently noticed that Azure AI Search uses Apache Lucene
<https://lucene.apache.org/> for full text search


https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture

which I did not know until now, but I think it is very cool that Microsoft
is using Lucene.

The documentation says that the integration is not yet exhaustive,
but that it is being extended.

I am currently playing with its Java SDK

https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/search/azure-search-documents/src/samples

and would be happy to share my findings / experience in case somebody is 
interested.


Thanks

Michael

Re: Old codecs may only be used for reading

2024-01-11 Thread Michael Wechner

Hi Adrien, thank you very much for confirming quickly!

I switched to Lucene99Codec and all looks good again :-)

Thanks

Michael

On 11.01.24 at 10:47, Adrien Grand wrote:

Hey Michael. Your understanding is correct.

On Thu, Jan 11, 2024 at 10:46 AM Michael Wechner 
wrote:


Hi

I recently upgraded from Lucene 9.8.0 to Lucene 9.9.1 and noticed that
Lucene95Codec got moved to

org.apache.lucene.backward_codecs.lucene95.Lucene95Codec

When testing my code I received the following error message:

"Old codecs may only be used for reading"

Do I understand correctly that Lucene95Codec can no longer be used for writing,
and that I should now use Lucene99Codec for writing?

Thanks

Michael









Old codecs may only be used for reading

2024-01-11 Thread Michael Wechner

Hi

I recently upgraded from Lucene 9.8.0 to Lucene 9.9.1 and noticed that 
Lucene95Codec got moved to

org.apache.lucene.backward_codecs.lucene95.Lucene95Codec

When testing my code I received the following error message:

"Old codecs may only be used for reading"

Do I understand correctly that Lucene95Codec can no longer be used for writing,
and that I should now use Lucene99Codec for writing?

Thanks

Michael


Re: Proof of concept for a Luke IntelliJ plugin

2023-11-16 Thread Michael Wechner
If the plugin looks suitable, I think the best place for it would be under
the Apache GitHub organization, and my question is, are you someone who can
help with that?

Also, although I don't want to jump ahead too much, since I work as a
freelancer, I have to prioritize what projects I work on. Thus, I must ask,
if this project got under the Apache umbrella, would there be a chance of
receiving financial compensation for this work (maybe under some kind of
contract) and potential longer-term maintenance?

Best regards,
Tamás

---Tamás Balog
Freelance JetBrains IDE Plugin Developer

Find me on: GitHub / JetBrains Marketplace / LinkedIn / Website

Sent with Proton Mail secure email.

On Monday, 13 November 2023 at 10:33, Michael Wechner <michael.wech...@wyona.com> wrote:


Hi Tamas

Can one download your plugin somewhere to test it?

Thanks

Michael

On 13.11.23 at 10:07, Balog Tamás wrote:


Hello everyone!

I've been working on a proof of concept of creating an IntelliJ plugin
from the Luke application and it has reached a demoable state.

If any of the Lucene core maintainers, team leads, etc. are
interested, I'd be glad to demonstrate it and discuss a potential future
for the plugin. In that case please let me know who and where I could
contact directly.

Or, if I'm at the wrong place for such a topic, please point me in the
right direction. :)

Cheers,
Tamás Balog









Re: Proof of concept for a Luke IntelliJ plugin

2023-11-16 Thread Michael Wechner

Hi Tamas

On 16.11.23 at 08:55, Balog Tamás wrote:

Michael, Alessandro, thank you for the links and the summary. I appreciate the 
help.

Meanwhile, I also read up on the documents/pages you sent, and yeah, I see it's
based on the donation of existing projects and, the financial aspects aside, the
incubation process would seemingly require more capacity on my side than I can
spare right now.

For now, I might put this project aside, but I'll see how I could host and 
release this under my GitHub account at first. (Without violating license 
terms, as the plugin's code base is still more than 90% the same as Luke's, as 
I simply reused its source code, so that I don't have to reimplement 
everything.)


AFAIK Luke is also using Apache License 2.0, so you should not have a 
licensing issue re Luke itself at least




@Michael: I built the plugin .zip file. It is about 55MB due to having to
bundle the dependent Lucene jars too. Is it OK if I share it on Google Drive
and send you a link at your wyona email address?


yes, that would be great!

Thanks

Michael



Cheers,
Tamás


On Wednesday, 15 November 2023 at 16:41, Alessandro Benedetti wrote:



Hi Balog,
first of all, thanks, I think it's a cool idea!

In regards to giving it to the Apache Software Foundation, it works as a
donation:
you assign a permissive license (Apache) and contribute it as a Lucene
module (most likely, like it happened with Luke) or a separate project
(unlikely to deserve a separate one as Luke has been donated itself to
Lucene).
This won't happen automatically; you'll need to convince a Lucene
committer it's good code and a good idea and have it merged one day.

You won't get any money, but if you continue contributing, you may become
an Apache Lucene committer one day.
Also, that won't give you any money, it's rather the opposite, it's
volunteering work.

Given that, there are other ways of making money out of a contribution:

- a company becomes interested in such a contribution and pays you to
maintain it
- companies start using that part of Lucene and ask the creator to do some
consulting

As a personal recommendation, if you are a freelancer and need money I
wouldn't suggest pursuing this activity as a quick revenue channel (because
it's unlikely you'll see any money anytime soon); do it if you want to
donate to and benefit the broader community (i.e. you get the money from other
projects).

Hope it helps,

Cheers
--
Alessandro Benedetti
Director @ Sease Ltd.
Apache Lucene/Solr Committer
Apache Solr PMC Member

e-mail: a.benede...@sease.io


Sease - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>



On Wed, 15 Nov 2023 at 07:51, Balog Tamás <picim...@protonmail.com.invalid> wrote:


Hi Michael,

Not at the moment, but I can share privately a built plugin archive (a
.zip file) that one can install manually.

If the plugin looks suitable, I think the best place for it would be under
the Apache GitHub organization, and my question is, are you someone who can
help with that?

Also, although I don't want to jump ahead too much, since I work as a
freelancer, I have to prioritize what projects I work on. Thus, I must ask,
if this project got under the Apache umbrella, would there be a chance of
receiving financial compensation for this work (maybe under some kind of
contract) and potential longer-term maintenance?

Best regards,
Tamás

---Tamás Balog
Freelance JetBrains IDE Plugin Developer

Find me on: GitHub / JetBrains Marketplace / LinkedIn / Website

Sent with Proton Mail secure email.

On Monday, 13 November 2023 at 10:33, Michael Wechner <michael.wech...@wyona.com> wrote:


Hi Tamas

Can one download your plugin somewhere to test it?

Thanks

Michael

On 13.11.23 at 10:07, Balog Tamás wrote:


Hello everyone!

I've been working on a proof of concept of creating an IntelliJ plugin
from the Luke application and it reached a demoable state.

If any of the Lucene core maintainers, team leads, etc. are
interested, I'd be glad to demonstrate it and discuss a potential future
for the plugin. In that case please let me know who and where I could
contact directly.

Or, if I'm in the wrong place for such a topic, please point me in the
right direction. :)

Cheers,
Tamás Balog

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Proof of concept for a Luke IntelliJ plugin

2023-11-15 Thread Michael Wechner

Hi Tamas

On 15.11.23 at 08:51, Balog Tamás wrote:

Hi Michael,

Not at the moment, but I can share privately a built plugin archive (a .zip 
file) that one can install manually.


I would be happy to test it




If the plugin looks suitable, I think the best place for it would be under the 
Apache GitHub organization, and my question is, are you someone who can help 
with that?



Please see https://incubator.apache.org/



Also, although, I don't want to jump ahead too much, since I work as a 
freelancer, I have to prioritize what projects I work on. Thus, I must ask, if 
this project got under the Apache umbrella, would there be a chance of 
receiving financial compensation for this work (maybe under some kind of 
contract) and potential longer-term maintenance?


Please see https://www.apache.org/foundation/how-it-works/

All the best

Michael





Best regards,
Tamás



---Tamás Balog
Freelance JetBrains IDE Plugin Developer

Find me on: GitHub / JetBrains Marketplace / LinkedIn / Website

Sent with Proton Mail secure email.

On Monday, 13 November 2023 at 10:33, Michael Wechner wrote:



Hi Tamas

Can one download your plugin somewhere to test it?

Thanks

Michael



On 13.11.23 at 10:07, Balog Tamás wrote:


Hello everyone!

I've been working on a proof of concept of creating an IntelliJ plugin from the 
Luke application and it reached a demoable state.

If anyone of the Lucene core maintainers, team leads, etc. is interested, I'd 
be glad to demonstrate it and discuss a potential future for the plugin. In 
that case please let me know who and where I could contact directly.

Or, if I'm in the wrong place for such a topic, please point me in the right 
direction. :)

Cheers,
Tamás Balog



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Proof of concept for a Luke IntelliJ plugin

2023-11-13 Thread Michael Wechner

Hi Tamas

Can one download your plugin somewhere to test it?

Thanks

Michael



On 13.11.23 at 10:07, Balog Tamás wrote:

Hello everyone!

I've been working on a proof of concept of creating an IntelliJ plugin from the 
Luke application and it reached a demoable state.

If anyone of the Lucene core maintainers, team leads, etc. is interested, I'd 
be glad to demonstrate it and discuss a potential future for the plugin. In 
that case please let me know who and where I could contact directly.

Or, if I'm in the wrong place for such a topic, please point me in the right 
direction. :)

Cheers,
Tamás Balog



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: How to get terms of a particular field of a particular document

2023-11-13 Thread Michael Wechner

I just realized that the code can be even simpler:

String text = "Apache Lucene is a great search library!";
TokenStream stream = TokenSources.getTokenStream(null, null, text, new StandardAnalyzer(), -1);
// INFO: See https://lucene.apache.org/core/9_8_0/core/org/apache/lucene/analysis/TokenStream.html
stream.reset();
while (stream.incrementToken()) {
    log.info("Token: " + stream.getAttribute(CharTermAttribute.class));
}
stream.end();
stream.close();

The code also seems to work without stream.end(), but if I understand the
documentation at
https://lucene.apache.org/core/9_8_0/core/org/apache/lucene/analysis/TokenStream.html
correctly, one should add it.
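As an aside, since both Analyzer and TokenStream are Closeable, the reset/incrementToken/end/close contract above can also be handled with try-with-resources, which guarantees close() even on exceptions. A minimal sketch using Analyzer.tokenStream instead of TokenSources (the field name "body" is an arbitrary placeholder here):

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenDemo {
    public static void main(String[] args) throws IOException {
        String text = "Apache Lucene is a great search library!";
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // required before the first incrementToken()
            while (stream.incrementToken()) {
                System.out.println("Token: " + term);
            }
            stream.end();                   // records the final offset state
        }                                   // close() happens automatically here
    }
}
```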

Thanks

Michael


On 12.11.23 at 23:36, Michael Wechner wrote:

Thanks again; I think I have now found what I wanted (without needing 
the Highlighter):

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
log.info("Get terms of document ...");
TokenStream stream = TokenSources.getTokenStream("field_name", null, text, analyzer, -1);
stream.reset();
while (stream.incrementToken()) {
    log.info("Term: " + stream.getAttribute(CharTermAttribute.class));
}
stream.close();
reader.close();

Thanks

Michael





On 12.11.2023 at 22:00, Mikhail Khludnev wrote:

it's something over there
https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159


On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner
wrote:


Hi Mikhail

Thank you very much for your feedback!

I have found various examples for the first option when running a query,
e.g.

https://howtodoinjava.com/lucene/lucene-search-highlight-example/

but I don't understand how to implement the second option, i.e. how to
get the extracted terms of a document field independently of a query?

Can you maybe give a code example?

Thanks

Michael



On 12.11.23 at 18:46, Mikhail Khludnev wrote:

Hello,
This is what highlighters do. There are two options:
  - index termVectors, obtain them at search time.
  - obtain the stored field value, analyse it again, get all terms.
  Good Luck

On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner <

michael.wech...@wyona.com>

wrote:


HI

IIUC I can get all terms of a particular field of an index with

IndexReader reader = DirectoryReader.open(„index_directory");
List list = reader.leaves();
for (LeafReaderContext lrc : list) {
 Terms terms = lrc.reader().terms(„field_name");
 if (terms != null) {
 TermsEnum termsEnum = terms.iterator();
 BytesRef term = null;
 while ((term = termsEnum.next()) != null) {
 System.out.println("Term: " + term.utf8ToString());
 }
 }
}
reader.close();
But how can I get all terms of a particular field of a particular

document?

Thanks
Michael

P.S.: Btw, does it make sense to update the Lucene FAQ



https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIretrieveallthevaluesofaparticularfieldthatexistswithinanindex,acrossalldocuments

?
with the code above?
I can do this, but want to make sure, that I don’t update it in a wrong
way.






-
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org



--
Sincerely yours
Mikhail Khludnev




Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
Thanks again; I think I have now found what I wanted (without needing 
the Highlighter):

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
log.info("Get terms of document ...");
TokenStream stream = TokenSources.getTokenStream("field_name", null, text, analyzer, -1);
stream.reset();
while (stream.incrementToken()) {
    log.info("Term: " + stream.getAttribute(CharTermAttribute.class));
}
stream.close();
reader.close();

Thanks

Michael




> Am 12.11.2023 um 22:00 schrieb Mikhail Khludnev :
> 
> it's something over there
> https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159
> 
> 
> On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner 
> wrote:
> 
>> Hi Mikhail
>> 
>> Thank you very much for your feedback!
>> 
>> I have found various examples for the first option when running a query,
>> e.g.
>> 
>> https://howtodoinjava.com/lucene/lucene-search-highlight-example/
>> 
>> but don't understand how to implement the second option, resp. how to
>> get the extracted terms of a document field independent of a query?
>> 
>> Can you maybe give a code example?
>> 
>> Thanks
>> 
>> Michael
>> 
>> 
>> 
>> Am 12.11.23 um 18:46 schrieb Mikhail Khludnev:
>>> Hello,
>>> This is what highlighters do. There are two options:
>>>  - index termVectors, obtain them in search time.
>>>  - obtain the stored field value, analyse it again, get all terms.
>>>  Good Luck
>>> 
>>> On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner <
>> michael.wech...@wyona.com>
>>> wrote:
>>> 
>>>> HI
>>>> 
>>>> IIUC I can get all terms of a particular field of an index with
>>>> 
>>>> IndexReader reader = DirectoryReader.open(„index_directory");
>>>> List list = reader.leaves();
>>>> for (LeafReaderContext lrc : list) {
>>>> Terms terms = lrc.reader().terms(„field_name");
>>>> if (terms != null) {
>>>> TermsEnum termsEnum = terms.iterator();
>>>> BytesRef term = null;
>>>> while ((term = termsEnum.next()) != null) {
>>>> System.out.println("Term: " + term.utf8ToString());
>>>> }
>>>> }
>>>> }
>>>> reader.close();
>>>> But how I can get all terms of a particular field of a particular
>> document?
>>>> Thanks
>>>> Michael
>>>> 
>>>> P.S.: Btw, does it make sense to update the Lucene FAQ
>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIretrieveallthevaluesofaparticularfieldthatexistswithinanindex,acrossalldocuments
>>>> ?
>>>> with the code above?
>>>> I can do this, but want to make sure, that I don’t update it in a wrong
>>>> way.
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev



Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner

Hi Mikhail

Thank you very much for your feedback!

I have found various examples for the first option when running a query, 
e.g.


https://howtodoinjava.com/lucene/lucene-search-highlight-example/

but I don't understand how to implement the second option, i.e. how to 
get the extracted terms of a document field independently of a query?


Can you maybe give a code example?

Thanks

Michael



On 12.11.23 at 18:46, Mikhail Khludnev wrote:

Hello,
This is what highlighters do. There are two options:
  - index termVectors, obtain them at search time.
  - obtain the stored field value, analyse it again, get all terms.
  Good Luck

On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner 
wrote:


HI

IIUC I can get all terms of a particular field of an index with

IndexReader reader = DirectoryReader.open(„index_directory");
List list = reader.leaves();
for (LeafReaderContext lrc : list) {
 Terms terms = lrc.reader().terms(„field_name");
 if (terms != null) {
 TermsEnum termsEnum = terms.iterator();
 BytesRef term = null;
 while ((term = termsEnum.next()) != null) {
 System.out.println("Term: " + term.utf8ToString());
 }
 }
}
reader.close();
But how can I get all terms of a particular field of a particular document?
Thanks
Michael

P.S.: Btw, does it make sense to update the Lucene FAQ

https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIretrieveallthevaluesofaparticularfieldthatexistswithinanindex,acrossalldocuments
?
with the code above?
I can do this, but want to make sure, that I don’t update it in a wrong
way.







-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
HI

IIUC I can get all terms of a particular field of an index with

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
List<LeafReaderContext> list = reader.leaves();
for (LeafReaderContext lrc : list) {
    Terms terms = lrc.reader().terms("field_name");
    if (terms != null) {
        TermsEnum termsEnum = terms.iterator();
        BytesRef term = null;
        while ((term = termsEnum.next()) != null) {
            System.out.println("Term: " + term.utf8ToString());
        }
    }
}
reader.close();
But how can I get all terms of a particular field of a particular document?
Thanks
Michael

P.S.: Btw, does it make sense to update the Lucene FAQ
https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIretrieveallthevaluesofaparticularfieldthatexistswithinanindex,acrossalldocuments?
with the code above?
I can do this, but want to make sure that I don't update it in the wrong way.
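For the per-document question, Mikhail's first suggestion (term vectors) avoids re-analysis entirely. A rough sketch, assuming the field was indexed with term vectors enabled and `docId` is the id of the document of interest (the `termVectors()` accessor exists since Lucene 9.5; older versions use `reader.getTermVector(docId, field)` instead):

```java
// Index time: term vectors must be enabled on the field.
FieldType type = new FieldType(TextField.TYPE_STORED);
type.setStoreTermVectors(true);
doc.add(new Field("field_name", text, type));

// Search time: read back the terms of this one document, no re-analysis needed.
Terms terms = reader.termVectors().get(docId, "field_name");
if (terms != null) {
    TermsEnum termsEnum = terms.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
        System.out.println("Term: " + term.utf8ToString());
    }
}
```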





Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-08 Thread Michael Wechner

Hi Uwe

Thank you very much for confirming the code!

Yes, I only set it for the IndexWriter, but what I meant to ask was: 
if the default Codec gets updated
and I update my implementation, then I guess I will have to reindex 
from scratch, right?


Or can I assume that the default Codec is also backwards compatible for 
writing?


Thanks

Michael



On 08.11.23 at 09:25, Uwe Schindler wrote:

Hi Michael,

The version below looks correct. Of course the Solr version is able to 
do much more. The code you posted limits it to the bare minimum:


 * subclass the default codec
 * implement getKnnVectorsFormatForField() and return the wrapper with
   another max dimension

Reading indexes still works with the unmodified default codec; you only 
need to set it for the IndexWriter. When reading, the actual codec is 
looked up by name.


Uwe

On 07.11.2023 at 17:03, Michael Wechner wrote:

Hi Uwe

Thanks again for your feedback, I got it working now :-)

I am using a simplified version, which I will post below, so that 
it might help others, at least as long as this implementation makes 
sense.


Btw, when a new version of Lucene gets released, how do I best find 
out whether "Lucene95Codec" is still the most recent default codec or 
there is a new one?


Thanks

Michael

---

@Autowired
private LuceneCodecFactory luceneCodecFactory;

IndexWriterConfig iwc = new IndexWriterConfig();
iwc.setCodec(luceneCodecFactory.getCodec());



package com.erkigsnek.webapp.services;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.codecs.KnnVectorsWriter;
import org.apache.lucene.codecs.lucene95.Lucene95Codec;
import org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;
import org.springframework.stereotype.Component;
import lombok.extern.slf4j.Slf4j;

import java.io.IOException;

@Slf4j
@Component
public class LuceneCodecFactory {

    private final int maxDimensions = 16384;

    public Codec getCodec() {
        log.info("Get codec ...");
        Codec codec = new Lucene95Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                var delegate = new Lucene95HnswVectorsFormat();
                log.info("Maximum Vector Dimension: " + maxDimensions);
                return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
            }
        };
        return codec;
    }
}

/**
 * This class exists because Lucene95HnswVectorsFormat's getMaxDimensions method is final
 * and we need to work around that constraint to allow more than the default number of dimensions.
 */
@Slf4j
class DelegatingKnnVectorsFormat extends KnnVectorsFormat {

    private final KnnVectorsFormat delegate;
    private final int maxDimensions;

    public DelegatingKnnVectorsFormat(KnnVectorsFormat delegate, int maxDimensions) {
        super(delegate.getName());
        this.delegate = delegate;
        this.maxDimensions = maxDimensions;
    }

    @Override
    public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException {
        return delegate.fieldsWriter(state);
    }

    @Override
    public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException {
        return delegate.fieldsReader(state);
    }

    @Override
    public int getMaxDimensions(String fieldName) {
        log.info("Maximum vector dimension: " + maxDimensions);
        return maxDimensions;
    }
}






On 19.10.23 at 11:23, Uwe Schindler wrote:

Hi Michael,

The max vector dimension limit is no longer checked in the field 
type, as it is the responsibility of the codec to enforce it.


You need to build your own codec that returns a different setting so 
it can be enforced by IndexWriter. See Apache Solr's code for how to 
wrap the existing KnnVectorsFormat so it returns another limit: 
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L159-L183> 



Basically you need to subclass Lucene95Codec as done here: 
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L99-L146> 
and return a different vectors format via a delegator as described 
before.


The responsibility was shifted to the codec, because there may be 
better alternatives to HNSW that have different limits especially 
with regard to performance during merging and query response times, 
e.g. BKD trees.


Uwe

On 19.10.2023 at 10:53, Michael Wechner wrote:
I forgot to mention that when using the custom FieldType, the 1536 
vector dimension does work with Lucene 9.7.0


Thanks

Michael



Am 19.10.23 um 10:39 schrieb Micha

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-07 Thread Michael Wechner

Hi Uwe

Thanks again for your feedback, I got it working now :-)

I am using a simplified version, which I will post below, so that it 
might help others, at least as long as this implementation makes sense.


Btw, when a new version of Lucene gets released, how do I best find out 
whether "Lucene95Codec" is still the most recent default codec or 
there is a new one?


Thanks

Michael

---

@Autowired
private LuceneCodecFactory luceneCodecFactory;

IndexWriterConfig iwc = new IndexWriterConfig();
iwc.setCodec(luceneCodecFactory.getCodec());



package com.erkigsnek.webapp.services;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.codecs.KnnVectorsWriter;
import org.apache.lucene.codecs.lucene95.Lucene95Codec;
import org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;
import org.springframework.stereotype.Component;
import lombok.extern.slf4j.Slf4j;

import java.io.IOException;

@Slf4j
@Component
public class LuceneCodecFactory {

    private final int maxDimensions = 16384;

    public Codec getCodec() {
        log.info("Get codec ...");
        Codec codec = new Lucene95Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                var delegate = new Lucene95HnswVectorsFormat();
                log.info("Maximum Vector Dimension: " + maxDimensions);
                return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
            }
        };
        return codec;
    }
}

/**
 * This class exists because Lucene95HnswVectorsFormat's getMaxDimensions method is final
 * and we need to work around that constraint to allow more than the default number of dimensions.
 */
@Slf4j
class DelegatingKnnVectorsFormat extends KnnVectorsFormat {

    private final KnnVectorsFormat delegate;
    private final int maxDimensions;

    public DelegatingKnnVectorsFormat(KnnVectorsFormat delegate, int maxDimensions) {
        super(delegate.getName());
        this.delegate = delegate;
        this.maxDimensions = maxDimensions;
    }

    @Override
    public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException {
        return delegate.fieldsWriter(state);
    }

    @Override
    public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException {
        return delegate.fieldsReader(state);
    }

    @Override
    public int getMaxDimensions(String fieldName) {
        log.info("Maximum vector dimension: " + maxDimensions);
        return maxDimensions;
    }
}






On 19.10.23 at 11:23, Uwe Schindler wrote:

Hi Michael,

The max vector dimension limit is no longer checked in the field type 
as it is the responsibility of the codec to enforce it.


You need to build your own codec that returns a different setting so 
it can be enforced by IndexWriter. See Apache Solr's code for how to wrap 
the existing KnnVectorsFormat so it returns another limit: 
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L159-L183> 



Basically you need to subclass Lucene95Codec as done here: 
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L99-L146> 
and return a different vectors format via a delegator as described 
before.


The responsibility was shifted to the codec, because there may be 
better alternatives to HNSW that have different limits especially with 
regard to performance during merging and query response times, e.g. 
BKD trees.


Uwe

On 19.10.2023 at 10:53, Michael Wechner wrote:
I forgot to mention that when using the custom FieldType, the 1536 
vector dimension does work with Lucene 9.7.0


Thanks

Michael



On 19.10.23 at 10:39, Michael Wechner wrote:

Hi

I recently upgraded Lucene to 9.8.0 and was running tests with 
OpenAI's embedding model, which has the vector dimension 1536 and 
received the following error


Field[vector]vector's dimensions must be <= [1024]; got 1536

whereas this worked previously with the hack to override the vector 
dimension using a custom


float[] vector = ...
FieldType vectorFieldType = new CustomVectorFieldType(vector.length, 
VectorSimilarityFunction.COSINE);


and setting

KnnFloatVectorField vectorField = new 
KnnFloatVectorField("VECTOR_FIELD", vector, vectorFieldType);


But this does not seem to work anymore with Lucene 9.8.0

Is this hack now prevented by the Lucene code itself, or any idea 
how to make this work again?


Whatever one thinks of OpenAI, the embedding model 
"text-embedding-ada-002" is really good and it is sad that one 
cannot use i

Re: When to use StringField and when to use FacetField for categorization?

2023-10-23 Thread Michael Wechner

Hi Greg

Thank you very much for your additional information, really very much 
appreciated!


Yes, generally speaking I think Lucene has many great features, which 
unfortunately are not so obvious for various reasons.


Documentation could of course always be better, but I guess this is also 
because many people do not use Lucene itself, but
rather Solr, OpenSearch, Elasticsearch, etc., and do not need to know 
what Lucene itself offers. Therefore there are
not so many people asking for these things, and not much incentive 
to improve the documentation.


In the Python world there is huge hype around RAG / RAG-Fusion, and 
many people are writing posts and documentation; see for example


https://medium.com/@murtuza753/using-llama-2-0-faiss-and-langchain-for-question-answering-on-your-own-data-682241488476

I do not mean to say Lucene should or has to jump on this bandwagon, but 
I would argue there is definitely an evolution in search algorithms, 
and I think it would be nice if more people knew what Lucene has 
to offer and if it were more transparent where Lucene is heading.


But then again, it might be only me not being familiar enough with these 
things :-)


Thanks

Michael





On 23.10.23 at 21:09, Greg Miller wrote:

Hey Michael-

You've gotten a lot of great information here already. I'll point you to
one more implementation as well: StringValueFacetCounts. This
implementation lets you do faceting over arbitrary "string-like" doc value
fields (SORTED and SORTED_SET). So if you already have a field of this type
you're using for other purposes, and you want to do faceting over it, you
can do it with this implementation.

The faceting-specific fields (there's a taxonomy-based approach and a
non-taxonomy-based approach, both with pros/cons) are also available, which
is what you've referenced here so far (and what others have pointed you
to). These are more "managed" fields with faceting in mind.

A high-level difference here is that faceting-specific fields tend to index
all the facet fields into a single doc values field in the index, which can
make faceting more efficient. StringValueFacetCounts can be less efficient
for faceting (if you have many different fields you want to individually
facet) but could be more flexible for you if you already have these fields
in your index for other purposes and don't want to duplicate the data into
these facet-specific fields.

Not sure if these details are helpful for you or not. If any of this is a
bit unclear, let me know and I'll try to describe things better or answer
specific questions. Honestly, we probably have too many ways to do the same
thing in the faceting module, and maybe our documentation could be a bit
more helpful.

Cheers,
-Greg

On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner 
wrote:


thanks very much for this additional information, Marc!

On 20.10.23 at 20:30, Marc D'Mello wrote:

Just following up on Mike's comment:



It used to be that the "doc values" based faceting did not support
arbitrary hierarchy, but I think that was fixed at some point.


Yeah, it was fixed a year or two ago: SortedSetDocValuesFacetField supports
hierarchical faceting; I think you just need to enable it in the
FacetsConfig. One thing to keep in mind is even though SSDV faceting
doesn't require a taxonomy index, it still requires a
SortedSetDocValuesReaderState to be maintained, which can be a little bit
expensive to create, but only needs to be done once. This benchmark code
<https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/facets/BenchmarkFacets.java>
serves as a pretty basic example of SSDV/hierarchical SSDV faceting.

On Fri, Oct 20, 2023 at 7:09 AM Michael Wechner <

michael.wech...@wyona.com>

wrote:


cool, thank you very much!

Michael



On 20.10.23 at 15:44, Michael McCandless wrote:

You can use either the "doc values" implementation for facets
(SortedSetDocValuesFacetField), or the "taxonomy" implementation
(FacetField, in which case, yes, you need to create a TaxonomyWriter).

It used to be that the "doc values" based faceting did not support
arbitrary hierarchy, but I think that was fixed at some point.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <

michael.wech...@wyona.com>

wrote:


Hi Mike

Thanks for your feedback!

IIUC in order to have the actual advantages of Facets one has to
"connect" it with a TaxonomyWriter

FacetsConfig config = new FacetsConfig();
DirectoryTaxonomyWriter taxoWriter = new

DirectoryTaxonomyWriter(taxoDir);

indexWriter.addDocument(config.build(taxoWriter, doc));

right?

Thanks

Michael




On 20.10.23 at 12:19, Michael McCandless wrote:

There are some differences.

StringField is indexed into the inverted index (postings) so you can do
efficient filtering.  You can also store in stored fields t

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner

thanks very much for this additional information, Marc!

On 20.10.23 at 20:30, Marc D'Mello wrote:

Just following up on Mike's comment:



It used to be that the "doc values" based faceting did not support
arbitrary hierarchy, but I think that was fixed at some point.


Yeah, it was fixed a year or two ago: SortedSetDocValuesFacetField supports
hierarchical faceting; I think you just need to enable it in the
FacetsConfig. One thing to keep in mind is even though SSDV faceting
doesn't require a taxonomy index, it still requires a
SortedSetDocValuesReaderState to be maintained, which can be a little bit
expensive to create, but only needs to be done once. This benchmark code
<https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/facets/BenchmarkFacets.java>
serves as a pretty basic example of SSDV/hierarchical SSDV faceting.
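Put together, SSDV faceting without a taxonomy index might be sketched roughly like this (the field name and MatchAllDocsQuery are illustrative; the reader state must be recreated whenever a new IndexReader is opened):

```java
// Index side: the facet value is flattened into a SortedSet doc values field.
FacetsConfig config = new FacetsConfig();
Document doc = new Document();
doc.add(new SortedSetDocValuesFacetField("media-type", "book"));
indexWriter.addDocument(config.build(doc));   // note: no TaxonomyWriter involved

// Search side: build the (somewhat expensive) reader state once and reuse it.
SortedSetDocValuesReaderState state =
    new DefaultSortedSetDocValuesReaderState(indexReader, config);
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(indexSearcher, new MatchAllDocsQuery(), 10, fc);
Facets facets = new SortedSetDocValuesFacetCounts(state, fc);
System.out.println(facets.getTopChildren(10, "media-type"));
```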

On Fri, Oct 20, 2023 at 7:09 AM Michael Wechner 
wrote:


cool, thank you very much!

Michael



On 20.10.23 at 15:44, Michael McCandless wrote:

You can use either the "doc values" implementation for facets
(SortedSetDocValuesFacetField), or the "taxonomy" implementation
(FacetField, in which case, yes, you need to create a TaxonomyWriter).

It used to be that the "doc values" based faceting did not support
arbitrary hierarchy, but I think that was fixed at some point.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <

michael.wech...@wyona.com>

wrote:


Hi Mike

Thanks for your feedback!

IIUC in order to have the actual advantages of Facets one has to
"connect" it with a TaxonomyWriter

FacetsConfig config = new FacetsConfig();
DirectoryTaxonomyWriter taxoWriter = new

DirectoryTaxonomyWriter(taxoDir);

indexWriter.addDocument(config.build(taxoWriter, doc));

right?

Thanks

Michael




On 20.10.23 at 12:19, Michael McCandless wrote:

There are some differences.

StringField is indexed into the inverted index (postings) so you can do
efficient filtering.  You can also store in stored fields to retrieve.

FacetField does everything StringField does (filtering, storing (maybe?)),
but in addition it stores data for faceting.  I.e. you can compute facet
counts or simple aggregations at search time.

FacetField is also hierarchical: you can filter and facet by different
points/levels of your hierarchy.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner <

michael.wech...@wyona.com>

wrote:


Hi

I have found the following simple Facet Example




https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java

whereas for a simple categorization of documents I currently use
StringField, e.g.

doc1.add(new StringField("category", "book", Field.Store.YES));
doc1.add(new StringField("category", "quantum_physics", Field.Store.YES));
doc1.add(new StringField("category", "Neumann", Field.Store.YES));
doc1.add(new StringField("category", "Wheeler", Field.Store.YES));

doc2.add(new StringField("category", "magazine", Field.Store.YES));
doc2.add(new StringField("category", "astro_physics", Field.Store.YES));

which works well, but would it be better to use Facets for this, e.g.

doc1.add(new FacetField("media-type", "book"));
doc1.add(new FacetField("topic", "physics", "quantum"));
doc1.add(new FacetField("author", "Neumann"));
doc1.add(new FacetField("author", "Wheeler"));

doc2.add(new FacetField("media-type", "magazine"));
doc2.add(new FacetField("topic", "physics", "astro"));

?

IIUC the StringField approach is more general, whereas the FacetField
approach allows for a more specific categorization / search.
Or do I misunderstand this?

Thanks

Michael



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner

cool, thank you very much!

Michael



On 20.10.23 at 15:44, Michael McCandless wrote:

You can use either the "doc values" implementation for facets
(SortedSetDocValuesFacetField), or the "taxonomy" implementation
(FacetField, in which case, yes, you need to create a TaxonomyWriter).

It used to be that the "doc values" based faceting did not support
arbitrary hierarchy, but I think that was fixed at some point.
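For example, a minimal sketch of the doc-values variant (a hedged illustration assuming the Lucene 9.x facet APIs in the lucene-facet module; variable names like indexWriter/searcher are placeholders, and the same FacetsConfig must be used at indexing and search time):

```java
// Indexing: no TaxonomyWriter needed, the facet data lives in doc values.
FacetsConfig config = new FacetsConfig();
Document doc = new Document();
doc.add(new SortedSetDocValuesFacetField("media-type", "book"));
indexWriter.addDocument(config.build(doc));

// Searching: count facets directly from the main index.
SortedSetDocValuesReaderState state =
    new DefaultSortedSetDocValuesReaderState(indexReader, config);
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
Facets facets = new SortedSetDocValuesFacetCounts(state, fc);
System.out.println(facets.getTopChildren(10, "media-type"));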

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner 
wrote:


Hi Mike

Thanks for your feedback!

IIUC in order to have the actual advantages of Facets one has to
"connect" it with a TaxonomyWriter

FacetsConfig config = new FacetsConfig();
DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
indexWriter.addDocument(config.build(taxoWriter, doc));

right?
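And presumably the corresponding search side (a hedged sketch of taxonomy-based counting, classes from the lucene-facet module; taxoDir, config and searcher are the same placeholders as above) would look roughly like:

```java
DirectoryTaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
FacetsCollector fc = new FacetsCollector();
// run the main query once, collecting matching docs for faceting
FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
// top labels with their counts for one dimension
System.out.println(facets.getTopChildren(10, "category"));
```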

Thanks

Michael




On 20.10.23 at 12:19, Michael McCandless wrote:

There are some differences.

StringField is indexed into the inverted index (postings) so you can do
efficient filtering.  You can also store in stored fields to retrieve.

FacetField does everything StringField does (filtering, storing (maybe?)),
but in addition it stores data for faceting.  I.e. you can compute facet
counts or simple aggregations at search time.

FacetField is also hierarchical: you can filter and facet by different
points/levels of your hierarchy.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner <michael.wech...@wyona.com>
wrote:


Hi

I have found the following simple Facet Example




https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java

whereas for a simple categorization of documents I currently use
StringField, e.g.

doc1.add(new StringField("category", "book"));
doc1.add(new StringField("category", "quantum_physics"));
doc1.add(new StringField("category", "Neumann"));
doc1.add(new StringField("category", "Wheeler"));

doc2.add(new StringField("category", "magazine"));
doc2.add(new StringField("category", "astro_physics"));

which works well, but would it be better to use Facets for this, e.g.

doc1.add(new FacetField("media-type", "book"));
doc1.add(new FacetField("topic", "physics", "quantum"));
doc1.add(new FacetField("author", "Neumann"));
doc1.add(new FacetField("author", "Wheeler"));

doc2.add(new FacetField("media-type", "magazine"));
doc2.add(new FacetField("topic", "physics", "astro"));

?

IIUC the StringField approach is more general, whereas the FacetField
approach allows for more specific categorization / search.
Or do I misunderstand this?

Thanks

Michael






Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner

Hi Adrien

Thank you very much for your feedback as well!

I just replaced the StringField by KeywordField :-)

Thanks

Michael

On 20.10.23 at 14:13, Adrien Grand wrote:
FYI there is also KeywordField, which combines StringField and 
SortedSetDocValuesField. It supports filtering, sorting, faceting and 
retrieval. It's my go-to field for string values.
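A hedged sketch of how that looks (assuming the Lucene 9.6+ KeywordField API; field names and the searcher variable are illustrative):

```java
Document doc = new Document();
doc.add(new KeywordField("category", "book", Field.Store.YES)); // stored for retrieval
indexWriter.addDocument(doc);

// filtering on the exact value
Query q = KeywordField.newExactQuery("category", "book");
// sorting, via the sorted-set doc values that KeywordField also writes
Sort sort = new Sort(new SortedSetSortField("category", false));
TopDocs hits = searcher.search(q, 10, sort);
```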


On Fri, Oct 20, 2023 at 12:20, Michael McCandless wrote:


There are some differences.

StringField is indexed into the inverted index (postings) so you
can do
efficient filtering.  You can also store in stored fields to retrieve.

FacetField does everything StringField does (filtering, storing
(maybe?)),
but in addition it stores data for faceting.  I.e. you can compute
facet
counts or simple aggregations at search time.

FacetField is also hierarchical: you can filter and facet by different
points/levels of your hierarchy.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner

wrote:

> Hi
>
> I have found the following simple Facet Example
>
>
>

https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java
>
> whereas for a simple categorization of documents I currently use
> StringField, e.g.
>
> doc1.add(new StringField("category", "book"));
> doc1.add(new StringField("category", "quantum_physics"));
> doc1.add(new StringField("category", "Neumann"));
> doc1.add(new StringField("category", "Wheeler"));
>
> doc2.add(new StringField("category", "magazine"));
> doc2.add(new StringField("category", "astro_physics"));
>
> which works well, but would it be better to use Facets for this,
e.g.
>
> doc1.add(new FacetField("media-type", "book"));
> doc1.add(new FacetField("topic", "physics", "quantum"));
> doc1.add(new FacetField("author", "Neumann"));
> doc1.add(new FacetField("author", "Wheeler"));
>
> doc2.add(new FacetField("media-type", "magazine"));
> doc2.add(new FacetField("topic", "physics", "astro"));
>
> ?
>
> IIUC the StringField approach is more general, whereas the FacetField
> approach allows for more specific categorization / search.
> Or do I misunderstand this?
>
> Thanks
>
> Michael
>
>
>
>



Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner

Hi Mike

Thanks for your feedback!

IIUC in order to have the actual advantages of Facets one has to 
"connect" it with a TaxonomyWriter


FacetsConfig config = new FacetsConfig();
DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
indexWriter.addDocument(config.build(taxoWriter, doc));

right?

Thanks

Michael




On 20.10.23 at 12:19, Michael McCandless wrote:

There are some differences.

StringField is indexed into the inverted index (postings) so you can do
efficient filtering.  You can also store in stored fields to retrieve.

FacetField does everything StringField does (filtering, storing (maybe?)),
but in addition it stores data for faceting.  I.e. you can compute facet
counts or simple aggregations at search time.

FacetField is also hierarchical: you can filter and facet by different
points/levels of your hierarchy.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner 
wrote:


Hi

I have found the following simple Facet Example


https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java

whereas for a simple categorization of documents I currently use
StringField, e.g.

doc1.add(new StringField("category", "book"));
doc1.add(new StringField("category", "quantum_physics"));
doc1.add(new StringField("category", "Neumann"));
doc1.add(new StringField("category", "Wheeler"));

doc2.add(new StringField("category", "magazine"));
doc2.add(new StringField("category", "astro_physics"));

which works well, but would it be better to use Facets for this, e.g.

doc1.add(new FacetField("media-type", "book"));
doc1.add(new FacetField("topic", "physics", "quantum"));
doc1.add(new FacetField("author", "Neumann"));
doc1.add(new FacetField("author", "Wheeler"));

doc2.add(new FacetField("media-type", "magazine"));
doc2.add(new FacetField("topic", "physics", "astro"));

?

IIUC the StringField approach is more general, whereas the FacetField
approach allows for more specific categorization / search.
Or do I misunderstand this?

Thanks

Michael






When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner

Hi

I have found the following simple Facet Example

https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java

whereas for a simple categorization of documents I currently use 
StringField, e.g.


doc1.add(new StringField("category", "book"));
doc1.add(new StringField("category", "quantum_physics"));
doc1.add(new StringField("category", "Neumann"));
doc1.add(new StringField("category", "Wheeler"));

doc2.add(new StringField("category", "magazine"));
doc2.add(new StringField("category", "astro_physics"));

which works well, but would it be better to use Facets for this, e.g.

doc1.add(new FacetField("media-type", "book"));
doc1.add(new FacetField("topic", "physics", "quantum"));
doc1.add(new FacetField("author", "Neumann"));
doc1.add(new FacetField("author", "Wheeler"));

doc2.add(new FacetField("media-type", "magazine"));
doc2.add(new FacetField("topic", "physics", "astro"));

?

IIUC the StringField approach is more general, whereas the FacetField 
approach allows for more specific categorization / search.

Or do I misunderstand this?

Thanks

Michael






Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner

Hi Uwe

Thank you very much for your quick feedback, really appreciated!

Will change it as you describe below.

Thanks

Michael



On 19.10.23 at 11:23, Uwe Schindler wrote:

Hi Michael,

The max vector dimension limit is no longer checked in the field type,
as it is the responsibility of the codec to enforce it.


You need to build your own codec that returns a different setting so 
it can be enforced by IndexWriter. See Apache Solr's code how to wrap 
the existing KnnVectorsFormat so it returns another limit: 
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L159-L183> 



Basically you need to subclass Lucene95Codec as done here:
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L99-L146>
and return a different vectors format via a delegator, as described
before.
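A hedged sketch of that delegator pattern (assuming the Lucene 9.8 API, where the limit is enforced via KnnVectorsFormat#getMaxDimensions; the class name HighDimCodec and the 4096 limit are illustrative):

```java
public class HighDimCodec extends Lucene95Codec {
  @Override
  public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
    KnnVectorsFormat delegate = super.getKnnVectorsFormatForField(field);
    // Wrap the default format, delegating everything except the dimension limit.
    return new KnnVectorsFormat(delegate.getName()) {
      @Override
      public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException {
        return delegate.fieldsWriter(state);
      }
      @Override
      public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException {
        return delegate.fieldsReader(state);
      }
      @Override
      public int getMaxDimensions(String fieldName) {
        return 4096; // instead of the default 1024
      }
    };
  }
}

// usage:
// IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
// iwc.setCodec(new HighDimCodec());
```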


The responsibility was shifted to the codec because there may be
better alternatives to HNSW that have different limits, especially with
regard to performance during merging and query response times (e.g.
BKD trees).


Uwe

On 19.10.2023 at 10:53, Michael Wechner wrote:
I forgot to mention that using the custom FieldType with vector
dimension 1536 does work with Lucene 9.7.0


Thanks

Michael



On 19.10.23 at 10:39, Michael Wechner wrote:

Hi

I recently upgraded Lucene to 9.8.0 and was running tests with
OpenAI's embedding model, which has vector dimension 1536, and
received the following error


Field[vector]vector's dimensions must be <= [1024]; got 1536

whereas this worked previously with the hack to override the vector
dimension using a custom


float[] vector = ...
FieldType vectorFieldType = new CustomVectorFieldType(vector.length,
VectorSimilarityFunction.COSINE);


and setting

KnnFloatVectorField vectorField = new 
KnnFloatVectorField("VECTOR_FIELD", vector, vectorFieldType);


But this does not seem to work anymore with Lucene 9.8.0

Is this hack now prevented by the Lucene code itself, or any idea 
how to make this work again?


Whatever one thinks of OpenAI, the embedding model 
"text-embedding-ada-002" is really good and it is sad that one
cannot use it with Lucene, because of the 1024 dimension restriction.


Thanks

Michael






Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
I forgot to mention that using the custom FieldType with vector
dimension 1536 does work with Lucene 9.7.0


Thanks

Michael



On 19.10.23 at 10:39, Michael Wechner wrote:

Hi

I recently upgraded Lucene to 9.8.0 and was running tests with
OpenAI's embedding model, which has vector dimension 1536, and
received the following error


Field[vector]vector's dimensions must be <= [1024]; got 1536

whereas this worked previously with the hack to override the vector
dimension using a custom


float[] vector = ...
FieldType vectorFieldType = new CustomVectorFieldType(vector.length,
VectorSimilarityFunction.COSINE);


and setting

KnnFloatVectorField vectorField = new 
KnnFloatVectorField("VECTOR_FIELD", vector, vectorFieldType);


But this does not seem to work anymore with Lucene 9.8.0

Is this hack now prevented by the Lucene code itself, or any idea how 
to make this work again?


Whatever one thinks of OpenAI, the embedding model 
"text-embedding-ada-002" is really good and it is sad that one cannot
use it with Lucene, because of the 1024 dimension restriction.


Thanks

Michael






Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner

Hi

I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's
embedding model, which has vector dimension 1536, and received the
following error


Field[vector]vector's dimensions must be <= [1024]; got 1536

whereas this worked previously with the hack to override the vector
dimension using a custom


float[] vector = ...
FieldType vectorFieldType = new CustomVectorFieldType(vector.length,
VectorSimilarityFunction.COSINE);


and setting

KnnFloatVectorField vectorField = new 
KnnFloatVectorField("VECTOR_FIELD", vector, vectorFieldType);


But this does not seem to work anymore with Lucene 9.8.0

Is this hack now prevented by the Lucene code itself, or any idea how to 
make this work again?


Whatever one thinks of OpenAI, the embedding model 
"text-embedding-ada-002" is really good and it is sad that one cannot
use it with Lucene, because of the 1024 dimension restriction.


Thanks

Michael






Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner

cool, thank you very much!

I will try to do a pull request re the javadoc documentation, because I 
think it currently does not explicitly say how to replace the
deprecated method.


Thanks

Michael

On 25.09.23 at 11:11, Uwe Schindler wrote:

Hi,

yes once per search request is the best to start with.

You can reuse the instance for multiple requests, but you cannot use 
it from multiple threads. So it is up to you to make sure you reuse it 
at best effort.


See also the documentation I posted from MIGRATE.txt.

If the documentation is missing, maybe let's open a pull request that 
gives the missing information in 9.x Javadocs, too.


Uwe

On 25.09.2023 at 11:02, Michael Wechner wrote:

you mean once per search request?

I mean for example

GET https://localhost:8080/search?q=Lucene

and the following would be executed

IndexReader reader = DirectoryReader.open(...);
StoredFields storedFields = reader.storedFields();
IndexSearcher searcher = new IndexSearcher(reader)
TopDocs topDocs = searcher.search(query, k)
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  Document doc = storedFields.document(scoreDoc.doc);
}

Like this?

Thanks

Michael


On 25.09.23 at 10:28, Uwe Schindler wrote:
Background: For performance, it is advisable to get the 
storedFields() *once* to process all documents in the search result. 
The reason for the change was that accessing stored fields would
otherwise have needed ThreadLocals to keep state.


Issue: https://github.com/apache/lucene/pull/11998

This was introduced in Lucene 9.5.

It is also listed in MIGRATE.txt:

   ### Removed deprecated IndexSearcher.doc, IndexReader.document,
   IndexReader.getTermVectors (GITHUB#11998)

   The deprecated Stored Fields and Term Vectors apis relied upon
   threadlocal storage and have been removed.

   Instead, call storedFields()/termVectors() to return an instance
   which can fetch data for multiple documents,
   and will be garbage-collected as usual.

   For example:
   ```java
   TopDocs hits = searcher.search(query, 10);
   StoredFields storedFields = reader.storedFields();
   for (ScoreDoc hit : hits.scoreDocs) {
      Document doc = storedFields.document(hit.doc);
   }
   ```

   Note that these StoredFields and TermVectors instances should only
   be consumed in the thread where
   they were acquired. For instance, it is illegal to share them across
   threads.

Uwe

On 25.09.2023 at 07:53, Michael Wechner wrote:

Hi Shubham

Great, thank you very much!

Michael

On 25.09.23 at 02:14, Shubham Chaudhary wrote:

Hi Michael,

You could replace this with
*indexReader.storedFields().document(scoreDoc.doc)*

Docs -
https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/index/StoredFields.html#document(int) 



- Shubham

On Mon, Sep 25, 2023 at 1:59 AM Michael Wechner 


wrote:


Hi

I recently noticed that

IndexReader.document(int)

is deprecated, whereas my code is currently

TopDocs topDocs = searcher.search(query, k);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  Document doc = indexReader.document(scoreDoc.doc);
}

How do I best replace document(int)?

Thanks

Michael




Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner

you mean once per search request?

I mean for example

GET https://localhost:8080/search?q=Lucene

and the following would be executed

IndexReader reader = DirectoryReader.open(...);
StoredFields storedFields = reader.storedFields();
IndexSearcher searcher = new IndexSearcher(reader)
TopDocs topDocs = searcher.search(query, k)
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  Document doc = storedFields.document(scoreDoc.doc);
}

Like this?

Thanks

Michael


On 25.09.23 at 10:28, Uwe Schindler wrote:
Background: For performance, it is advisable to get the storedFields() 
*once* to process all documents in the search result. The reason for
the change was that accessing stored fields would otherwise have
needed ThreadLocals to keep state.


Issue: https://github.com/apache/lucene/pull/11998

This was introduced in Lucene 9.5.

It is also listed in MIGRATE.txt:

   ### Removed deprecated IndexSearcher.doc, IndexReader.document,
   IndexReader.getTermVectors (GITHUB#11998)

   The deprecated Stored Fields and Term Vectors apis relied upon
   threadlocal storage and have been removed.

   Instead, call storedFields()/termVectors() to return an instance
   which can fetch data for multiple documents,
   and will be garbage-collected as usual.

   For example:
   ```java
   TopDocs hits = searcher.search(query, 10);
   StoredFields storedFields = reader.storedFields();
   for (ScoreDoc hit : hits.scoreDocs) {
      Document doc = storedFields.document(hit.doc);
   }
   ```

   Note that these StoredFields and TermVectors instances should only
   be consumed in the thread where
   they were acquired. For instance, it is illegal to share them across
   threads.

Uwe

On 25.09.2023 at 07:53, Michael Wechner wrote:

Hi Shubham

Great, thank you very much!

Michael

On 25.09.23 at 02:14, Shubham Chaudhary wrote:

Hi Michael,

You could replace this with
*indexReader.storedFields().document(scoreDoc.doc)*

Docs -
https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/index/StoredFields.html#document(int) 



- Shubham

On Mon, Sep 25, 2023 at 1:59 AM Michael Wechner 


wrote:


Hi

I recently noticed that

IndexReader.document(int)

is deprecated, whereas my code is currently

TopDocs topDocs = searcher.search(query, k);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  Document doc = indexReader.document(scoreDoc.doc);
}

How do I best replace document(int)?

Thanks

Michael




Re: How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner

Hi Shubham

Great, thank you very much!

Michael

On 25.09.23 at 02:14, Shubham Chaudhary wrote:

Hi Michael,

You could replace this with
*indexReader.storedFields().document(scoreDoc.doc)*

Docs -
https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/index/StoredFields.html#document(int)

- Shubham

On Mon, Sep 25, 2023 at 1:59 AM Michael Wechner 
wrote:


Hi

I recently noticed that

IndexReader.document(int)

is deprecated, whereas my code is currently

TopDocs topDocs = searcher.search(query, k);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  Document doc = indexReader.document(scoreDoc.doc);
}

How do I best replace document(int)?

Thanks

Michael




How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner

Hi

I recently noticed that

IndexReader.document(int)

is deprecated, whereas my code is currently

TopDocs topDocs = searcher.search(query, k);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document doc = indexReader.document(scoreDoc.doc);
}

How do I best replace document(int)?

Thanks

Michael




Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner

Hi Together

You might be interesed in this paper / article

https://arxiv.org/abs/2308.14963

Thanks

Michael




Re: Top docs depend on value of K nearest neighbour

2023-08-04 Thread Michael Wechner

Thank you very much for your feedback!

Does there also exist an "ef" value inside Lucene

"ef - *the size of the dynamic list for the nearest neighbors* (used 
during the search). Higher ef leads to more accurate but slower search. 
ef cannot be set lower than the number of queried nearest neighbors k.
The value of ef can be anything between k and the size of the dataset."


or is the "k" value *the* "ef" value?

IIUC the relevant "k" value (k1) for the HNSW algorithm is what is set at

Query query = new KnnFloatVectorQuery("vector-field-name", queryVector, k1);


and not the max top docs value (k2) at

TopDocs topDocs = searcher.search(query, k2);


right?

But it should be k2 <= k1, right? Or would you set k1 == k2?

Thanks

Michael


On 03.08.23 at 19:49, Michael Sokolov wrote:

well, it is "approximate" KNN and can get caught in local minima
(maxima?). Increasing K has, indirectly, the effect of expanding the
search space because the minimum score in the priority queue (score of
the Kth item) is used as a threshold for deciding when to terminate
the search
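One hedged way to exploit this, sketched against the APIs used in this thread (the oversampling factor of 4 is illustrative, not a recommendation): search with a larger k than you need, then keep only the top n.

```java
int n = 10;    // results actually wanted
int k = 4 * n; // oversampled candidate count; widens the approximate HNSW search
Query query = new KnnFloatVectorQuery("vector-field-name", queryVector, k);
TopDocs topDocs = searcher.search(query, n); // keep only the best n of the k candidates
```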

On Wed, Aug 2, 2023 at 5:19 PM Michael Wechner
  wrote:

Hi

I use Lucene 9.7.0 but experienced the same behaviour with Lucene 9.6.0
when doing vector search as follows:

I have indexed about 200 vectors (dimension 768)

I build the query as follows

   Query query = new KnnFloatVectorQuery("vector-field-name",
queryVector, k);

and do the search as follows:

TopDocs topDocs = searcher.search(query, k);

When I set k=27 then the top doc has a score of 0.7757

When I set the "k" value a little lower, e.g. k=24 then the top doc has
a score of 0.7319 and is not the same document as the one with the score
of 0.7757

Any idea what I might be doing wrong or what I misunderstand?

Why does the value of k have an effect on the returned top doc?

Thanks

Michael






Top docs depend on value of K nearest neighbour

2023-08-02 Thread Michael Wechner

Hi

I use Lucene 9.7.0 but experienced the same behaviour with Lucene 9.6.0 
when doing vector search as follows:


I have indexed about 200 vectors (dimension 768)

I build the query as follows

 Query query = new KnnFloatVectorQuery("vector-field-name", 
queryVector, k);


and do the search as follows:

TopDocs topDocs = searcher.search(query, k);

When I set k=27 then the top doc has a score of 0.7757

When I set the "k" value a little lower, e.g. k=24 then the top doc has 
a score of 0.7319 and is not the same document as the one with the score 
of 0.7757


Any idea what I might be doing wrong or what I misunderstand?

Why does the value of k have an effect on the returned top doc?

Thanks

Michael






Re: [ANNOUNCE] Apache Lucene 9.6.0 released

2023-05-11 Thread Michael Wechner

Thank you very much for the release! Works very fine so far :-)

All the best

Michael

On 10.05.23 at 09:49, Alan Woodward wrote:

The Lucene PMC is pleased to announce the release of Apache Lucene 9.6.0.

Apache Lucene is a high-performance, full-featured search engine library 
written entirely in Java. It is a technology suitable for nearly any 
application that requires structured search, full-text search, faceting, 
nearest-neighbor search across high-dimensionality vectors, spell correction or 
query suggestions.

This release contains numerous bug fixes, optimizations, and improvements, some 
of which are highlighted below. The release is available for immediate download 
at:

  

### Lucene 9.6.0 Release Highlights:

* Introduce a new KeywordField for simple and efficient  filtering, sorting and 
faceting.
* Add support for Java 20 foreign memory API. If exactly Java 19 or 20 is used, 
MMapDirectory will mmap Lucene indexes in chunks of 16 GiB (instead of 1 GiB) 
and indexes closed while queries are running can no longer crash the JVM.
* Improved performance for TermInSetQuery, PrefixQuery, WildcardQuery and 
TermRangeQuery
* Lower memory usage for BloomFilteringPostingsFormat
* Faster merges for HNSW indexes
* Improvements to concurrent indexing throughput under heavy load
* Correct equals implementation in SynonymQuery
* 'explain' is now implemented on TermAutomatonQuery

Please read CHANGES.txt for a full list of new features and changes:

  



Re: Vector Search on Lucene

2023-03-02 Thread Michael Wechner

Hi Marcos

The indexing looks something like this:

Document doc = new Document();
float[] vector = getEmbedding(text);
FieldType vectorFieldType = KnnVectorField.createFieldType(vector.length,
    VectorSimilarityFunction.COSINE);
KnnVectorField vectorField = new KnnVectorField("my_vector_field", vector,
    vectorFieldType);
doc.add(vectorField);
writer.addDocument(doc);


And the searching / retrieval looks something like this:

float[] queryVector = getEmbedding(question);
int k = 7; // INFO: The number of documents to find
Query query = new KnnVectorQuery("my_vector_field", queryVector, k);
IndexSearcher searcher = new IndexSearcher(indexReader);
TopDocs topDocs = searcher.search(query, k);

Also see

https://lucene.apache.org/core/9_5_0/demo/index.html#Embeddings
https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/knn/package-summary.html

HTH

Michael





On 02.03.23 at 10:25, marcos rebelo wrote:

Hi all,

I'm willing to use Vector Search with Lucene.

I have vectors created for queries and documents outside Lucene.
I would like to upload the document vectors to a Lucene index, Then use
Lucene to filter the documents (like classical search) and rank the
remaining products with the Vectors.

For performance reasons I would like some fast KNN for the rankers.

I looked on Google and I didn't find any document with some code samples.

2 questions:
  * Is this a correct design pattern?
  * Is there a good article explaining how to do this with Lucene?

Best Regards
Marcos Rebelo



Re: Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-11 Thread Michael Wechner

Thanks for your feedback!

To better understand your answer I would like to consider the following code / 
example:

My code currently kind of looks like

int k = 3;
String question = "How old is Michael?";
IndexSearcher searcher = new IndexSearcher(indexReader);
float[] queryVector = getEmbedding(question);
Query query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);
TopDocs topDocs = searcher.search(query, k);

The data (from topDocs) passed to the re-ranker looks something like

"question": "How old is Michael?",
"docs": [
    "Michael lives in Switzerland",
    "Michael was born 1969",
    "Michael has three children"
]

So the expected returned result set would be

["2", "1", "3"] or ["2", "3", "1"]

basically saying the answer "Michael was born 1969" is the best answer 
to the question "How old is Michael?".


So what would the code look like with something like a
VectorRerankField / FastVectorField?


Thanks

Michael




On 10.02.23 at 17:02, Robert Muir wrote:

I think it would be good to provide something like a VectorRerankField
(sorry for the bad name, maybe FastVectorField would be amusing too),
that just stores vectors as docvalues (no HNSW) and has a
newRescorer() method that implements
org.apache.lucene.search.Rescorer. Then its easy to do as that
document describes, pull top 500 hits with BM25 and rerank them with
your vectors, very fast, only 500 calculations required, no HNSW or
anything needed. Of course you could use a vector search instead of a
BM25 search as the initial search to pull the top 500 hits too.

So it could meet both use-cases and provide a really performant option
for users that want to integrate vector search.
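Until such a field exists, a hedged sketch of the same two-stage idea with current APIs: take the top-500 BM25 hits and re-score them exactly against the query vector read from the index (field names are illustrative; for brevity this assumes a single-segment index, otherwise the global doc IDs must be mapped per leaf):

```java
TopDocs first = searcher.search(bm25Query, 500);
LeafReader leaf = indexReader.leaves().get(0).reader();
FloatVectorValues vectors = leaf.getFloatVectorValues("vector-field-name");
for (ScoreDoc sd : first.scoreDocs) {
  // advance the vector iterator to this candidate and compute an exact score
  if (vectors.advance(sd.doc) == sd.doc) {
    sd.score = VectorSimilarityFunction.COSINE.compare(queryVector, vectors.vectorValue());
  }
}
// re-order the 500 candidates by the exact vector score
Arrays.sort(first.scoreDocs, (a, b) -> Float.compare(b.score, a.score));
```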

On Fri, Feb 10, 2023 at 10:21 AM Michael Wechner
  wrote:

Hi

I use the vector search of Lucene, whereas the embeddings I get from
SentenceBERT for example.

According to

https://www.sbert.net/examples/applications/retrieve_rerank/README.html

a re-ranking with a cross-encoder after the vector search (bi-encoding)
can improve the ranking.

Would it make sense to add this kind of functionality to Lucene or is
somebody already working on something similar?

Thanks

Michael




Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-10 Thread Michael Wechner

Hi

I use Lucene's vector search, where I get the embeddings from SentenceBERT, for example.


According to

https://www.sbert.net/examples/applications/retrieve_rerank/README.html

a re-ranking with a cross-encoder after the vector search (bi-encoding) 
can improve the ranking.


Would it make sense to add this kind of functionality to Lucene or is 
somebody already working on something similar?


Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-15 Thread Michael Wechner




On 15.01.23 at 16:36, Michael Sokolov wrote:

I would suggest building Lucene from source and adding your own
similarity function to VectorSimilarity. That is the proper extension
point for similarity functions. If you find there is some substantial
benefit, it wouldn't be a big lift to add something like that. However
I'm dubious about the likely benefit; just because scipy supports lots
of functions doesn't mean you will get substantially better results
with L3 metric vs L2 metric or so. I think you'd probably find this
community receptive to a metric that *doesn't lose* accuracy and
provides a more efficient computation -- maybe L1 would do that?


yes, I think the L1 (Manhattan) could be one of them :-)

Btw, Weaviate has a quite nice documentation re vector distances

https://weaviate.io/blog/2022/09/Distance-Metrics-in-Vector-Search.html

Yes, maybe it is easier to just contribute another metric as part of the
source than to make it configurable dynamically with a custom implementation.
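For experimenting before touching Lucene's sources, the L1 metric itself is only a few lines of plain Java. The distance-to-score conversion below mirrors the shape of Lucene's EUCLIDEAN scoring, which returns 1 / (1 + squareDistance); treat it as an illustrative analogue, not Lucene API:

```java
public class ManhattanSketch {

    // L1 (Manhattan) distance between two vectors.
    static float l1Distance(float[] a, float[] b) {
        float d = 0f;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    // Convert the distance into a similarity score in (0, 1], by analogy with
    // Lucene's EUCLIDEAN similarity (1 / (1 + squareDistance)): larger distance,
    // smaller score.
    static float l1Score(float[] a, float[] b) {
        return 1f / (1f + l1Distance(a, b));
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {2f, 4f, 6f};
        System.out.println(l1Distance(a, b)); // 6.0
        System.out.println(l1Score(a, b));    // ~0.143
    }
}
```

Note that, unlike L2, L1 needs no multiplication per dimension, which is part of why it can be a cheaper computation.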


Thanks

Michael




On Sat, Jan 14, 2023 at 6:04 PM Michael Wechner
 wrote:

Hi Adrien

Thanks for your feedback! However, I am not sure I fully understand what you mean.

At the moment I am using something like:

float[] vector = ...;
FieldType vectorFieldType = KnnVectorField.createFieldType(vector.length, VectorSimilarityFunction.COSINE);
KnnVectorField vectorField = new KnnVectorField("vector_field", vector, vectorFieldType);
doc.add(vectorField);

Could you give me some sample code showing what you mean by a "custom KNN
vectors format"?

Thanks

Michael

On 14.01.23 at 22:14, Adrien Grand wrote:

Hi Michael,

You could create a custom KNN vectors format that ignores the vector
similarity configured on the field and uses its own.

On Sat, Jan 14, 2023 at 21:33, Michael Wechner wrote:


Hi

IIUC Lucene currently supports

VectorSimilarityFunction.COSINE
VectorSimilarityFunction.DOT_PRODUCT
VectorSimilarityFunction.EUCLIDEAN

whereas some embedding models have been trained with other metrics.
Also see

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

How can I best implement another metric?

Thanks

Michael





-
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-14 Thread Michael Wechner

Hi Adrien

Thanks for your feedback! However, I am not sure I fully understand what you mean.


At the moment I am using something like:

float[] vector = ...;
FieldType vectorFieldType = KnnVectorField.createFieldType(vector.length, VectorSimilarityFunction.COSINE);
KnnVectorField vectorField = new KnnVectorField("vector_field", vector, vectorFieldType);
doc.add(vectorField);

Could you give me some sample code showing what you mean by a "custom KNN
vectors format"?


Thanks

Michael

On 14.01.23 at 22:14, Adrien Grand wrote:

Hi Michael,

You could create a custom KNN vectors format that ignores the vector
similarity configured on the field and uses its own.

On Sat, Jan 14, 2023 at 21:33, Michael Wechner wrote:


Hi

IIUC Lucene currently supports

VectorSimilarityFunction.COSINE
VectorSimilarityFunction.DOT_PRODUCT
VectorSimilarityFunction.EUCLIDEAN

whereas some embedding models have been trained with other metrics.
Also see

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

How can I best implement another metric?

Thanks

Michael





-
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org




Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-14 Thread Michael Wechner

Hi

IIUC Lucene currently supports

VectorSimilarityFunction.COSINE
VectorSimilarityFunction.DOT_PRODUCT
VectorSimilarityFunction.EUCLIDEAN

whereas some embedding models have been trained with other metrics.
Also see 
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html


How can I best implement another metric?

Thanks

Michael





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Question for SynonymQuery

2023-01-02 Thread Michael Wechner

independent of the synonym implementation you might want to consider vector/similarity
search. For example, if the query is "internet device", then the cosine similarities
between the multi-term phrases "internet device", "wifi router" and "wifi device"
using the model "all-mpnet-base-v2" are

{"cosineSimilarity":1,"cosineDistance":0,"sentenceOne":"internet device","sentenceTwo":"internet device"}

{"cosineSimilarity":0.47380197,"cosineDistance":0.526198,"sentenceOne":"internet device","sentenceTwo":"wifi router"}

{"cosineSimilarity":0.74852204,"cosineDistance":0.25147796,"sentenceOne":"internet device","sentenceTwo":"wifi device"}

As you can see, "wifi device" is closer to "internet device" than "wifi router" is,
using the model "all-mpnet-base-v2". If you consider "wifi device" a false positive,
then this is not helpful of course, but it might be useful otherwise, considering
the original question of this thread.

HTH

Michael
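For reference, the cosine similarity reported above is a straightforward computation over the embedding vectors. A sketch in plain Java (the 3-dimensional vectors are invented for illustration; real all-mpnet-base-v2 sentence embeddings have 768 dimensions):

```java
public class CosineSketch {

    // Cosine similarity between two embedding vectors;
    // cosine distance is then 1 - similarity.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical toy embeddings, just to show the computation.
        double[] internetDevice = {0.8, 0.5, 0.1};
        double[] wifiDevice     = {0.7, 0.6, 0.2};
        System.out.println(cosineSimilarity(internetDevice, internetDevice)); // ~1.0
        System.out.println(cosineSimilarity(internetDevice, wifiDevice));     // ~0.98
    }
}
```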




On 02.01.23 at 17:54, Mikhail Khludnev wrote:

Hello Trevor.
Can you help me better understand this approach? If we have a text "wifi
router" and inject "internet device" at indexing time, terms reside at the
same positions. How to avoid false positive match for query "wifi device"?

On Mon, Jan 2, 2023 at 4:16 PM Trevor Nicholls
wrote:


Hi Anh

The two links Michael shared relate to questions I asked when I was trying
to get synonym matching with our application.

I really do have multi-term synonym matching working at this point;
there's always scope for improvement of course, but with the hints supplied
in those threads I was able to index our documents and search them using a
variety of synonymous terms, both single words and phrases.

Our application does not use either BooleanQuery or SynonymQuery; I have
just used the standard QueryParser. Instead the synonym processing occurs
in the indexing phase, which is not only simpler (one search pattern, one
query), but also I think you would also find it gives you superior
performance (because the synonym processing occurs once at indexing time
and not at all during searching - and I'm sure you'll be doing far more
searching than indexing).

cheers
T


-Original Message-
From: Michael Wechner
Sent: Thursday, 29 December 2022 08:56
To:java-user@lucene.apache.org
Subject: Re: Question for SynonymQuery

Hi Anh

The following Stackoverflow link might help


https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene

The following thread seems to confirm, that escaping the space with a
backslash does not help

https://lists.apache.org/list?java-user@lucene.apache.org:2022-3

HTH

Michael


On 27.12.22 at 20:22, Anh Dũng Bùi wrote:

Hi Lucene users,

I recently came across SynonymQuery and found out that it only
supports single-term synonyms (since it accepts a list of Term which
will be considered as synonyms). We have some multi-term synonyms like
"internet device" <-> "wifi router" or "dns" <-> "domain name
service". Am I right that I need to use something like a BooleanQuery

for these cases?

I have 2 other follow-up questions:
- Does SynonymQuery have any advantage over BooleanQuery? Or is it
only different in how scores are computed? As I understand
SynonymWeight will consider all terms as exactly the same while
BooleanQuery will favor the documents with more matched terms.
- Is it worth it to support multi-term synonyms in SynonymQuery? My
feeling is that it's better to just use BooleanQuery in those cases,
since to support multi-term synonyms it needs to accept a list of
Query, which would make it behave like a BooleanQuery. Also how
scoring works with multi-term is another problem.

Thanks & Regards!



-
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org




Re: Question for SynonymQuery

2022-12-28 Thread Michael Wechner

Hi Anh

The following Stackoverflow link might help

https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene

The following thread seems to confirm, that escaping the space with a 
backslash does not help


https://lists.apache.org/list?java-user@lucene.apache.org:2022-3

HTH

Michael


On 27.12.22 at 20:22, Anh Dũng Bùi wrote:

Hi Lucene users,

I recently came across SynonymQuery and found out that it only supports
single-term synonyms (since it accepts a list of Term which will be
considered as synonyms). We have some multi-term synonyms like "internet
device" <-> "wifi router" or "dns" <-> "domain name service". Am I right
that I need to use something like a BooleanQuery for these cases?

I have 2 other follow-up questions:
- Does SynonymQuery have any advantage over BooleanQuery? Or is it only
different in how scores are computed? As I understand SynonymWeight will
consider all terms as exactly the same while BooleanQuery will favor the
documents with more matched terms.
- Is it worth it to support multi-term synonyms in SynonymQuery? My feeling
is that it's better to just use BooleanQuery in those cases, since to
support multi-term synonyms it needs to accept a list of Query, which would
make it behave like a BooleanQuery. Also how scoring works with multi-term
is another problem.

Thanks & Regards!




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Michael Wechner
So IIUC the information re number of deleted documents is only visible 
temporarily and only when there are many documents, right?


Thanks

Michael

On 08.12.22 at 14:21, Uwe Schindler wrote:
If this is a reader with only a few documents, the likelihood of all
deletes being applied while closing is high.


Uwe

On 08.12.2022 at 11:44, Michael Wechner wrote:

My code at the moment is as follows:

Directory dir = FSDirectory.open(Paths.get(vectorIndexPath));

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(vectorIndexPath)));

int numberOfDocsBeforeDeleting = reader.numDocs();
log.info("Number of documents: " + numberOfDocsBeforeDeleting);
log.info("Number of deleted documents: " + reader.numDeletedDocs());
reader.close();

log.info("Delete document with path '" + uuid + "' from index '" + vectorIndexPath + "' ...");
IndexWriterConfig iwc = new IndexWriterConfig();
IndexWriter writer = new IndexWriter(dir, iwc);

Term term = new Term(PATH_FIELD, uuid);
writer.deleteDocuments(term);
writer.close();

reader = DirectoryReader.open(FSDirectory.open(Paths.get(vectorIndexPath)));

int numberOfDocsAfterDeleting = reader.numDocs();
log.info("Number of documents: " + numberOfDocsAfterDeleting);
log.info("Number of deleted documents: " + (numberOfDocsBeforeDeleting - numberOfDocsAfterDeleting));
// TODO: Not sure whether the method numDeletedDocs() makes sense here
log.info("Number of deleted documents: " + reader.numDeletedDocs());
reader.close();



This code always returns 0 for reader.numDeletedDocs(), whereas

numberOfDocsBeforeDeleting - numberOfDocsAfterDeleting

produces the correct result.

Should I open the reader before closing the writer?

Thanks

Michael



On 08.12.22 at 11:36, Uwe Schindler wrote:
You have to reopen the index reader to see deletes from the 
indexwriter.


On 08.12.2022 at 10:32, Hrvoje Lončar wrote:

Did you call this method before or after the commit method?
My wild guess would be that you can count deleted documents inside the
transaction only.

On Thu, Dec 8, 2022 at 12:10 AM Michael Wechner 


wrote:


Hi

I am using Lucene 9.4.2 vector search and everything seems to work
fine,
except that when I delete some documents from the index, then the 
method



https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs() 



always returns 0, whereas I would have expected that it would 
return the

number of documents which I deleted from the index.

IndexReader.numDocs() returns the correct number though.

I guess I misunderstand the javadoc and in particular the note 
"*NOTE*:

This operation may run in O(maxDoc)."

Could somebody explain in more detail what this method is doing?

Thanks

Michael








-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Michael Wechner

My code at the moment is as follows:

Directory dir = FSDirectory.open(Paths.get(vectorIndexPath));

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(vectorIndexPath)));

int numberOfDocsBeforeDeleting = reader.numDocs();
log.info("Number of documents: " + numberOfDocsBeforeDeleting);
log.info("Number of deleted documents: " + reader.numDeletedDocs());
reader.close();

log.info("Delete document with path '" + uuid + "' from index '" + vectorIndexPath + "' ...");
IndexWriterConfig iwc = new IndexWriterConfig();
IndexWriter writer = new IndexWriter(dir, iwc);

Term term = new Term(PATH_FIELD, uuid);
writer.deleteDocuments(term);
writer.close();

reader = DirectoryReader.open(FSDirectory.open(Paths.get(vectorIndexPath)));

int numberOfDocsAfterDeleting = reader.numDocs();
log.info("Number of documents: " + numberOfDocsAfterDeleting);
log.info("Number of deleted documents: " + (numberOfDocsBeforeDeleting - numberOfDocsAfterDeleting));
// TODO: Not sure whether the method numDeletedDocs() makes sense here
log.info("Number of deleted documents: " + reader.numDeletedDocs());
reader.close();



This code always returns 0 for reader.numDeletedDocs(), whereas

numberOfDocsBeforeDeleting - numberOfDocsAfterDeleting

produces the correct result.

Should I open the reader before closing the writer?

Thanks

Michael



On 08.12.22 at 11:36, Uwe Schindler wrote:

You have to reopen the index reader to see deletes from the indexwriter.

On 08.12.2022 at 10:32, Hrvoje Lončar wrote:

Did you call this method before or after the commit method?
My wild guess would be that you can count deleted documents inside the
transaction only.

On Thu, Dec 8, 2022 at 12:10 AM Michael Wechner 


wrote:


Hi

I am using Lucene 9.4.2 vector search and everything seems to work fine,
except that when I delete some documents from the index, then the 
method



https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs() 



always returns 0, whereas I would have expected that it would return 
the

number of documents which I deleted from the index.

IndexReader.numDocs() returns the correct number though.

I guess I misunderstand the javadoc and in particular the note "*NOTE*:
This operation may run in O(maxDoc)."

Could somebody explain in more detail what this method is doing?

Thanks

Michael





What exactly returns IndexReader.numDeletedDocs()

2022-12-07 Thread Michael Wechner

Hi

I am using Lucene 9.4.2 vector search and everything seems to work fine,
except that when I delete some documents from the index, then the method


https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs()

always returns 0, whereas I would have expected that it would return the 
number of documents which I deleted from the index.


IndexReader.numDocs() returns the correct number though.

I guess I misunderstand the javadoc and in particular the note "*NOTE*: 
This operation may run in O(maxDoc)."


Could somebody explain in more detail what this method is doing?

Thanks

Michael
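Conceptually, numDeletedDocs() is defined as maxDoc() - numDocs() and is derived from per-segment live-doc bitsets, which is why a freshly opened reader whose deletes were already merged away during close reports 0. A minimal non-Lucene model of that bookkeeping:

```java
public class LiveDocsModel {

    final boolean[] liveDocs; // one flag per doc id, like a segment's live-docs bitset

    LiveDocsModel(int maxDoc) {
        liveDocs = new boolean[maxDoc];
        java.util.Arrays.fill(liveDocs, true);
    }

    void delete(int docId) { liveDocs[docId] = false; }

    int maxDoc() { return liveDocs.length; }

    // Counting live docs is what makes the real numDocs() (and thus
    // numDeletedDocs()) potentially O(maxDoc), as the javadoc NOTE warns.
    int numDocs() {
        int n = 0;
        for (boolean live : liveDocs) if (live) n++;
        return n;
    }

    int numDeletedDocs() { return maxDoc() - numDocs(); }

    public static void main(String[] args) {
        LiveDocsModel reader = new LiveDocsModel(3);
        reader.delete(1);
        System.out.println(reader.numDocs());        // 2
        System.out.println(reader.numDeletedDocs()); // 1
    }
}
```

Once a merge rewrites the segment without the deleted documents, maxDoc shrinks to numDocs and the count of deleted documents drops back to 0.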

Re: The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner

I have enhanced the FAQ

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-Whatisthedefaultrelevance/similarityimplementationofLucene?

Hope it is ok like this :-)

Thanks

Michael

On 23.11.22 at 10:58, Michael Wechner wrote:

cool, thanks!

On 23.11.22 at 10:55, Adrien Grand wrote:

This is correct. See IndexSearcher#getDefaultSimilarity().

On Wed, Nov 23, 2022 at 10:53 AM Michael Wechner
 wrote:

Hi

On the Lucene FAQ there is no mention of tf-idf or BM25, and I would
like to add some notes, but to be sure I don't write anything wrong I
would like to ask

whether the current default similarity implementation of Lucene is
really BM25, right?

as described at

https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/ 



Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org






-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner

cool, thanks!

On 23.11.22 at 10:55, Adrien Grand wrote:

This is correct. See IndexSearcher#getDefaultSimilarity().

On Wed, Nov 23, 2022 at 10:53 AM Michael Wechner
 wrote:

Hi

On the Lucene FAQ there is no mention of tf-idf or BM25, and I would
like to add some notes, but to be sure I don't write anything wrong I
would like to ask

whether the current default similarity implementation of Lucene is
really BM25, right?

as described at

https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org






-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner

Hi

On the Lucene FAQ there is no mention of tf-idf or BM25, and I would
like to add some notes, but to be sure I don't write anything wrong I 
would like to ask


whether the current default similarity implementation of Lucene is 
really BM25, right?


as described at

https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Will ApacheCon North America 2022 sessions also be published on YouTube?

2022-10-16 Thread Michael Wechner

Hi

I just noticed that the ApacheCon Asia 2022 have been published on YouTube

https://apachecon.com/
https://www.youtube.com/c/TheApacheFoundation/playlists

Will this also happen for ApacheCon North America 2022?

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner

Thanks for your feedback!

I really do not know enough, so it is good to hear different opinions 
and I will try to understand better.


But anyway, I see it positive in the sense, that one way or the other, 
one can learn from each other hopefully :-)


On 01.10.22 at 14:41, Michael Sokolov wrote:

I'd agree with the main point re: the need to combine vector-based
matching with term-based matching.

As for the comparison with Lucene, I'd say it's a shallow and biased
take. The main argument is that Vespa's mutable in-memory(?) data
structures are superior to Lucene's immutable on-disk segments. While
it is true that Lucene's approach leads to slower searches when there
are more segments, especially for vector searches, the immutability
property provides other well-understood benefits. TBH I don't know
enough about Vespa to make any meaningful comparison, but every choice
is a compromise. We've known for centuries that "Odyous of olde been
comparisonis, And of comparisonis engendyrd is haterede."

On Sat, Oct 1, 2022 at 7:18 AM Michael Wechner
 wrote:

Hi Together

I just read the following article, where the author compares Lucene and
Vespa re HNSW

https://bergum.medium.com/will-new-vector-databases-dislodge-traditional-search-engines-b4fdb398fb43

What is your take on "comparing Lucene and Vespa re HNSW latency and
recall"?

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner

Hi Together

I just read the following article, where the author compares Lucene and 
Vespa re HNSW


https://bergum.medium.com/will-new-vector-databases-dislodge-traditional-search-engines-b4fdb398fb43

What is your take on "comparing Lucene and Vespa re HNSW latency and
recall"?


Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Upgrading from 9.1.0 to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner

ah ok :-)

I think I just set it, because I saw this in one of the code examples 
and assumed that it might be necessary.


Yes, I am using HNSW, but I don't think I set any particular HNSW 
parameters.


I will double-check and will try without setting the codec.

Thanks

Michael

On 01.10.22 at 12:56, Adrien Grand wrote:

The best practice is to not set the codec explicitly, and Lucene will make
sure to always use the right one.

Setting the codec explicitly is considered expert usage. I guess you are doing
this because you want to configure things like stored fields compression or
HNSW parameters? If so, there is no better way than what you are doing.


On Sat, Oct 1, 2022 at 12:31, Michael Wechner wrote:


Hi Adrien

Thank you very much for your help!

That was it :-) I completely forgot that I set this somewhere hidden
inside my code.
I made a note in the pom file, such that I should not forget again
during the next upgrade :-)

Or what is the best practice re setting / handling the codec?

Thanks

Michael

On 01.10.22 at 08:06, Adrien Grand wrote:

I would guess that you are configuring your IndexWriterConfig with a
"Lucene91Codec" instance. You need to replace it with a "Lucene94Codec"
instance.

On Sat, Oct 1, 2022 at 06:12, Michael Wechner wrote:


Hi

I have just upgraded from 9.1.0 to 9.4.0 and compiling works fine, but
when I run and re-index my data using KnnVectorField, then I receive the
following exception:

java.lang.UnsupportedOperationException: Old codecs may only be used for
reading
   at


org.apache.lucene.backward_codecs.lucene91.Lucene91HnswVectorsFormat.fieldsWriter(Lucene91HnswVectorsFormat.java:131)

~[lucene-backward-codecs-9.4.0.jar:9.4.0
d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - sokolovm - 2022-09-30

14:55:13]

   at


org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.getInstance(PerFieldKnnVectorsFormat.java:161)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.addField(PerFieldKnnVectorsFormat.java:105)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.VectorValuesConsumer.addField(VectorValuesConsumer.java:70)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.IndexingChain.initializeFieldInfo(IndexingChain.java:665)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:556)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:241)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1533)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at


org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1818)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
   at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1471)
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]

Any idea what I might be doing wrong?

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Upgrading from 9.1.0 to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner

Hi Adrien

Thank you very much for your help!

That was it :-) I completely forgot that I set this somewhere hidden 
inside my code.
I made a note in the pom file, such that I should not forget again 
during the next upgrade :-)


Or what is the best practice re setting / handling the codec?

Thanks

Michael

On 01.10.22 at 08:06, Adrien Grand wrote:

I would guess that you are configuring your IndexWriterConfig with a
"Lucene91Codec" instance. You need to replace it with a "Lucene94Codec"
instance.

On Sat, Oct 1, 2022 at 06:12, Michael Wechner wrote:


Hi

I have just upgraded from 9.1.0 to 9.4.0 and compiling works fine, but
when I run and re-index my data using KnnVectorField, then I receive the
following exception:

java.lang.UnsupportedOperationException: Old codecs may only be used for
reading
  at
org.apache.lucene.backward_codecs.lucene91.Lucene91HnswVectorsFormat.fieldsWriter(Lucene91HnswVectorsFormat.java:131)

~[lucene-backward-codecs-9.4.0.jar:9.4.0
d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.getInstance(PerFieldKnnVectorsFormat.java:161)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.addField(PerFieldKnnVectorsFormat.java:105)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.VectorValuesConsumer.addField(VectorValuesConsumer.java:70)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.IndexingChain.initializeFieldInfo(IndexingChain.java:665)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:556)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:241)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432)

~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1533)
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1818)
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]
  at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1471)
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 -
sokolovm - 2022-09-30 14:55:13]

Any idea what I might be doing wrong?

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Upgrading from 9.1.0 to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-09-30 Thread Michael Wechner

Hi

I have just upgraded from 9.1.0 to 9.4.0 and compiling works fine, but 
when I run and re-index my data using KnnVectorField, then I receive the 
following exception:


java.lang.UnsupportedOperationException: Old codecs may only be used for 
reading
    at 
org.apache.lucene.backward_codecs.lucene91.Lucene91HnswVectorsFormat.fieldsWriter(Lucene91HnswVectorsFormat.java:131) 
~[lucene-backward-codecs-9.4.0.jar:9.4.0 
d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.getInstance(PerFieldKnnVectorsFormat.java:161) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.addField(PerFieldKnnVectorsFormat.java:105) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.VectorValuesConsumer.addField(VectorValuesConsumer.java:70) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.IndexingChain.initializeFieldInfo(IndexingChain.java:665) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:556) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:241) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1533) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1818) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]
    at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1471) 
~[lucene-core-9.4.0.jar:9.4.0 d2e22e18c6c92b6a6ba0bbc26d78b5e82832f956 - 
sokolovm - 2022-09-30 14:55:13]


Any idea what I might be doing wrong?

Thanks

Michael




Re: [ANNOUNCE] Apache Lucene 9.4.0 released

2022-09-30 Thread Michael Wechner

great, thank you very much!

Just in time for ApacheCon :-)

Am 01.10.22 um 00:09 schrieb Michael Sokolov:

The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.0.

Apache Lucene is a high-performance, full-featured search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires structured search, full-text
search, faceting, nearest-neighbor search across high-dimensionality
vectors, spell correction or query suggestions.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release is
available for immediate download at:

https://lucene.apache.org/core/downloads.html

Lucene 9.4.0 Release Highlights:

New features

Added ShapeDocValues/Field, a unified abstraction to represent
existing types: XY and lat/long.
FacetSets can now be filtered using a Query via MatchingFacetSetCounts.
SortField now allows control over whether to apply index-sort optimizations.
Support for Java 19 foreign memory access ("project Panama") was
added. Applications started with command line parameter "java
--enable-preview" will automatically use the new foreign memory API of
Java 19 to access indexes on disk with MMapDirectory. This is an
opt-in feature and requires explicit Java command line flag passed to
your application's Java process (e.g., modify startup parameters of
Solr or Elasticsearch/Opensearch)! When enabled, Lucene logs a notice
using java.util.logging. Please test thoroughly and report
bugs/slowness to Lucene's mailing list. When the new API is used,
MMapDirectory will mmap Lucene indexes in chunks of 16 GiB (instead of
1 GiB) and indexes closed while queries are running can no longer
crash the JVM.

Optimizations

Added support for dynamic pruning to queries sorted by a string field
that is indexed with both terms and SORTED or SORTED_SET doc values.
This can lead to dramatic speedups when applicable.
TermInSetQuery is optimized for the case when one of its terms matches
all docs in a segment, and it now provides cost estimation, making it
usable with IndexOrDocValuesQuery for better query planning.
KnnVector fields can now be stored with reduced (8-bit) precision,
saving storage and yielding a small query latency improvement.
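The 8-bit precision highlight above can be illustrated with a small, self-contained sketch. Note this is only a conceptual illustration of linear scalar quantization, not Lucene's actual implementation; the fixed [-1, 1] range and the rounding scheme are assumptions for the example.

```java
// Conceptual sketch of 8-bit scalar quantization: map each float
// component in [min, max] linearly onto the signed byte range.
public class ScalarQuantizationSketch {
    static byte[] quantize(float[] v, float min, float max) {
        byte[] out = new byte[v.length];
        float scale = 255f / (max - min);
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.min(max, Math.max(min, v[i]));
            out[i] = (byte) (Math.round((clamped - min) * scale) - 128);
        }
        return out;
    }

    // Inverse mapping, used here to estimate the quantization error.
    static float dequantize(byte q, float min, float max) {
        return (q + 128) * (max - min) / 255f + min;
    }

    public static void main(String[] args) {
        float[] v = {-0.75f, 0.0f, 0.42f, 1.0f};
        byte[] q = quantize(v, -1f, 1f);
        for (int i = 0; i < v.length; i++) {
            float back = dequantize(q[i], -1f, 1f);
            // Error is bounded by one quantization step, (max - min) / 255.
            assert Math.abs(back - v[i]) <= 2f / 255f + 1e-6f;
        }
        System.out.println("max quantization error <= " + 2f / 255f);
    }
}
```

The storage saving is the 4x reduction from 32-bit floats to 8-bit bytes per component, at the cost of this bounded rounding error.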

Other

KnnVector fields' HNSW graphs are now created incrementally when new
documents are added, rather than all-at-once when flushing. This
yields more consistent predictable behavior at the cost of an overall
increase in indexing time.
randomizedtesting dependency upgraded to 2.8.1
addIndexes(CodecReader) now respects MergePolicy and MergeScheduler,
enabling it to do its work concurrently.

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/core/9_4_0/changes/Changes.html








Re: How to filter KnnVectorQuery with multiple terms?

2022-09-01 Thread Michael Wechner

great, thank you very much for clarifying!

Michael

Am 01.09.22 um 08:43 schrieb Uwe Schindler:

Simply said,

the last parameter of KnnVectorQuery is a Lucene query, so you can 
pass any query type there. TermInSetQuery is a good idea for doing a 
"IN multiple terms" query. But you can also pass a BooleanQuery with 
multiple terms or a combination of other queries, a numeric range,... 
or a fulltext query out of Lucene's query parsers.


Uwe
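As a hedged sketch of Uwe's point (field names and values here are hypothetical, and Lucene 9.x is assumed on the classpath), a composite filter passed to KnnVectorQuery could look like this:

```java
// Sketch: the last parameter of KnnVectorQuery accepts any Lucene Query,
// so several conditions can be combined into one BooleanQuery filter.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.KnnVectorQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class KnnFilterSketch {
    static final int K = 10;

    static Query filteredKnn(float[] queryVector) {
        // Documents must match both the classification AND the language term.
        Query filter = new BooleanQuery.Builder()
            .add(new TermQuery(new Term("classification", "news")), BooleanClause.Occur.MUST)
            .add(new TermQuery(new Term("lang", "en")), BooleanClause.Occur.MUST)
            .build();
        // Pre-filtered kNN: the K nearest vectors among matching docs only.
        return new KnnVectorQuery("vector", queryVector, K, filter);
    }

    public static void main(String[] args) {
        System.out.println(filteredKnn(new float[] {0.1f, 0.2f, 0.3f}));
    }
}
```

Any other Query (a numeric range, a parsed full-text query, a TermInSetQuery) can be dropped in for either clause.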

Am 31.08.2022 um 22:19 schrieb Michael Wechner:

Hi Matt

Thanks very much for your feedback!

According to your links I will try

Collection<BytesRef> terms = new ArrayList<>();
terms.add(new BytesRef(classification1));
terms.add(new BytesRef(classification2));
Query filter = new TermInSetQuery(CLASSIFICATION_FIELD, terms);

query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);

All the best

Michael



Am 31.08.22 um 20:24 schrieb Matt Davis:
If I understand correctly, I believe you would want to use a TermInSetQuery
query.  An example usage can be found here:
https://github.com/zuliaio/zuliasearch/blob/main/zulia-server/src/main/java/io/zulia/server/index/ZuliaIndex.java#L398

You can also check out the usage of KnnVectorQuery here:
https://github.com/zuliaio/zuliasearch/blob/main/zulia-server/src/main/java/io/zulia/server/index/ZuliaIndex.java#L419
noting that in this case the getPreFilter method a few lines below uses a
BooleanQuery.Builder.

As noted in TermInSetQuery (
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java#L62)
multiple terms could be represented as a boolean query with Occur.SHOULD.


~Matt

On Wed, Aug 31, 2022 at 11:15 AM Michael Wechner
wrote:


Hi

I am currently filtering a KnnVectorQuery as follows

Query filter = new TermQuery(new Term(CLASSIFICATION_FIELD, classification));
query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);

but it is not clear to me how I can filter for multiple terms.

Should I subclass MultiTermQuery and use it as the filter, just as I use
TermQuery as the filter above?

Thanks

Michael









Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner

Hi Matt

Thanks very much for your feedback!

According to your links I will try

Collection<BytesRef> terms = new ArrayList<>();
terms.add(new BytesRef(classification1));
terms.add(new BytesRef(classification2));
Query filter = new TermInSetQuery(CLASSIFICATION_FIELD, terms);

query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);

All the best

Michael



Am 31.08.22 um 20:24 schrieb Matt Davis:

If I understand correctly, I believe you would want to use a TermInSetQuery
query.  An example usage can be found here
https://github.com/zuliaio/zuliasearch/blob/main/zulia-server/src/main/java/io/zulia/server/index/ZuliaIndex.java#L398.


You can also check out the usage of KnnVectorQuery here:
https://github.com/zuliaio/zuliasearch/blob/main/zulia-server/src/main/java/io/zulia/server/index/ZuliaIndex.java#L419
noting that in this case the getPreFilter method a few lines below uses a
BooleanQuery.Builder.

As noted in TermInSetQuery (
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java#L62)
multiple terms could be represented as a boolean query with Occur.SHOULD.

~Matt

On Wed, Aug 31, 2022 at 11:15 AM Michael Wechner
wrote:


Hi

I am currently filtering a KnnVectorQuery as follows

Query filter = new TermQuery(new Term(CLASSIFICATION_FIELD, classification));
query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);

but it is not clear to me how I can filter for multiple terms.

Should I subclass MultiTermQuery and use it as the filter, just as I use
TermQuery as the filter above?

Thanks

Michael



How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner

Hi

I am currently filtering a KnnVectorQuery as follows

Query filter = new TermQuery(new Term(CLASSIFICATION_FIELD, classification));
query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter);

but it is not clear to me how I can filter for multiple terms.

Should I subclass MultiTermQuery and use it as the filter, just as I use
TermQuery as the filter above?

Thanks

Michael


Re: Multi-Value query test

2022-06-23 Thread Michael Wechner
Maybe I misunderstand the problem, but why don't you decouple showing 
the results from the results of the query?


Am 23.06.22 um 14:03 schrieb Patrick Bernardina:

How to test if a value in a multi-value field matches a specific query?

Example of the problem:

I've created a query to return all documents of some specific authors. The
authors field contains multi-value sorted set.
When showing the result, I want to show only the name of the authors
specified on the query, even if the document has more authors.







Re: Auto-complete in Lucene

2022-05-25 Thread Michael Wechner
we are using AnalyzingInfixSuggester but I would also be curious to
know whether this is the best way :-)


Thanks

Michael

Am 25.05.22 um 14:39 schrieb Anastasiya Tarasenko:

Hi All,

I have a question regarding auto-complete functionality in Lucene.
On the StackOverflow the suggestion regarding implementation is
1) Use AnalyzingSuggester
2) Use PrefixQuery and SpanFirstQuery with the IndexSearcher.search()

What are the differences between them? Do they differ much in
performance?

Currently we are using the SuggestIndexSearcher.suggest() method,
but there is no way to add scoring by some field (which is exactly what we
need), and I did not find any recommendation on StackOverflow for this
method.

So the question is which is better to use for auto-complete, #1 or #2.
Could you advise?

Thank you
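For reference, a minimal sketch of the AnalyzingInfixSuggester approach mentioned in this thread. The per-entry weight argument is one way to approximate "scoring by some field": derive the weight from that field when building the suggester. The index path and the entries below are invented for the example; treat the whole block as a sketch against Lucene 9.x (lucene-suggest module on the classpath).

```java
// Sketch: AnalyzingInfixSuggester with per-entry weights that rank
// the suggestions, e.g. by a popularity field.
import java.nio.file.Paths;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class SuggestSketch {
    public static void main(String[] args) throws Exception {
        try (AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                FSDirectory.open(Paths.get("suggest-index")), new StandardAnalyzer())) {
            // Third argument is the weight; higher weights rank first.
            suggester.add(new BytesRef("apache lucene"), null, 100, null);
            suggester.add(new BytesRef("apache solr"), null, 50, null);
            suggester.refresh(); // make the added entries visible to lookup

            // allTermsRequired=true, doHighlight=false
            List<Lookup.LookupResult> hits = suggester.lookup("apa", 5, true, false);
            hits.forEach(h -> System.out.println(h.key + " (" + h.value + ")"));
        }
    }
}
```

Because it is infix-based, "luc" would also match "apache lucene", which plain PrefixQuery-based auto-complete would not.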



Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-24 Thread Michael Wechner

yes, sure, will keep you posted :-)

Thanks

Michael

Am 24.05.22 um 02:10 schrieb Julie Tibshirani:

Michael -- I am not totally clear on what you have in mind. Would you be up
for opening a pull request in GitHub (or JIRA issue) to show the idea, and
we can discuss the details there?

Matt -- thanks for sharing!

Julie

On Sun, May 22, 2022 at 3:29 PM Matt Davis  wrote:


Thanks Julie.  I was able to implement vector search in Zulia with your
pointers.  The pull request might be helpful to others:
https://github.com/zuliaio/zuliasearch/pull/70

Thanks,
Matt

On Fri, May 20, 2022 at 9:23 AM Michael Wechner wrote:
Hi Julie

I got it running and it seems to work fine so far :-)

Re an example for the demo package, I guess this would go here




https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html

and I thought of something like "KnnVectorPreFilterQuery.java"

Query preFilterQuery = new TermQuery(new Term(TOPIC_FIELD, "general"));
if (preFilterQuery != null) {
  log.info("Filter applied before the vector search: " + preFilterQuery);
}
Query query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, preFilterQuery);

TopDocs topDocs = searcher.search(query, k);

Does that make sense to you?

Thanks

Michael

Am 11.05.22 um 07:59 schrieb Michael Wechner:

Hi Julie

Cool, thanks!

I'll try to apply it and, if it works, could add an example to the demo
package.

Will keep you posted :-)

Thanks

Michael

Am 11.05.22 um 02:13 schrieb Julie Tibshirani:

Hello Michael, we don't yet have an example of kNN with filtering in

the

demo package. This would be nice to add!

For now maybe looking at the unit tests could give a sense of how to

use

it. Here's an example:


https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L115-L127
.

The idea is that KnnVectorQuery optionally accepts a Query as a
filter, and
returns the k nearest vectors that also match the filter. Many people
refer
to this as "kNN with prefiltering" (as opposed to "postfiltering",

where

the filter is applied *after* the kNN search, so in the end you may
receive
fewer than k matches).

Let us know if you run into any questions/ issues while trying it out!

Julie

On Mon, May 9, 2022 at 8:08 AM Michael Wechner
wrote:


sorry for the URLs below.

I have tested Twilio SendGrid as outgoing server and it just rewrote
the
URLs


https://issues.apache.org/jira/browse/SOLR-15947
https://issues.apache.org/jira/browse/LUCENE-10382

and




https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html


which I was not aware of, but disabled the tracking now and hope it
will
be ok now.

Thanks

Michael

Am 09.05.22 um 15:12 schrieb Michael Wechner:

Hi

I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor
vector search, which is great :-)

I have found



http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHnF6KpgvICPKY8k5m30V-2FWrDvoSQLIaimtz2YHrzSMNV98es7-2BeXQS174-2B0EHPnQVtOYqUPojoZgGkqmovRXrz1dJlfs9dtFDGqSjpYaFiMaVBoiDTrpJ-2FUkuanwcx6R8UgvxBCq08DUa7vhRJqir7M-3D
http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pe76-2Fa6XWGARorEmYO8A-2BeVhPN2B1iPvCEp9XG8WpVE6w-3D-3DHR5k_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHiGuZh3RFDkfg6-2FfNpOG8Tly-2FHwK25rp-2F24-2BPYZV6e5mGIWVQ5bpJ0l9u4lRzO6rySncTjxQZEPOjzZIrDMh-2Fo17VBGBagQ0Gr6G-2BAySO2ZdtDBthWjE-2F7HwxioFG9XrEUBVS79a0mcaaPKM-2BdzT9Cc-3D

and



http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84Rv2gN4NEUQqv-2Fn4lbJOY6mZGeN60klU5tyssLGPfHHB3IBl2Fx9C7un4UF2pBgYYcEd15H8F-2FPYEn4LTL-2Bz8fMFeo4z-2BjB3yMGv345VDdnStdESYCXN-2FD-2BQOSZSTNQLbQ-3D-3D9uTB_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHkzFWMImz7LKaiDu0g-2BZTsPclbKiyBoQiJHrZiOk5CuKsixeOFVVfvwAjyEhV-2F5McxrC76Q-2F72ILNowoPMFyMwXdaUF-2FhFh6HF0aWgai16l9zSdZIETAq46vRruPFO9ZqlRY6XiSu-2FBiKe4r5xiM0vA-3D

but it is not really clear to me how to use it.

Is there some additional documentation / example / demo how I can
combine filtering with vector search?

Thanks

Michael















Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-20 Thread Michael Wechner

Hi Julie

I got it running and it seems to work fine so far :-)

Re an example for the demo package, I guess this would go here

https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html

and I thought of something like "KnnVectorPreFilterQuery.java"

Query preFilterQuery = new TermQuery(new Term(TOPIC_FIELD, "general"));
if (preFilterQuery != null) {
log.info("Filter applied before the vector search: " + preFilterQuery);
}
Query query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, preFilterQuery);

TopDocs topDocs = searcher.search(query, k);

Does that make sense to you?

Thanks

Michael

Am 11.05.22 um 07:59 schrieb Michael Wechner:

Hi Julie

Cool, thanks!

I'll try to apply it and, if it works, could add an example to the demo
package.


Will keep you posted :-)

Thanks

Michael

Am 11.05.22 um 02:13 schrieb Julie Tibshirani:

Hello Michael, we don't yet have an example of kNN with filtering in the
demo package. This would be nice to add!

For now maybe looking at the unit tests could give a sense of how to use
it. Here's an example:
https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L115-L127. 

The idea is that KnnVectorQuery optionally accepts a Query as a 
filter, and
returns the k nearest vectors that also match the filter. Many people 
refer

to this as "kNN with prefiltering" (as opposed to "postfiltering", where
the filter is applied *after* the kNN search, so in the end you may 
receive

fewer than k matches).

Let us know if you run into any questions/ issues while trying it out!

Julie

On Mon, May 9, 2022 at 8:08 AM Michael Wechner
wrote:


sorry for the URLs below.

I have tested Twilio SendGrid as outgoing server and it just rewrote 
the

URLs


https://issues.apache.org/jira/browse/SOLR-15947
https://issues.apache.org/jira/browse/LUCENE-10382

and


https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html 



which I was not aware of, but disabled the tracking now and hope it 
will

be ok now.

Thanks

Michael

Am 09.05.22 um 15:12 schrieb Michael Wechner:

Hi

I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor
vector search, which is great :-)

I have found


http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHnF6KpgvICPKY8k5m30V-2FWrDvoSQLIaimtz2YHrzSMNV98es7-2BeXQS174-2B0EHPnQVtOYqUPojoZgGkqmovRXrz1dJlfs9dtFDGqSjpYaFiMaVBoiDTrpJ-2FUkuanwcx6R8UgvxBCq08DUa7vhRJqir7M-3D 



http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pe76-2Fa6XWGARorEmYO8A-2BeVhPN2B1iPvCEp9XG8WpVE6w-3D-3DHR5k_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHiGuZh3RFDkfg6-2FfNpOG8Tly-2FHwK25rp-2F24-2BPYZV6e5mGIWVQ5bpJ0l9u4lRzO6rySncTjxQZEPOjzZIrDMh-2Fo17VBGBagQ0Gr6G-2BAySO2ZdtDBthWjE-2F7HwxioFG9XrEUBVS79a0mcaaPKM-2BdzT9Cc-3D 



and


http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84Rv2gN4NEUQqv-2Fn4lbJOY6mZGeN60klU5tyssLGPfHHB3IBl2Fx9C7un4UF2pBgYYcEd15H8F-2FPYEn4LTL-2Bz8fMFeo4z-2BjB3yMGv345VDdnStdESYCXN-2FD-2BQOSZSTNQLbQ-3D-3D9uTB_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHkzFWMImz7LKaiDu0g-2BZTsPclbKiyBoQiJHrZiOk5CuKsixeOFVVfvwAjyEhV-2F5McxrC76Q-2F72ILNowoPMFyMwXdaUF-2FhFh6HF0aWgai16l9zSdZIETAq46vRruPFO9ZqlRY6XiSu-2FBiKe4r5xiM0vA-3D 



but it is not really clear to me how to use it.

Is there some additional documentation / example / demo how I can
combine filtering with vector search?

Thanks

Michael












Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-10 Thread Michael Wechner

Hi Julie

Cool, thanks!

I'll try to apply it and, if it works, could add an example to the demo
package.


Will keep you posted :-)

Thanks

Michael

Am 11.05.22 um 02:13 schrieb Julie Tibshirani:

Hello Michael, we don't yet have an example of kNN with filtering in the
demo package. This would be nice to add!

For now maybe looking at the unit tests could give a sense of how to use
it. Here's an example:
https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L115-L127.
The idea is that KnnVectorQuery optionally accepts a Query as a filter, and
returns the k nearest vectors that also match the filter. Many people refer
to this as "kNN with prefiltering" (as opposed to "postfiltering", where
the filter is applied *after* the kNN search, so in the end you may receive
fewer than k matches).

Let us know if you run into any questions/ issues while trying it out!

Julie

On Mon, May 9, 2022 at 8:08 AM Michael Wechner 
wrote:


sorry for the URLs below.

I have tested Twilio SendGrid as outgoing server and it just rewrote the
URLs


https://issues.apache.org/jira/browse/SOLR-15947
https://issues.apache.org/jira/browse/LUCENE-10382

and


https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html

which I was not aware of, but disabled the tracking now and hope it will
be ok now.

Thanks

Michael

Am 09.05.22 um 15:12 schrieb Michael Wechner:

Hi

I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor
vector search, which is great :-)

I have found



http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHnF6KpgvICPKY8k5m30V-2FWrDvoSQLIaimtz2YHrzSMNV98es7-2BeXQS174-2B0EHPnQVtOYqUPojoZgGkqmovRXrz1dJlfs9dtFDGqSjpYaFiMaVBoiDTrpJ-2FUkuanwcx6R8UgvxBCq08DUa7vhRJqir7M-3D



http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pe76-2Fa6XWGARorEmYO8A-2BeVhPN2B1iPvCEp9XG8WpVE6w-3D-3DHR5k_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHiGuZh3RFDkfg6-2FfNpOG8Tly-2FHwK25rp-2F24-2BPYZV6e5mGIWVQ5bpJ0l9u4lRzO6rySncTjxQZEPOjzZIrDMh-2Fo17VBGBagQ0Gr6G-2BAySO2ZdtDBthWjE-2F7HwxioFG9XrEUBVS79a0mcaaPKM-2BdzT9Cc-3D


and



http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84Rv2gN4NEUQqv-2Fn4lbJOY6mZGeN60klU5tyssLGPfHHB3IBl2Fx9C7un4UF2pBgYYcEd15H8F-2FPYEn4LTL-2Bz8fMFeo4z-2BjB3yMGv345VDdnStdESYCXN-2FD-2BQOSZSTNQLbQ-3D-3D9uTB_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHkzFWMImz7LKaiDu0g-2BZTsPclbKiyBoQiJHrZiOk5CuKsixeOFVVfvwAjyEhV-2F5McxrC76Q-2F72ILNowoPMFyMwXdaUF-2FhFh6HF0aWgai16l9zSdZIETAq46vRruPFO9ZqlRY6XiSu-2FBiKe4r5xiM0vA-3D


but it is not really clear to me how to use it.

Is there some additional documentation / example / demo how I can
combine filtering with vector search?

Thanks

Michael














Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner

sorry for the URLs below.

I have tested Twilio SendGrid as outgoing server and it just rewrote the 
URLs



https://issues.apache.org/jira/browse/SOLR-15947
https://issues.apache.org/jira/browse/LUCENE-10382

and

https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html

which I was not aware of, but disabled the tracking now and hope it will 
be ok now.


Thanks

Michael

Am 09.05.22 um 15:12 schrieb Michael Wechner:

Hi

I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor 
vector search, which is great :-)


I have found

http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHnF6KpgvICPKY8k5m30V-2FWrDvoSQLIaimtz2YHrzSMNV98es7-2BeXQS174-2B0EHPnQVtOYqUPojoZgGkqmovRXrz1dJlfs9dtFDGqSjpYaFiMaVBoiDTrpJ-2FUkuanwcx6R8UgvxBCq08DUa7vhRJqir7M-3D 

http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pe76-2Fa6XWGARorEmYO8A-2BeVhPN2B1iPvCEp9XG8WpVE6w-3D-3DHR5k_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHiGuZh3RFDkfg6-2FfNpOG8Tly-2FHwK25rp-2F24-2BPYZV6e5mGIWVQ5bpJ0l9u4lRzO6rySncTjxQZEPOjzZIrDMh-2Fo17VBGBagQ0Gr6G-2BAySO2ZdtDBthWjE-2F7HwxioFG9XrEUBVS79a0mcaaPKM-2BdzT9Cc-3D 



and

http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84Rv2gN4NEUQqv-2Fn4lbJOY6mZGeN60klU5tyssLGPfHHB3IBl2Fx9C7un4UF2pBgYYcEd15H8F-2FPYEn4LTL-2Bz8fMFeo4z-2BjB3yMGv345VDdnStdESYCXN-2FD-2BQOSZSTNQLbQ-3D-3D9uTB_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHkzFWMImz7LKaiDu0g-2BZTsPclbKiyBoQiJHrZiOk5CuKsixeOFVVfvwAjyEhV-2F5McxrC76Q-2F72ILNowoPMFyMwXdaUF-2FhFh6HF0aWgai16l9zSdZIETAq46vRruPFO9ZqlRY6XiSu-2FBiKe4r5xiM0vA-3D 



but it is not really clear to me how to use it.

Is there some additional documentation / example / demo how I can 
combine filtering with vector search?


Thanks

Michael










Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner

Hi

I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor 
vector search, which is great :-)


I have found

http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHnF6KpgvICPKY8k5m30V-2FWrDvoSQLIaimtz2YHrzSMNV98es7-2BeXQS174-2B0EHPnQVtOYqUPojoZgGkqmovRXrz1dJlfs9dtFDGqSjpYaFiMaVBoiDTrpJ-2FUkuanwcx6R8UgvxBCq08DUa7vhRJqir7M-3D
http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pe76-2Fa6XWGARorEmYO8A-2BeVhPN2B1iPvCEp9XG8WpVE6w-3D-3DHR5k_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHiGuZh3RFDkfg6-2FfNpOG8Tly-2FHwK25rp-2F24-2BPYZV6e5mGIWVQ5bpJ0l9u4lRzO6rySncTjxQZEPOjzZIrDMh-2Fo17VBGBagQ0Gr6G-2BAySO2ZdtDBthWjE-2F7HwxioFG9XrEUBVS79a0mcaaPKM-2BdzT9Cc-3D

and

http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84Rv2gN4NEUQqv-2Fn4lbJOY6mZGeN60klU5tyssLGPfHHB3IBl2Fx9C7un4UF2pBgYYcEd15H8F-2FPYEn4LTL-2Bz8fMFeo4z-2BjB3yMGv345VDdnStdESYCXN-2FD-2BQOSZSTNQLbQ-3D-3D9uTB_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2CeMD7g4c-2Fpo3g6VPyd4ghH4X9o8sJ-2Bl292KOe2-2F30WmSZB1KHkzFWMImz7LKaiDu0g-2BZTsPclbKiyBoQiJHrZiOk5CuKsixeOFVVfvwAjyEhV-2F5McxrC76Q-2F72ILNowoPMFyMwXdaUF-2FhFh6HF0aWgai16l9zSdZIETAq46vRruPFO9ZqlRY6XiSu-2FBiKe4r5xiM0vA-3D

but it is not really clear to me how to use it.

Is there some additional documentation / example / demo how I can 
combine filtering with vector search?


Thanks

Michael






Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner

Hi

I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor 
vector search, which is great


I have found

https://issues.apache.org/jira/browse/SOLR-15947
https://issues.apache.org/jira/browse/LUCENE-10382

and

https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html 



but it is not really clear to me how to use it.

Is there some additional documentation / example / demo how I can 
combine filtering with vector search?


Thanks

Michael




Re: New user questions about demo, downloads, and IRC

2022-04-26 Thread Michael Wechner

great, thanks!

Am 26.04.22 um 21:48 schrieb Michael Sokolov:

thanks, I fixed the doc!

On Tue, Apr 26, 2022 at 9:13 AM Bridger Dyson-Smith
 wrote:

Hi Michael -

On Mon, Apr 25, 2022 at 5:38 PM Michael Wechner 
wrote:


Hi Bridger

Inside

https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz

you should find

modules/lucene-core-9.1.0.jar
modules/lucene-queryparser-9.1.0.jar
modules/lucene-analysis-common-9.1.0.jar
modules/lucene-demo-9.1.0.jar

Yes, those are there!  I wasn't sure if a different directory structure

would be available if building from source, but in any case I'll try
working through the demo.

I guess the documentation is not quite right.



Re your second question, there are two channels on Slack

https://app.slack.com/client/T4S1WH2J3/CE70MDPMF (#lucene-dev)
https://app.slack.com/client/T4S1WH2J3/C01E88Y8TQD (#lucene-vector)

Are these channels appropriate places for new user talk/questions?



HTH

Michael

Very helpful indeed -- thank you very kindly for your time.

Best,

Bridger



Am 25.04.22 um 21:27 schrieb Bridger Dyson-Smith:

Hi all -

I hope these questions are acceptable for this particular list.

I have a combined question re the 9.1.0 demo[1] and the binary

release[2]:

the demo suggests that there should be a `core/` directory, as well as
others, however, after unpacking the TAR, I'm not seeing any:
) ls -1
CHANGES.txt
JRE_VERSION_MIGRATION.md
LICENSE.txt
MIGRATE.md
NOTICE.txt
README.md
SYSTEM_REQUIREMENTS.md
bin/
docs/
licenses/
modules/
modules-test-framework/
modules-thirdparty/

Is downloading the source and building the recommended approach here?

Also, are the Lucene folks anywhere on liberachat vs freenode? Many
communities seem to have moved away from freenode and I was curious if

that

was the case with Lucene's IRC, or if people were still using freenode

(no

big deal either way - just curious!).

Thanks very much for your time!
Best,
Bridger

[1] https://lucene.apache.org/core/9_1_0/demo/index.html
[2]

https://www.apache.org/dyn/closer.lua/lucene/java/9.1.0/lucene-9.1.0.tgz











Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner

Maybe one of the Lucene committers could quickly fix the documentation at

https://github.com/apache/lucene/blob/main/lucene/demo/src/java/overview.html

by replacing "core/" with "modules/"

Takes probably less time than reviewing a PR, but let me know in case
you prefer one.


Thanks

Michael

Am 25.04.22 um 23:37 schrieb Michael Wechner:

Hi Bridger

Inside

https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz

you should find

modules/lucene-core-9.1.0.jar
modules/lucene-queryparser-9.1.0.jar
modules/lucene-analysis-common-9.1.0.jar
modules/lucene-demo-9.1.0.jar

I guess the documentation is not quite right.

Re your second question, there are two channels on Slack

https://app.slack.com/client/T4S1WH2J3/CE70MDPMF (#lucene-dev)
https://app.slack.com/client/T4S1WH2J3/C01E88Y8TQD (#lucene-vector)

HTH

Michael



Am 25.04.22 um 21:27 schrieb Bridger Dyson-Smith:

Hi all -

I hope these questions are acceptable for this particular list.

I have a combined question re the 9.1.0 demo[1] and the binary 
release[2]:


the demo suggests that there should be a `core/` directory, as well as
others; however, after unpacking the TAR, I'm not seeing any:
$ ls -1
CHANGES.txt
JRE_VERSION_MIGRATION.md
LICENSE.txt
MIGRATE.md
NOTICE.txt
README.md
SYSTEM_REQUIREMENTS.md
bin/
docs/
licenses/
modules/
modules-test-framework/
modules-thirdparty/

Is downloading the source and building the recommended approach here?

Also, are the Lucene folks anywhere on liberachat vs freenode? Many
communities seem to have moved away from freenode and I was curious if that
was the case with Lucene's IRC, or if people were still using freenode (no
big deal either way - just curious!).

Thanks very much for your time!
Best,
Bridger

[1] https://lucene.apache.org/core/9_1_0/demo/index.html
[2] 
https://www.apache.org/dyn/closer.lua/lucene/java/9.1.0/lucene-9.1.0.tgz





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org







Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner

Hi Bridger

Inside

https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz

you should find

modules/lucene-core-9.1.0.jar
modules/lucene-queryparser-9.1.0.jar
modules/lucene-analysis-common-9.1.0.jar
modules/lucene-demo-9.1.0.jar

I guess the documentation is not quite right.

Re your second question, there are two channels on Slack

https://app.slack.com/client/T4S1WH2J3/CE70MDPMF (#lucene-dev)
https://app.slack.com/client/T4S1WH2J3/C01E88Y8TQD (#lucene-vector)

HTH

Michael



Am 25.04.22 um 21:27 schrieb Bridger Dyson-Smith:

Hi all -

I hope these questions are acceptable for this particular list.

I have a combined question re the 9.1.0 demo[1] and the binary release[2]:

the demo suggests that there should be a `core/` directory, as well as
others; however, after unpacking the TAR, I'm not seeing any:
$ ls -1
CHANGES.txt
JRE_VERSION_MIGRATION.md
LICENSE.txt
MIGRATE.md
NOTICE.txt
README.md
SYSTEM_REQUIREMENTS.md
bin/
docs/
licenses/
modules/
modules-test-framework/
modules-thirdparty/

Is downloading the source and building the recommended approach here?

Also, are the Lucene folks anywhere on liberachat vs freenode? Many
communities seem to have moved away from freenode and I was curious if that
was the case with Lucene's IRC, or if people were still using freenode (no
big deal either way - just curious!).

Thanks very much for your time!
Best,
Bridger

[1] https://lucene.apache.org/core/9_1_0/demo/index.html
[2] https://www.apache.org/dyn/closer.lua/lucene/java/9.1.0/lucene-9.1.0.tgz




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Need help on defining custom scorer in Lucene 9

2022-04-03 Thread Michael Wechner

Hi Lokesh

IIUC each document (like for example a shop description) has a longitude
and a latitude associated with it.

The user's search input is some keywords plus the user's geo location.

You search for the documents with the keywords, and you would like to use
the user's geo location to sort the result set, so that the locations
nearest to the user's geo location come first.


Is this your use case?

Thanks

Michael

Am 03.04.22 um 20:12 schrieb Lokesh Mavale:

Hi Team,

I am a little bit familiar with Lucene, and I have a problem statement at
hand: score each document based on the value of a field.

Value will be of type
GeoPoint {
    Lat
    Long
}

And in the query I will be getting another pair of lat/long and a distance
from that lat/long.
I have figured out how to calculate the distance between lat/long pairs on
the basis of the input params, but I am having a hard time implementing
such a custom scoring function.
Any help / pointers / examples / sample code snippets will be of great help.

Looking forward to hearing from you.

Thank you so much in advance!

Regards,
Lokesh
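For what it's worth, a hedged, plain-Java sketch of the kind of scoring function discussed in this thread: haversine distance plus a simple distance decay. The math is independent of Lucene; inside Lucene, a function of this shape could presumably be plugged in via a FunctionScoreQuery over a DoubleValuesSource (see the queries module javadocs for the current API), so the class and decay constant below are only illustrative assumptions, not Lucene API:

```java
// Hedged sketch: a distance-aware scoring function of the shape described
// above. The haversine part is plain math; the combined score is one
// possible decay, not the Lucene way of doing it.
public class GeoScore {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle (haversine) distance between two lat/long points, in km.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Example shape of a combined score: text relevance decays with distance.
    static double score(double textScore, double distKm) {
        return textScore / (1.0 + distKm);
    }

    public static void main(String[] args) {
        double d = haversineKm(47.3769, 8.5417, 46.9481, 7.4474); // Zurich -> Bern
        System.out.printf("distance=%.0f km, score=%.3f%n", d, score(10.0, d));
    }
}
```

A decay like textScore / (1 + distanceKm) keeps text relevance dominant for nearby documents while smoothly down-weighting distant ones; the exact decay is something to tune for the use case.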




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-31 Thread Michael Wechner

Sure!

There is a lot going on re vector search, also various startups like 
Weaviate, Pinecone, Milvus, Qdrant, Vespa, etc. competing with each other


https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

and also companies such as Jina.ai making use of them; several of them
have just received considerable Series A funding.


I think it would be good to show what Lucene has to offer and how it 
relates/compares to these developments.




Am 31.03.22 um 11:39 schrieb Adrien Grand:

Thanks Michael for helping spread the word about Lucene's new vector
search capabilities!

On Thu, Mar 31, 2022 at 7:36 AM Michael Wechner
 wrote:

ok :-) thanks!

Anyway, if somebody would like to join re a "vector search" proposal,
please let me know

Michael

Am 30.03.22 um 20:13 schrieb Anshum Gupta:

Hi Michael,

I'd highly recommend submitting a proposal irrespective of what other folks
decide. Your submission would be reviewed independently and if there is
another proposal that clashes, the abstract would help the program
committee pick the one (or both) that's best suited for the audience.

Good luck!

-Anshum

On Wed, Mar 30, 2022 at 5:47 AM Michael Wechner 
wrote:


Hi Together

I would be interested to submit a proposal/presentation re Lucene's
vector search,  but would like to ask first whether somebody else wants
to do this as well or might be interested to do this together?

Thanks

Michael

Am 30.03.22 um 14:16 schrieb Rich Bowen:

[You are receiving this because you are subscribed to one or more user
or dev mailing list of an Apache Software Foundation project.]

ApacheCon draws participants at all levels to explore “Tomorrow’s
Technology Today” across 300+ Apache projects and their diverse
communities. ApacheCon showcases the latest developments in ubiquitous
Apache projects and emerging innovations through hands-on sessions,
keynotes, real-world case studies, trainings, hackathons, community
events, and more.

The Apache Software Foundation will be holding ApacheCon North America
2022 at the New Orleans Sheraton, October 3rd through 6th, 2022. The
Call for Presentations is now open, and will close at 00:01 UTC on May
23rd, 2022.

We are accepting presentation proposals for any topic that is related
to the Apache mission of producing free software for the public good.
This includes, but is not limited to:

Community
Big Data
Search
IoT
Cloud
Fintech
Pulsar
Tomcat

You can submit your session proposals starting today at
https://cfp.apachecon.com/

Rich Bowen, on behalf of the ApacheCon Planners
apachecon.com
@apachecon

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org








-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner

ok :-) thanks!

Anyway, if somebody would like to join re a "vector search" proposal, 
please let me know


Michael

Am 30.03.22 um 20:13 schrieb Anshum Gupta:

Hi Michael,

I'd highly recommend submitting a proposal irrespective of what other folks
decide. Your submission would be reviewed independently and if there is
another proposal that clashes, the abstract would help the program
committee pick the one (or both) that's best suited for the audience.

Good luck!

-Anshum

On Wed, Mar 30, 2022 at 5:47 AM Michael Wechner 
wrote:


Hi Together

I would be interested to submit a proposal/presentation re Lucene's
vector search,  but would like to ask first whether somebody else wants
to do this as well or might be interested to do this together?

Thanks

Michael

Am 30.03.22 um 14:16 schrieb Rich Bowen:

[You are receiving this because you are subscribed to one or more user
or dev mailing list of an Apache Software Foundation project.]

ApacheCon draws participants at all levels to explore “Tomorrow’s
Technology Today” across 300+ Apache projects and their diverse
communities. ApacheCon showcases the latest developments in ubiquitous
Apache projects and emerging innovations through hands-on sessions,
keynotes, real-world case studies, trainings, hackathons, community
events, and more.

The Apache Software Foundation will be holding ApacheCon North America
2022 at the New Orleans Sheraton, October 3rd through 6th, 2022. The
Call for Presentations is now open, and will close at 00:01 UTC on May
23rd, 2022.

We are accepting presentation proposals for any topic that is related
to the Apache mission of producing free software for the public good.
This includes, but is not limited to:

Community
Big Data
Search
IoT
Cloud
Fintech
Pulsar
Tomcat

You can submit your session proposals starting today at
https://cfp.apachecon.com/

Rich Bowen, on behalf of the ApacheCon Planners
apachecon.com
@apachecon

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org






Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner

Hi Together

I would be interested to submit a proposal/presentation re Lucene's 
vector search,  but would like to ask first whether somebody else wants 
to do this as well or might be interested to do this together?


Thanks

Michael

Am 30.03.22 um 14:16 schrieb Rich Bowen:

[You are receiving this because you are subscribed to one or more user
or dev mailing list of an Apache Software Foundation project.]

ApacheCon draws participants at all levels to explore “Tomorrow’s
Technology Today” across 300+ Apache projects and their diverse
communities. ApacheCon showcases the latest developments in ubiquitous
Apache projects and emerging innovations through hands-on sessions,
keynotes, real-world case studies, trainings, hackathons, community
events, and more.

The Apache Software Foundation will be holding ApacheCon North America
2022 at the New Orleans Sheraton, October 3rd through 6th, 2022. The
Call for Presentations is now open, and will close at 00:01 UTC on May
23rd, 2022.

We are accepting presentation proposals for any topic that is related
to the Apache mission of producing free software for the public good.
This includes, but is not limited to:

Community
Big Data
Search
IoT
Cloud
Fintech
Pulsar
Tomcat

You can submit your session proposals starting today at
https://cfp.apachecon.com/

Rich Bowen, on behalf of the ApacheCon Planners
apachecon.com
@apachecon

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org










Re: Question about using Lucene to search source code

2021-12-20 Thread Michael Wechner

Hi Yuxin

Can you provide a concrete example of a query and a document/code snippet?

Thanks

Michael


Am 20.12.21 um 03:06 schrieb Yuxin Liu:

Dear development community of Lucene:
Hi from student research assistant Yuxin Liu. I'm using Lucene to build a
search index over source code using TF-IDF similarity. I have a set of
source code snippets and I want to use part of a source code snippet as a
query and obtain the document whose source-code text field matches the
query with the highest TF-IDF similarity.
Here is what I did: build index documents, storing each source code
snippet as a text field in one document together with its id, then use a
query to search for it. However, I don't know which kind of query I should
use for partial source code, because my query is neither terms nor phrases.
What is a good way to achieve this?
I am really looking for suggestions because this has blocked me for a while.

Thanks a lot in advance.
Sincerely,
Yuxin
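For what it's worth, here is a hedged, plain-Java illustration (no Lucene) of the TF-IDF/cosine ranking described above. In Lucene itself one would typically analyze the snippet into terms and build a BooleanQuery of SHOULD TermQuery clauses (or escape the raw text with QueryParser.escape(...)) and let Lucene's similarity do the ranking; the tokenizer and helper names below are made up for illustration only:

```java
import java.util.*;

// Hedged sketch: rank documents against a code-snippet query by TF-IDF
// cosine similarity. Not Lucene API - just the underlying idea.
public class TfIdfDemo {
    // Crude tokenizer: split code on non-word characters, lowercase.
    static List<String> tokenize(String code) {
        return Arrays.asList(code.toLowerCase().split("\\W+"));
    }

    // TF-IDF vector for one token list, with idf computed over the corpus.
    static Map<String, Double> tfidf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> vec = new HashMap<>();
        for (String t : doc) vec.merge(t, 1.0, Double::sum); // raw term frequency
        for (Map.Entry<String, Double> e : vec.entrySet()) {
            long df = corpus.stream().filter(d -> d.contains(e.getKey())).count();
            e.setValue(e.getValue() * Math.log((1.0 + corpus.size()) / (1.0 + df)));
        }
        return vec;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet())
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        for (double v : a.values()) na += v * v;
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            tokenize("public int add(int a, int b) { return a + b; }"),
            tokenize("public String greet(String name) { return \"hi \" + name; }"));
        Map<String, Double> q = tfidf(tokenize("int add(int a, int b)"), corpus);
        for (List<String> doc : corpus)
            System.out.println(cosine(q, tfidf(doc, corpus)));
    }
}
```

The partial snippet "int add(int a, int b)" scores highest against the document it came from, which is exactly the behavior asked for; Lucene performs an equivalent ranking once the snippet is turned into analyzed terms.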




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
I just realized that I can set an index directory when constructing the
Suggester, for example

Directory indexDir = FSDirectory.open(indexDirPath);

AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(indexDir, analyzer, analyzer, 3, true);

and that I build the index using an ItemIterator when it does not exist
yet, for example

if (!indexDirPath.toFile().isDirectory() || indexDirPath.toFile().list().length == 0) {

    List<Item> entities = new ArrayList<>();

    entities.add(new Item("traffic accident", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 3));
    entities.add(new Item("event", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 2));
    entities.add(new Item("person", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 4));
    entities.add(new Item("coverage check", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
    entities.add(new Item("coverage", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
    entities.add(new Item("contract search", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
    entities.add(new Item("claims management system", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));

    suggester.build(new ItemIterator(entities.iterator()));
}

I was a little confused, because all the implementation examples I found 
were using an in-memory directory.


My bad, everything good now, thank you :-)

Michael



Am 18.11.21 um 09:47 schrieb Michael Wechner:

Hi

I recently started to use the Autosuggest/Autocomplete package as 
suggested by Robert


https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html

which works very fine, thanks again for your help :-)

But it is not clear to me what are the best practices building a 
suggester using an InputIterator


https://lucene.apache.org/core/8_10_1/suggest/org/apache/lucene/search/suggest/Lookup.html#build-org.apache.lucene.search.suggest.InputIterator- 



regarding

- scalability
- thousands of terms
- thousands of contexts (including personalized contexts)
- updating during runtime (singleton / thread safe)

So far I do something as follows

entities.add(new Item("traffic accident", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 3));
entities.add(new Item("event", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 2));
entities.add(new Item("person", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 4));
entities.add(new Item("coverage check", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("coverage", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("contract search", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("claims management system", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));

suggester.build(new ItemIterator(entities.iterator()));

whereas the terms associated with the context "public" are intended 
for all contexts and the terms associated with the context 
"a84581a3-302f-4b73-80d9-0e60da5238f9" are only for a private domain 
context, in this example an insurance company.


Let's assume we have thousands of private domain contexts and the 
terms keep changing continuously, because people upload new documents 
with new terms into these contexts.


Will the current implementation of building the suggester using 
InputIterator scale for such a situation?


I assumed/expected actually that the suggester is implemented like an 
IndexReader/DirectoryReader for searching, which means for each 
context I could have a separate "SuggesterDirectory", which can be 
updated during runtime and scales easily.


Or do I misunderstand the current concept of how to build a suggester?

Thanks

Michael


Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner

Hi

I recently started to use the Autosuggest/Autocomplete package as 
suggested by Robert


https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html

which works very fine, thanks again for your help :-)

But it is not clear to me what are the best practices building a 
suggester using an InputIterator


https://lucene.apache.org/core/8_10_1/suggest/org/apache/lucene/search/suggest/Lookup.html#build-org.apache.lucene.search.suggest.InputIterator-

regarding

- scalability
- thousands of terms
- thousands of contexts (including personalized contexts)
- updating during runtime (singleton / thread safe)

So far I do something as follows

entities.add(new Item("traffic accident", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 3));
entities.add(new Item("event", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 2));
entities.add(new Item("person", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 4));
entities.add(new Item("coverage check", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("coverage", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("contract search", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("claims management system", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));

suggester.build(new ItemIterator(entities.iterator()));

whereas the terms associated with the context "public" are intended for 
all contexts and the terms associated with the context 
"a84581a3-302f-4b73-80d9-0e60da5238f9" are only for a private domain 
context, in this example an insurance company.


Let's assume we have thousands of private domain contexts and the terms 
keep changing continuously, because people upload new documents with new 
terms into these contexts.


Will the current implementation of building the suggester using 
InputIterator scale for such a situation?


I assumed/expected actually that the suggester is implemented like an 
IndexReader/DirectoryReader for searching, which means for each context 
I could have a separate "SuggesterDirectory", which can be updated 
during runtime and scales easily.


Or do I misunderstand the current concept of how to build a suggester?

Thanks

Michael
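A hedged, plain-Java model of what the context-filtered prefix lookup described above does conceptually; AnalyzingInfixSuggester implements this far more efficiently on top of a Lucene index via the contexts parameter of lookup(). All names below are illustrative, not Lucene API:

```java
import java.util.*;
import java.util.stream.*;

// Hedged sketch: prefix lookup restricted to a context, sorted by weight -
// the behavior the suggester thread above is about, in miniature.
public class ContextSuggest {
    record Entry(String term, Set<String> contexts, long weight) {}

    static List<String> lookup(List<Entry> entries, String prefix, String context, int num) {
        return entries.stream()
                .filter(e -> e.term().startsWith(prefix))       // prefix match
                .filter(e -> e.contexts().contains(context))    // context filter
                .sorted(Comparator.comparingLong(Entry::weight).reversed())
                .limit(num)
                .map(Entry::term)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("traffic accident", Set.of("public"), 3),
            new Entry("coverage check", Set.of("a84581a3"), 1),
            new Entry("coverage", Set.of("a84581a3"), 1),
            new Entry("contract search", Set.of("a84581a3"), 1));
        System.out.println(lookup(entries, "cov", "a84581a3", 5));
    }
}
```

Because the real suggester keeps this data in a Lucene index directory, per-context separation and runtime updates are handled by the index rather than by scanning a list as the sketch does.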

Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner

I have added a QnA

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-DoesLucenesupportauto-suggest/autocomplete?

I will also try to provide an example, for example

https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36
https://github.com/Da-Bulgaria/e-prescriptions/blob/master/src/main/java/bg/ehealth/prescriptions/services/icd/ICDService.java

or

https://stackoverflow.com/questions/24968697/how-to-implement-auto-suggest-using-lucenes-new-analyzinginfixsuggester-api

but I first need to check whether these examples conform to the Lucene
8.10.1 suggest API.


If you know any simple, recent examples, please let me know

Thanks

Michael


Am 08.10.21 um 21:40 schrieb Michael Wechner:



Am 08.10.21 um 18:49 schrieb Michael Sokolov:

Thank you for offering to add to the FAQ! Indeed it should mention the
suggester capability. I think you have permissions to edit that wiki?


yes :-)


Please go ahead and I think add a link to the suggest module javadocs


ok, will do!

Thanks

Michael



On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner
 wrote:

Thanks very much for your feedback!

I will try it :-)

As I wrote I would like to add a summary to the Lucene FAQ
(https://cwiki.apache.org/confluence/display/lucene/lucenefaq)

Would the following questions make sense?

   - "Does Lucene support incremental search?"

   - "Does Lucene support auto completion suggestions?"

Or would other terms or another wording make more sense?

Thanks

Michael



Am 07.10.21 um 01:14 schrieb Robert Muir:

TLDR: use the lucene suggest/ package. Start with building suggester
from your query logs (either a file or index them).
These have a lot of flexibility about how the matches happen, for
example pure prefixes, edit distance typos, infix matching, analysis
chain, even now Japanese input-method integration :)

Run that suggester on the user input, retrieving say, the top 5-10
matches of relevant query suggestions.
return those in the UI (typical autosuggest-type field), but also run
a search on the first one.

The user gets the instant-search experience, but when they type 'tes',
you search on 'tesla' (if that's the top-suggested query, the
highlighted one in the autocomplete). if they arrow-down to another
suggestion such as 'test' or type a 't' or use the mouse or whatever,
then the process runs again and they see the results for that.

IMO for most cases this leads to a saner experience than trying to
rank all documents based on a prefix 'tes': the problem is there is
still too much query ambiguity, not really any "keywords" yet, so
trying to rank those documents won't be very useful. Instead you try
to "interact" with the user to present results in a useful way that
they can navigate.

On the other hand if you really want to just search on prefixes and
jumble up the results (perhaps because you are gonna just sort by some
custom document feature instead of relevance), then you can do that if
you really want. You can use the n-gram/edge-ngram/shingle filters in
the analysis package for that.

On Wed, Oct 6, 2021 at 5:37 PM Michael Wechner
 wrote:

Hi

I am trying to implement a search with Lucene similar to what for
example various "Note Apps" (e.g. "Google Keep" or "Samsung Notes") are
offering, that with every new letter typed a new search is being executed.

For example when I type "tes", then all documents are being returned
containing the word "test" or "tesla" and when I continue typing, for
example "tesö" and there are no documents containing the string "tesö",
then the app will tell me that there are no matches.

I have found a couple of articles related to this kind of search, for
example

https://stackoverflow.com/questions/10828825/incremental-search-using-lucene

https://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene

but would be great to know whether there exist other possibilities or
what the best practice is?

I am even not sure what the right term for this kind of search is, is it
really "incremental search" or something else?

Looking forward to your feedback and will be happy to extend the Lucene
FAQ once I understand better :-)

Thanks

Michael
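Robert's edge n-gram alternative quoted earlier in this message can be sketched in plain Java. Lucene's analysis-common module provides EdgeNGramTokenFilter for this at analysis time, so the method below is only an illustration of what gets indexed: every prefix of a term, which is what lets a plain term match on "tes" find "tesla":

```java
import java.util.*;

// Hedged sketch: the prefixes an edge n-gram filter would emit for a term.
public class EdgeNGrams {
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, term.length()); len++) {
            grams.add(term.substring(0, len)); // prefix of length len
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("tesla", 1, 5)); // [t, te, tes, tesl, tesla]
    }
}
```

Indexing these grams trades index size for query-time simplicity: each keystroke becomes an exact term lookup instead of a prefix or wildcard query.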

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner




Am 08.10.21 um 18:49 schrieb Michael Sokolov:

Thank you for offering to add to the FAQ! Indeed it should mention the
suggester capability. I think you have permissions to edit that wiki?


yes :-)


Please go ahead and I think add a link to the suggest module javadocs


ok, will do!

Thanks

Michael



On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner
 wrote:

Thanks very much for your feedback!

I will try it :-)

As I wrote I would like to add a summary to the Lucene FAQ
(https://cwiki.apache.org/confluence/display/lucene/lucenefaq)

Would the following questions make sense?

   - "Does Lucene support incremental search?"

   - "Does Lucene support auto completion suggestions?"

Or would other terms or another wording make more sense?

Thanks

Michael



Am 07.10.21 um 01:14 schrieb Robert Muir:

TLDR: use the lucene suggest/ package. Start with building suggester
from your query logs (either a file or index them).
These have a lot of flexibility about how the matches happen, for
example pure prefixes, edit distance typos, infix matching, analysis
chain, even now Japanese input-method integration :)

Run that suggester on the user input, retrieving say, the top 5-10
matches of relevant query suggestions.
return those in the UI (typical autosuggest-type field), but also run
a search on the first one.

The user gets the instant-search experience, but when they type 'tes',
you search on 'tesla' (if that's the top-suggested query, the
highlighted one in the autocomplete). if they arrow-down to another
suggestion such as 'test' or type a 't' or use the mouse or whatever,
then the process runs again and they see the results for that.

IMO for most cases this leads to a saner experience than trying to
rank all documents based on a prefix 'tes': the problem is there is
still too much query ambiguity, not really any "keywords" yet, so
trying to rank those documents won't be very useful. Instead you try
to "interact" with the user to present results in a useful way that
they can navigate.

On the other hand if you really want to just search on prefixes and
jumble up the results (perhaps because you are gonna just sort by some
custom document feature instead of relevance), then you can do that if
you really want. You can use the n-gram/edge-ngram/shingle filters in
the analysis package for that.

On Wed, Oct 6, 2021 at 5:37 PM Michael Wechner
 wrote:

Hi

I am trying to implement a search with Lucene similar to what for
example various "Note Apps" (e.g. "Google Keep" or "Samsung Notes") are
offering, that with every new letter typed a new search is being executed.

For example when I type "tes", then all documents are being returned
containing the word "test" or "tesla" and when I continue typing, for
example "tesö" and there are no documents containing the string "tesö",
then the app will tell me that there are no matches.

I have found a couple of articles related to this kind of search, for
example

https://stackoverflow.com/questions/10828825/incremental-search-using-lucene

https://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene

but would be great to know whether there exist other possibilities or
what the best practice is?

I am even not sure what the right term for this kind of search is, is it
really "incremental search" or something else?

Looking forward to your feedback and will be happy to extend the Lucene
FAQ once I understand better :-)

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





Re: Search while typing (incremental search)

2021-10-07 Thread Michael Wechner

Thanks very much for your feedback!

I will try it :-)

As I wrote I would like to add a summary to the Lucene FAQ 
(https://cwiki.apache.org/confluence/display/lucene/lucenefaq)


Would the following questions make sense?

 - "Does Lucene support incremental search?"

 - "Does Lucene support auto completion suggestions?"

Or would other terms or another wording make more sense?

Thanks

Michael



Am 07.10.21 um 01:14 schrieb Robert Muir:

TLDR: use the lucene suggest/ package. Start with building suggester
from your query logs (either a file or index them).
These have a lot of flexibility about how the matches happen, for
example pure prefixes, edit distance typos, infix matching, analysis
chain, even now Japanese input-method integration :)

Run that suggester on the user input, retrieving say, the top 5-10
matches of relevant query suggestions.
return those in the UI (typical autosuggest-type field), but also run
a search on the first one.

The user gets the instant-search experience, but when they type 'tes',
you search on 'tesla' (if that's the top-suggested query, the
highlighted one in the autocomplete). if they arrow-down to another
suggestion such as 'test' or type a 't' or use the mouse or whatever,
then the process runs again and they see the results for that.

IMO for most cases this leads to a saner experience than trying to
rank all documents based on a prefix 'tes': the problem is there is
still too much query ambiguity, not really any "keywords" yet, so
trying to rank those documents won't be very useful. Instead you try
to "interact" with the user to present results in a useful way that
they can navigate.

On the other hand if you really want to just search on prefixes and
jumble up the results (perhaps because you are gonna just sort by some
custom document feature instead of relevance), then you can do that if
you really want. You can use the n-gram/edge-ngram/shingle filters in
the analysis package for that.

On Wed, Oct 6, 2021 at 5:37 PM Michael Wechner
 wrote:

Hi

I am trying to implement a search with Lucene similar to what various
note apps (e.g. "Google Keep" or "Samsung Notes") offer, where a new
search is executed with every letter typed.

For example, when I type "tes", all documents containing the word "test"
or "tesla" are returned, and when I continue typing, for example "tesö",
and there are no documents containing the string "tesö", the app tells me
that there are no matches.

I have found a couple of articles related to this kind of search, for
example

https://stackoverflow.com/questions/10828825/incremental-search-using-lucene

https://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene

but it would be great to know whether other possibilities exist, or what
the best practice is.

I am not even sure what the right term for this kind of search is: is it
really "incremental search" or something else?

Looking forward to your feedback and will be happy to extend the Lucene
FAQ once I understand better :-)

Thanks

Michael

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





Search while typing (incremental search)

2021-10-06 Thread Michael Wechner

Hi

I am trying to implement a search with Lucene similar to what various
note apps (e.g. "Google Keep" or "Samsung Notes") offer, where a new
search is executed with every letter typed.


For example, when I type "tes", all documents containing the word "test"
or "tesla" are returned, and when I continue typing, for example "tesö",
and there are no documents containing the string "tesö", the app tells me
that there are no matches.


I have found a couple of articles related to this kind of search, for 
example


https://stackoverflow.com/questions/10828825/incremental-search-using-lucene

https://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene

but it would be great to know whether other possibilities exist, or what
the best practice is.


I am not even sure what the right term for this kind of search is: is it
really "incremental search" or something else?


Looking forward to your feedback and will be happy to extend the Lucene 
FAQ once I understand better :-)


Thanks

Michael




Re: hello~~i have a question

2021-08-02 Thread Michael Wechner
I don't know either, though I searched a little and found various good
explanations of what segments are, e.g.


https://www.alibabacloud.com/blog/analysis-of-lucene---basic-concepts_594672

but not the order in which the segments are read.

I am not sure where in the code the segments are read, but did you
already have a look at the source code?


https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexReader.java
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/FSDirectory.java

HTH

Michael



On 02.08.21 at 08:07, nic k wrote:

Hello, I have a question, so I'm sending you an e-mail.

When searching in Lucene, I wonder whether it reads from the oldest
segment or the most recently created segment.

When I test it, it seems to read the oldest file first, but I am asking
for certainty, not conjecture. I have looked around and cannot figure it
out, so I am asking.
Thank you







Re: Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner

cool, thanks very much for your quick response and updating the FAQ!


On 17.06.21 at 10:28, Adrien Grand wrote:

Good catch Michael: deleting via IndexReader was actually removed a
long time ago. I just edited the FAQ to correct this.

On Thu, Jun 17, 2021 at 10:08 AM Michael Wechner 
wrote:


Hi

According to the FAQ one can delete documents using the IndexReader


https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIdeletedocumentsfromtheindex
?

but when I look at the javadoc of Lucene version 8_8_2


https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/index/IndexWriter.html

https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/index/IndexReader.html

then it seems to me that deleting documents is only possible with
IndexWriter, and no longer with IndexReader, right?

Thanks

Michael




Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner

Hi

According to the FAQ one can delete documents using the IndexReader

https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIdeletedocumentsfromtheindex?

but when I look at the javadoc of Lucene version 8_8_2

https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/index/IndexWriter.html
https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/index/IndexReader.html

then it seems to me that deleting documents is only possible with
IndexWriter, and no longer with IndexReader, right?


Thanks

Michael




Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner

good point! I have changed it accordingly

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0,doIhavetoreindex?

Hope it is clear now :-)

On 27.05.21 at 16:39, Michael Sokolov wrote:

LGTM, but perhaps it should also state that if possible you *should*
update, because the 8.x index may not be readable by the eventual 10
release.

On Thu, May 27, 2021 at 7:52 AM Michael Wechner
 wrote:

I have added a QnA

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0,doIhavetoreindex?

Hope that makes sense, otherwise let me know and I can correct/update :-)



On 26.05.21 at 23:56, Michael Wechner wrote:

using lucene-backward-codecs-9.0.0-SNAPSHOT.jar makes it work :-)

Thank you very much!

But IIUC it is recommended to reindex when upgrading, right? I guess
similar to what Solr is recommending

https://solr.apache.org/guide/8_0/reindexing.html


On 26.05.21 at 21:26, Michael Sokolov wrote:

I think you need backward-codecs-9.0.0-SNAPSHOT there. It enables 9.0
to read 8.x indexes.

On Wed, May 26, 2021 at 9:27 AM Michael Wechner
 wrote:

Hi

I am using Lucene 8.8.2 in production and I am currently doing some
tests using 9.0.0-SNAPSHOT, whereas I have included
lucene-backward-codecs, because in the log files it was asking me
whether I have forgotten to include lucene-backward-codecs.jar

   
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-backward-codecs</artifactId>
    <version>8.8.2</version>
</dependency>

But when querying index directories created with Lucene 8.8.2, then I
receive the following error

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.lucene.codecs.Codec$Holder

I am not sure whether I understand the backwards compatibility page
correctly

https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility


but I guess version 9 will not be backwards compatible with version 8? Or
should I do something different?

Thanks

Michael




Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner

I have added a QnA

https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0,doIhavetoreindex?

Hope that makes sense, otherwise let me know and I can correct/update :-)



On 26.05.21 at 23:56, Michael Wechner wrote:

using lucene-backward-codecs-9.0.0-SNAPSHOT.jar makes it work :-)

Thank you very much!

But IIUC it is recommended to reindex when upgrading, right? I guess 
similar to what Solr is recommending


https://solr.apache.org/guide/8_0/reindexing.html


On 26.05.21 at 21:26, Michael Sokolov wrote:

I think you need backward-codecs-9.0.0-SNAPSHOT there. It enables 9.0
to read 8.x indexes.

On Wed, May 26, 2021 at 9:27 AM Michael Wechner
 wrote:

Hi

I am using Lucene 8.8.2 in production and I am currently doing some
tests using 9.0.0-SNAPSHOT, whereas I have included
lucene-backward-codecs, because in the log files it was asking me
whether I have forgotten to include lucene-backward-codecs.jar

  
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-backward-codecs</artifactId>
    <version>8.8.2</version>
</dependency>

But when querying index directories created with Lucene 8.8.2, then I
receive the following error

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.lucene.codecs.Codec$Holder

I am not sure whether I understand the backwards compatibility page
correctly

https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility 



but I guess version 9 will not be backwards compatible with version 8? Or
should I do something different?

Thanks

Michael




Re: Lucene/Solr and BERT

2021-05-27 Thread Michael Wechner

Thank you very much for having done these benchmarks!

IIUC one could state:

- Indexing: Lucene is slower than hnswlib/C++, very roughly a 10x
performance difference

- Searching (queries per second): Lucene is slower than hnswlib/C++, very
roughly an 8x performance difference

Is that right? Though we should double-check these results.

Also, it is not clear at the moment why there is this performance
difference, right?



On 27.05.21 at 03:33, Julie Tibshirani wrote:

These JIRA issues contain results against two ann-benchmarks datasets. It'd
be great to get your thoughts/ feedback if you have any:
* Searching: https://issues.apache.org/jira/browse/LUCENE-9937
* Indexing: https://issues.apache.org/jira/browse/LUCENE-9941

The benchmarks are based on the setup here:
https://github.com/jtibshirani/lucene/pull/1. I am happy to help if you run
into issues with it.

A note: my motivation for running ann-benchmarks was to understand how the
current performance compares to other approaches, and to research ideas for
improvements. The setup in the PR doesn't feel solid/ maintainable as a
long term approach to development benchmarks. My personal plan is to focus
on enhancing luceneutil and our nightly benchmarks (
https://github.com/mikemccand/luceneutil) instead of putting a lot of
effort into the ann-benchmarks setup.

Julie

On Wed, May 26, 2021 at 1:04 PM Alex K  wrote:


Thanks Michael. IIRC, the thing that was taking so long was merging into a
single segment. Is there already benchmarking code for HNSW
available somewhere? I feel like I remember someone posting benchmarking
results on one of the Jira tickets.

Thanks,
Alex

On Wed, May 26, 2021 at 3:41 PM Michael Sokolov 
wrote:


This java implementation will be slower than the C implementation. I
believe the algorithm is essentially the same, however this is new and
there may be bugs!  I (and I think Julie had similar results IIRC)
measured something like 8x slower than hnswlib (using ann-benchmarks).
It is also surprising (to me) though how this varies with
differently-learned vectors so YMMV. I still think there is value
here, and look forward to improved performance, especially as JDK16
has some improved support for vectorized instructions.

Please also understand that the HNSW algorithm interacts with Lucene's
segmented architecture in a tricky way. Because we build a graph
*per-segment* when flushing/merging, these graphs must be rebuilt whenever
segments are merged. So your indexing performance can be heavily
influenced by how often you flush, as well as by your merge policy
settings. Also, when searching, there is a bigger than usual benefit
for searching across fewer segments, since the cost of searching an
HNSW graph scales more or less with log N (so searching a single large
graph is cheaper than searching the same documents divided among
smaller graphs). So I do recommend using a multithreaded collector in
order to get best latency with HNSW-based search. To get the best
indexing, and searching, performance, you should generally index as
large a number of documents as possible before flushing.
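Mike's log N point can be put into a rough back-of-envelope (my own simplification for illustration, not a Lucene measurement): if one HNSW graph search costs about log2(n) for n vectors, then N vectors split across s segments cost about s * log2(N/s):

```java
public class HnswSegmentCost {

    // Relative search cost for totalDocs vectors split evenly across
    // `segments` HNSW graphs, assuming one graph search costs ~log2(docs).
    static double searchCost(long totalDocs, int segments) {
        double docsPerSegment = (double) totalDocs / segments;
        return segments * (Math.log(docsPerSegment) / Math.log(2));
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        // One merged graph vs. the same vectors spread over ten segments:
        System.out.printf("1 segment:   %.1f%n", searchCost(n, 1));
        System.out.printf("10 segments: %.1f%n", searchCost(n, 10));
    }
}
```

Under this assumption ten segments cost roughly 8-9x one merged graph, which is why fewer, larger segments (plus a multithreaded collector) help HNSW search latency.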

-Mike

On Wed, May 26, 2021 at 9:43 AM Michael Wechner
 wrote:

Hi Alex

Thank you very much for your feedback and the various insights!

On 26.05.21 at 04:41, Alex K wrote:

Hi Michael and others,

Sorry just now getting back to you. For your three original questions:

- Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a
thorough response.
- As far as I know Opendistro is calling out to a C/C++ binary to run the
actual HNSW algorithm and store the HNSW part of the index. When they
implemented it about a year ago, Lucene did not have this yet. I assume the
Lucene HNSW implementation is solid, but would not be surprised if it's
slower than the C/C++ based implementation, given the JVM has some
disadvantages for these kinds of CPU-bound/number crunching algos.
- I just haven't had much time to invest into my benchmark recently. In
particular, I got stuck on why indexing was taking extremely long. Just
indexing the vectors would have easily exceeded the current time
limitations in the ANN-benchmarks project. Maybe I had some naive mistake
in my implementation, but I profiled and dug pretty deep to make it fast.
I am trying to get Julie's branch running

https://github.com/jtibshirani/lucene/tree/hnsw-bench

Maybe this will help and is comparable



I'm assuming you want to use Lucene, but not necessarily via Elasticsearch?

Yes, for simpler setups I would like to use Lucene standalone, but for
setups which have to scale I would use either Elasticsearch or Solr.

Thanks

Michael




If so, another option you might try for ANN is the elastiknn-models
and elastiknn-lucene packages. elastiknn-models contains the Locality
Sensitive Hashing implementations of ANN used by Elastiknn, and
elastiknn-lucene contains the Lucene queries used by Elastiknn. The Lucene
query is the MatchHashesAndScoreQuery.

Re: Index backwards compatibility

2021-05-26 Thread Michael Wechner

using lucene-backward-codecs-9.0.0-SNAPSHOT.jar makes it work :-)

Thank you very much!

But IIUC it is recommended to reindex when upgrading, right? I guess 
similar to what Solr is recommending


https://solr.apache.org/guide/8_0/reindexing.html


On 26.05.21 at 21:26, Michael Sokolov wrote:

I think you need backward-codecs-9.0.0-SNAPSHOT there. It enables 9.0
to read 8.x indexes.
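If I read that right, the backward-codecs entry in the POM quoted below would then change to something like this (a sketch; coordinates and versions taken from this thread):

```xml
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-backward-codecs</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
```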

On Wed, May 26, 2021 at 9:27 AM Michael Wechner
 wrote:

Hi

I am using Lucene 8.8.2 in production and I am currently doing some
tests using 9.0.0-SNAPSHOT, whereas I have included
lucene-backward-codecs, because in the log files it was asking me
whether I have forgotten to include lucene-backward-codecs.jar

  
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-backward-codecs</artifactId>
    <version>8.8.2</version>
</dependency>

But when querying index directories created with Lucene 8.8.2, then I
receive the following error

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.lucene.codecs.Codec$Holder

I am not sure whether I understand the backwards compatibility page
correctly

https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility

but I guess version 9 will not be backwards compatible with version 8? Or
should I do something different?

Thanks

Michael




Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Wechner

Hi Alex

Thank you very much for your feedback and the various insights!

On 26.05.21 at 04:41, Alex K wrote:

Hi Michael and others,

Sorry just now getting back to you. For your three original questions:

- Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a
thorough response.
- As far as I know Opendistro is calling out to a C/C++ binary to run the
actual HNSW algorithm and store the HNSW part of the index. When they
implemented it about a year ago, Lucene did not have this yet. I assume the
Lucene HNSW implementation is solid, but would not be surprised if it's
slower than the C/C++ based implementation, given the JVM has some
disadvantages for these kinds of CPU-bound/number crunching algos.
- I just haven't had much time to invest into my benchmark recently. In
particular, I got stuck on why indexing was taking extremely long. Just
indexing the vectors would have easily exceeded the current time
limitations in the ANN-benchmarks project. Maybe I had some naive mistake
in my implementation, but I profiled and dug pretty deep to make it fast.


I am trying to get Julie's branch running

https://github.com/jtibshirani/lucene/tree/hnsw-bench

Maybe this will help and is comparable




I'm assuming you want to use Lucene, but not necessarily via Elasticsearch?


Yes, for simpler setups I would like to use Lucene standalone, but for
setups which have to scale I would use either Elasticsearch or Solr.


Thanks

Michael




If so, another option you might try for ANN is the elastiknn-models
and elastiknn-lucene packages. elastiknn-models contains the Locality
Sensitive Hashing implementations of ANN used by Elastiknn, and
elastiknn-lucene contains the Lucene queries used by Elastiknn.The Lucene
query is the MatchHashesAndScoreQuery. There are a couple of Scala test
suites that show how to use it: MatchHashesAndScoreQuerySuite and
MatchHashesAndScoreQueryPerformanceSuite. This is all designed to work
independently from Elasticsearch and is published on Maven:
com.klibisz.elastiknn / lucene and com.klibisz.elastiknn / models.
The tests are Scala but all of the implementation is in Java.

Thanks,
Alex







Index backwards compatibility

2021-05-26 Thread Michael Wechner

Hi

I am using Lucene 8.8.2 in production and I am currently doing some 
tests using 9.0.0-SNAPSHOT, whereas I have included 
lucene-backward-codecs, because in the log files it was asking me 
whether I have forgotten to include lucene-backward-codecs.jar


    
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>9.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-backward-codecs</artifactId>
    <version>8.8.2</version>
</dependency>

But when querying index directories created with Lucene 8.8.2, then I 
receive the following error


java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.lucene.codecs.Codec$Holder


I am not sure whether I understand the backwards compatibility page 
correctly


https://cwiki.apache.org/confluence/display/LUCENE/BackwardsCompatibility

but I guess version 9 will not be backwards compatible with version 8? Or
should I do something different?


Thanks

Michael



