Re: Question for SynonymQuery

2023-01-02 Thread Michael Wechner
Independent of the synonym implementation, you might want to consider vector/similarity search. For example, if the query is "internet device", then the cosine similarity of the multi-terms "internet device", "wifi router" and "wifi device" using the "all-mpnet-base-v2" are
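For reference, cosine similarity itself can be sketched in a few lines of plain Python. This is only an illustration of the measure: the three-dimensional vectors below are made-up placeholders, not real "all-mpnet-base-v2" embeddings, and the names `cosine_similarity` and `toy_vectors` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder vectors (assumption): a real setup would obtain these from an
# embedding model; the numbers here are invented purely for illustration.
toy_vectors = {
    "internet device": [0.9, 0.3, 0.1],
    "wifi router": [0.7, 0.6, 0.2],
    "wifi device": [0.8, 0.4, 0.1],
}

query = toy_vectors["internet device"]
for phrase, vector in toy_vectors.items():
    print(phrase, round(cosine_similarity(query, vector), 3))
```

With real embeddings, the phrases would be ranked by this score instead of being matched through a synonym list.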

Re: Question for SynonymQuery

2023-01-02 Thread Mikhail Khludnev
Hello Trevor. Can you help me better understand this approach? If we have a text "wifi router" and inject "internet device" at indexing time, the terms reside at the same positions. How do we avoid a false positive match for the query "wifi device"? On Mon, Jan 2, 2023 at 4:16 PM Trevor Nicholls wrote: > Hi
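The concern raised here can be illustrated with a toy positional index. This is a minimal sketch, not Lucene code: it assumes the synonym tokens are stacked token by token onto the original positions ("internet" on "wifi", "device" on "router"), and the helpers `index_with_synonyms` and `phrase_match` are hypothetical names invented for the example.

```python
from collections import defaultdict


def index_with_synonyms(tokens, synonyms):
    """Build term -> positions, injecting each synonym at its token's position."""
    postings = defaultdict(set)
    for pos, tok in enumerate(tokens):
        postings[tok].add(pos)
        for syn in synonyms.get(tok, ()):
            postings[syn].add(pos)
    return postings


def phrase_match(postings, phrase):
    """True if the phrase terms occur at consecutive positions in the index."""
    terms = phrase.split()
    return any(
        all(start + i in postings.get(t, ()) for i, t in enumerate(terms))
        for start in postings.get(terms[0], ())
    )


# Assumed alignment: each synonym token shares a position with one original token.
postings = index_with_synonyms(
    ["wifi", "router"],
    {"wifi": ["internet"], "router": ["device"]},
)

for query in ("wifi router", "internet device", "wifi device"):
    print(query, phrase_match(postings, query))
```

Because position 0 holds both "wifi" and "internet" and position 1 holds both "router" and "device", the mixed phrase "wifi device" also matches in this sketch, which is the false positive being asked about.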

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
Hi, Yeah, sorry for leading the issue in the wrong direction. I was just stumbling on the exception message and because we do not spend much time on improving/supporting the use of NIOFSDirectory, I may have moved this mailing list thread in the wrong direction. I don't think the

Re: Recurring index corruption

2023-01-02 Thread S S
Hi Uwe, I will report the bug to ES, as you suggested. Do you reckon using Mmap would have an effect on the index corruption when using SMB? I have to report back to my manager in a few days to decide whether to carry on with ACIs or find another hosting solution. It is unfortunate there seems to

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
Hi, Please open a bug report at ES. The setting vm.max_map_count is not needed and should not be changed unless really needed, because it uses kernel resources. This has to do with their support (they try to tell people to overshard and to prevent support requests they ask to raise this

Re: Recurring index corruption

2023-01-02 Thread S S
I also tried enabling preview but no joy, same error :( It looks like it is not possible to start a multinode ES cluster without setting vm.max_map_count. I also googled it and this check cannot be disabled. I guess MMapDirectory is not an option for ES on ACIs, unless you have something else

Re: Recurring index corruption

2023-01-02 Thread S S
Hmm, I’m afraid I hit a roadblock: [2023-01-02T17:01:31,157][INFO ][o.e.b.BootstrapChecks] [fs-sdlc-elasticsearch-001] bound or publishing to a non-loopback address, enforcing bootstrap checks bootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low,

Re: Recurring index corruption

2023-01-02 Thread S S
Thank you Uwe, this is great! I am rebuilding the cluster using MMapDirectory and no enable-preview, as you suggested. Let’s see what happens. Cheers, Seb > On 2 Jan 2023, at 17:51, Uwe Schindler wrote: > > Hi, > > in recent versions it works like that: > >

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
Hi, in recent versions it works like that: https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html#set-jvm-options So in folder jvm.options.d/ add a new file (like "preview.conf") and put "19:--enable-preview" into it. It is basically the same like

Re: Recurring index corruption

2023-01-02 Thread S S
Hi Uwe, Sorry for the late reply but upgrading the docker image to use OpenJDK was easier said than done. I am not a Java developer/expert so, sorry for the stupid question but, how do I specify the --enable-preview flag? ES has got a quite complex way to start so I cannot specify the flag on

RE: Question for SynonymQuery

2023-01-02 Thread Trevor Nicholls
Hi Anh, the two links Michael shared relate to questions I asked when I was trying to get synonym matching with our application. I really do have multi-term synonym matching working at this point; there's always scope for improvement of course but with the hints supplied in those threads I was

Re: Recurring index corruption

2023-01-02 Thread Robert Muir
Your files are getting truncated. Nothing Lucene can do. If this is really the only way you can store data in this Azure cloud, and this is how they treat it, then run away... don't just walk... to a different cloud. On Mon, Jan 2, 2023 at 5:19 AM S S wrote: > > We are experimenting with

Re: Recurring index corruption

2023-01-02 Thread S S
Hi Uwe, This sounds very interesting, I will try this configuration this afternoon and I’ll let you know what happens. Many thanks for the suggestion! :) Seb > On 2 Jan 2023, at 11:48, Uwe Schindler wrote: > > Hi, > > in general you can still use MMapDirectory. There is no requirement to

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
Hi, in general you can still use MMapDirectory. There is no requirement to set vm.max_map_count for smaller clusters. The information in Elastic's documentation is not mandatory and misleading. If you use the newest version of Elasticsearch with Java 19 and you use `--enable-preview` in you

Recurring index corruption

2023-01-02 Thread S S
We are experimenting with Elasticsearch deployed in Azure Container Instances (Debian + OpenJDK). The ES indexes are stored in an Azure file share mounted via SMB (3.0). The Elasticsearch cluster is made up of 4 nodes, each one has a separate file share to store the indices. This