Re: Slow HNSW creation times.

2024-04-29 Thread Uwe Schindler
continuous Garbage Collection pauses. Greatly appreciate any pointers or thoughts on how to further debug this issue or improve the performance. Thanks Kannan Krishnamurthy. -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@t

Re: Right Way to Read vectors from Index

2024-02-12 Thread Uwe Schindler
ne.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schin

Re: Need suggestion for a Lucene upgrade scenario

2024-01-30 Thread Uwe Schindler
ther information is required from my side. Regards Rajib -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additi

Re: NumericRangeQuery in Lucene 5.5.5: replacing the deprecated setBoost while keeping the NumericRange type?

2023-11-26 Thread Uwe Schindler
e setBoost is deprecated for all Query types. How to set the boost of a NumericRangeQuery while preserving the NumericRangeQuery type? BoostQuery doesn't allow this and I haven't found a way. Thanks for your help. Claude Lepère -- Sincerely yours Mikhail Khludnev -- Uwe Schindler Achterdiek 19, D-28

Re: StandardQueryParser and numeric fields

2023-11-14 Thread Uwe Schindler
down what I'm missing. The analyzer is the exact same analyzer I'm using during indexing. It's a PerFieldAnalyzerWrapper. The specific analyzer for the numeric fields is the one I mentioned above (StandardAnalyzer). The query used is: indexSearcher.search( query, 10 ); Thank you

Re: DisjunctionMinQuery

2023-11-09 Thread Uwe Schindler
Thanks! Marc -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-08 Thread Uwe Schindler
fieldsReader(SegmentReadState state)throws IOException {     return delegate.fieldsReader(state);     }     @Override public int getMaxDimensions(String fieldName) {     log.info("Maximum vector dimension: " +maxDimensions);     return maxDimensions;     } } Am 19.1

Re: Preventing field data from being loaded into page cache

2023-10-21 Thread Uwe Schindler
into the page cache. Does Lucene have any mechanisms to explicitly prevent them from being cached? Is it even possible with Java? Thanks, Justin Borromeo -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Uwe Schindler
l: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.

Re: How to replace deprecated document(i)

2023-09-25 Thread Uwe Schindler
this? Thanks Michael On 25.09.23 at 10:28, Uwe Schindler wrote: Background: For performance, it is advisable to get the storedFields() *once* to process all documents in the search result. The reason for the change was the problem of accessing stored fields would otherwise need to use

Re: How to replace deprecated document(i)

2023-09-25 Thread Uwe Schindler
-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https

Re: forceMerge(1) leads to ~10% perf gains

2023-09-22 Thread Uwe Schindler
query and still maintain accuracy than simply word tokenizing a sentence and joining with OR text: ? -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: jav

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Uwe Schindler
implementation front, I am using a set of filters like lowerCaseFilter, EnglishPossessiveFilter etc in addition to base tokenizer StandardTokenizer. Per my analysis, StandardTokenizer strips off the % sign and hence the behavior. Has anyone faced a similar requirement? Any help/guidance is highly appreciated.

Re: Reindexing leaving behind 0 live doc segments

2023-09-13 Thread Uwe Schindler
of the process there are no more 7.x segments as referenced by the segments_x file. But for some reason the physical 7.x segment files continue to stay behind until I restart Solr. Thanks, Rahul On Mon, Sep 4, 2023 at 7:18 AM Uwe Schindler wrote: Hi, in Solr the empty segment stays open as long

Re: Reindexing leaving behind 0 live doc segments

2023-09-04 Thread Uwe Schindler
inally{ if (iwRef != null) { iwRef.decref(); } } Help would be much appreciated! Thanks, Rahul -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: jav

Re: Disjunctively scoring non-matching conjunctive clauses

2023-07-21 Thread Uwe Schindler
ped by a ConstantScore query so it has no score and a scoring query that will provide a disjunctive score. My approach feels a bit convoluted, so I was wondering if there were any cleaner ways to do this? And if not, are there any drawbacks to my workaround performance wise? Thanks!

Re: Getting LinkageError due to Panama APIs

2023-06-30 Thread Uwe Schindler
y.java:448) : : Thanks, Shubham -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: j

Re: Question about index segment search order

2023-05-13 Thread Uwe Schindler
st? Thanks, Wei - To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org For additional commands, e-mail:java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail:u...@thetaphi.de

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-02-10 Thread Uwe Schindler
Exception e) { e.printStackTrace(); } == Regards Rajib -Original Message- From: Uwe Schindler Sent: 06 February 2023 16:46 To: java-user@lucene.apache.org Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2 Hi, Since around Lucene 4 (maybe alre

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-02-06 Thread Uwe Schindler
..@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr.

Re: Question about current situation of good first issues in GitHub

2023-01-10 Thread Uwe Schindler
nds, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional comma

Re: Need your perspective on Garbage Collection

2023-01-03 Thread Uwe Schindler
se help me manage it and share your insight on which steps or configuration I should prefer to optimize it. My index size is 700 GB; what configuration do you suggest for it (JVM, RAM, CPU cores, heap size, young and old generation)? I hope to hear from you soon - -- Uwe

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
. It is unfortunate there seem to be problems with this solution. Microsoft seems not interested in extending the volume mapping options for ACIs and K8s is overkill for our use case. Thank you for your help so far, you have been very kind :) Cheers, Seb On 2 Jan 2023, at 19:09, Uwe Schindler

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
ter using MMapDirectory and no enable-preview, as you suggested. Let’s see what happens. Cheers, Seb On 2 Jan 2023, at 17:51, Uwe Schindler wrote: Hi, in recent versions it works like that: https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html#set-jvm-op

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
w, and I cannot find anything on the ES website. Many thanks. Seb On 2 Jan 2023, at 11:48, Uwe Schindler wrote: Hi, in general you can still use MMapDirectory. There is no requirement to set vm.max_map_count for smaller clusters. The information in Elastic's documentation is not mandatory and mis

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
-core-9.3.0.jar:?] Many thanks. Seb -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Uwe Schindler
ct result. Should I open the reader before closing the writer? Thanks Michael On 08.12.22 at 11:36, Uwe Schindler wrote: You have to reopen the index reader to see deletes from the IndexWriter. On 08.12.2022 at 10:32, Hrvoje Lončar wrote: Did you call this method before or after commit method

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Uwe Schindler
in in more detail what this method is doing? Thanks Michael -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.

Re: Sort by numeric field, order missing values before anything else

2022-11-21 Thread Uwe Schindler
just for a different long value. Besides writing a custom comparator, is there any simpler and still performant way to achieve this sort? --Petko -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMa

Re: Migrating WhitespaceTokenizerFactory from 8.2 to 9.4

2022-10-29 Thread Uwe Schindler
containing: org.apache.lucene.analysis.core.WhitespaceTokenizerFactory What am I missing? Any help would be appreciated. Thanks, David Shifflett -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de

Re: java 17 and older lucene (4.x)

2022-09-26 Thread Uwe Schindler
esults let me know as well. Thanks - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thet

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Uwe Schindler
efinitely wrong because I'm on Windows and it works for me like a charm. Dawid - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler

Re: [External] Re: Can lucene be used in Android ?

2022-09-12 Thread Uwe Schindler
nks, David Shifflett Senior Lead Technologist Enterprise Cross Domain Solutions (ECDS) Booz Allen Hamilton On 9/10/22, 5:30 AM, "Uwe Schindler" wrote: Hi Jie, actually the Lucene 9.x series requires JDK 11 to run, previous versions also work with Java 8. The main branch

Re: Can lucene be used in Android ?

2022-09-10 Thread Uwe Schindler
, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe

Re: How to filter KnnVectorQuery with multiple terms?

2022-09-01 Thread Uwe Schindler
yVector, k, filter); but it is not clear to me how I can filter for multiple terms. Should I subclass MultiTermQuery and use as filter, just as I use TermQuery as filter above? Thanks Michael -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@t

Re: [ANNOUNCE] Issue migration Jira to GitHub starts on Monday, August 22

2022-08-24 Thread Uwe Schindler
- Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://w

Re: Can I integrate Apache Lucene with Dovecot POP3/IMAP incoming mail server to perform indexing and fast searching of email messages?

2022-08-13 Thread Uwe Schindler
ibe, e-mail:java-user-unsubscr...@lucene.apache.org For additional commands, e-mail:java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail:u...@thetaphi.de

Re: Lucene Disable scoring

2022-07-11 Thread Uwe Schindler
calls can cause delay. As a result I'm looking for a trick to skip the function call and have no scoring at all on my whole query. Is it possible to skip this step? Thanks a million -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
ore/src/java/org/apache/lucene/search/FuzzyTermsEnum.java#L248-L256 So in short the exact term gets a boost factor of 1 in the resulting term query, all other terms a lower one. Uwe -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail:u...@thetaphi.de
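For readers following this thread, a minimal stand-alone sketch of the idea in the snippet above: the exact term keeps a boost of 1 and fuzzy matches get a lower one. The linear fall-off used here (edit distance divided by the shorter term length) is an assumption about what the referenced FuzzyTermsEnum lines compute, and the class and method names are hypothetical, so check the linked source before relying on the exact numbers.

```java
// Hypothetical illustration, not Lucene code: shows how a per-term boost
// could shrink as the edit distance to the query term grows.
class FuzzyBoostSketch {

    /**
     * Returns 1.0 for an exact match (edit distance 0) and a smaller value
     * for each additional edit. ASSUMPTION: linear fall-off relative to the
     * shorter of the two term lengths (measured in code points).
     */
    static float boost(int editDistance, int queryTermLength, int matchedTermLength) {
        int minLength = Math.min(queryTermLength, matchedTermLength);
        return 1.0f - (float) editDistance / (float) minLength;
    }

    public static void main(String[] args) {
        // Query term "lucene" (6 code points) against candidates of the same length.
        for (int distance = 0; distance <= 2; distance++) {
            System.out.println("edit distance " + distance + " -> boost " + boost(distance, 6, 6));
        }
    }
}
```

Under this assumption, the rewritten query ranks documents containing the exact term above documents that only contain near misses, which matches the behaviour described in the reply.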

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
with field 1.0 = tf(freq=1.0), with freq of: 1.0 = freq, occurrences of term within document 0.70710677 = fieldNorm - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional comma

Re: Fwd: Finding out which fields matched the query

2022-06-27 Thread Uwe Schindler
h time. I wonder what is the efficient way to get the matched fields. Would you please offer some help? Thank you so much! Best regards, Yichen Sun -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@theta

Re: Regarding field cache

2022-06-08 Thread Uwe Schindler
the field cache is getting cleared. Can you please help to clarify this. On 2022/06/08 17:46:50 Uwe Schindler wrote: Hi, You do not necessarily need a commit. If you use SearcherManager in combination with NRTCachingDirectory you can also refresh your searcher every few seconds, so in-memory cached

Re: Regarding field cache

2022-06-08 Thread Uwe Schindler
i: Thanks Uwe! New searcher opens when we do a commit. Apart from this, are there other scenarios where a searcher would be refreshed? On 2022/06/08 16:43:07 Uwe Schindler wrote: Hi, They get evicted when the segment of that index is closed. After that there's no reference to them anymo

Re: Regarding field cache

2022-06-08 Thread Uwe Schindler
any other scenario which could evict the unused entries from fieldcache. Please help to clarify the same. Thanks Poorna -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscri

Re: Index corruption and repair

2022-05-05 Thread Uwe Schindler
g*:
>>>>> > > > >
>>>>> > > > > - Python 3.8.10
>>>>> > > > > - Pylucene 6.5.0
>>>>> > > > > - Java 8 (1.8.0_181)
>>>>> > > > > - Runs on Linux and Windows (error seen on Windows)
>>>>> > > > >
>>>>> > > > > We suddenly get the following *error*:
>>>>> > > > >
>>>>> > > > > 2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index (D:\i\202202) writer, Exception: org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))
>>>>> > > > >
>>>>> > > > > After this, no further indexing happens - trying to open the index for writing throws the above error - and the index writer does not open.
>>>>> > > > >
>>>>> > > > > FYI, our code contains the following *settings*:
>>>>> > > > >
>>>>> > > > > index_path = "D:\i\202202"
>>>>> > > > > index_directory = FSDirectory.open(Paths.get(index_path))
>>>>> > > > > iconfig = IndexWriterConfig(wrapper_analyzer)
>>>>> > > > > iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
>>>>> > > > > iconfig.setRAMBufferSizeMB(16.0)
>>>>> > > > > writer = IndexWriter(index_directory, iconfig)
>>>>> > > > >
>>>>> > > > > *Repairing*
>>>>> > > > > We tried 'repairing' the index with the following command / tool:
>>>>> > > > >
>>>>> > > > > java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise
>>>>> > > > >
>>>>> > > > > This however returns saying "No problems found with the index."
>>>>> > > > >
>>>>> > > > > *Work around*
>>>>> > > > > We have to manually delete the problematic segment file:
>>>>> > > > > D:\i\202202\segments_fo
>>>>> > > > > after which the application starts again... until the next corruption. We can't spot a specific pattern.
>>>>> > > > >
>>>>> > > > > *Two questions:*
>>>>> > > > >
>>>>> > > > > 1. Can we handle this situation programmatically, so that no manual intervention is needed?
>>>>> > > > > 2. Any reason why we are facing the corruption issue in the first place?
>>>>> > > > >
>>>>> > > > > Before this we were using Pylucene 4.10 and we didn't face this problem - the application logic is the same.
>>>>> > > > >
>>>>> > > > > Also, while the application runs on both Linux and Windows, so far we have observed this situation only on various Windows platforms.
>>>>> > > > >
>>>>> > > > > Would really appreciate some assistance. Thanks in advance.
>>>>> > > > >
>>>>> > > > > Regards,
>>>>> > > > > Antony
>>>>> > > >
>>>>> > > > --
>>>>> > > > Adrien
>>>>> > > >
>>>>> > > > -
>>>>> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>> >
>>>>> > -
>>>>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>> >
>>>>
-- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Returning large resultset is slow and resource intensive

2022-03-08 Thread Uwe Schindler
Hi, > For our use case, we need to run queries which return the full > matched result set. In some cases, this result set can be large (50k+ > results out of 4 million total documents). > Perf test showed that just 4 threads running random queries returning 50k > results make Lucene utilize 100%

RE: Migration from Lucene 5.5 to 8.11.1

2022-01-17 Thread Uwe Schindler
"*initially* created with 6.x". - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: András Péteri > Sent: Thursday, January 13, 2022 9:59 AM > To: java-user@lucene.apache.org > Subject:

RE: migration from lucene 5 to 8

2022-01-17 Thread Uwe Schindler
Hi, no that's expected. See my other post as response to another question a minute ago. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Sascha Janz > Sent: Wednesday, January 12, 202

RE: Moving from lucene 6.x to 8.x

2022-01-17 Thread Uwe Schindler
By the way > Hi, one thing that always works to "forcefully" upgrade without reindexing. > You > just merge the old index into a completely new index not by copying files, but > by > sending their SegmentReaders to addIndex, stripping all metadata from them > with some trick: >

RE: Moving from lucene 6.x to 8.x

2022-01-17 Thread Uwe Schindler
ately. This may be a bit slower as the whole index needs to be processed, but it is still faster than reindexing. If you have incorrect offsets, the process will fail, so there's no risk. Uwe ----- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de

Re: Log4j

2021-12-15 Thread Uwe Schindler
Ali Akhtar wrote: >Does Lucene not have any internal logging at all, e.g. for debugging? > >On Thu, Dec 16, 2021 at 2:49 AM Uwe Schindler wrote: > >> Hi, >> >> Lucene is an API and does not log with log4j. >> >> Only the user interface Luke uses log4j, but

Re: Log4j

2021-12-15 Thread Uwe Schindler
Lucene is not affected by the latest bug, right? >I saw on Solr News page there are some fixes already made to Solr. >Best regards -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Java 17 and Lucene

2021-10-26 Thread Uwe Schindler
. It happens not all the time (about 1/4th of all builds) and > > > due > > > > to the fact that the JVM is unresponsive it is not possible to get a > > > stack > > > > trace with "jstack". If you know a way to get the stack t

RE: Java 17 and Lucene

2021-10-19 Thread Uwe Schindler
Hi, > > On a side note, the Lucene codebase still uses the deprecated (as of > > JDK17) AccessController > > in the RamUsageEstimator class. > > We suppressed the warning for now (based on recommendations > > > >

RE: Java 17 and Lucene

2021-10-19 Thread Uwe Schindler
Hi, > Hey, > > Our team at Amazon Product Search recently ran our internal benchmarks with > JDK 17. > We saw a ~5% increase in throughput and are in the process of > experimenting/enabling it in production. > We also plan to test the new Corretto Generational Shenandoah GC. I would a bit

RE: IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2021-10-08 Thread Uwe Schindler
as sibling should clauses? Other suggestions? Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alan Woodward > Sent: Monday, September 21, 2020 7:56 PM > To: Dawid Weiss > Cc: Lucene Users

Re: Question about readVint & writeVint from DataOutput and DataInput

2021-09-03 Thread Uwe Schindler
ted but should >be avoided? Should I submit a PR to prevent negative integers? -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de
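As background for this thread, a minimal stand-alone sketch of a variable-length int encoding of the kind the thread title refers to: 7 payload bits per byte, with the high bit marking that more bytes follow. It is an illustration written against plain Java streams, not Lucene's DataOutput or DataInput; the class name VIntSketch is hypothetical. It shows why negative values are a poor fit for this format: their sign bit forces the longest possible encoding.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration, not Lucene code: variable-length int encoding
// with 7 payload bits per byte and a continuation flag in the high bit.
class VIntSketch {

    /** Encodes the value low-order bits first; every byte except the last has the high bit set. */
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80); // 7 payload bits plus continuation flag
            i >>>= 7;                     // unsigned shift, so negative inputs terminate
        }
        out.write(i);                     // final byte, high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by writeVInt. */
    static int readVInt(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                    // no continuation flag: last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16383, 16384, Integer.MAX_VALUE, -1}) {
            byte[] encoded = writeVInt(v);
            System.out.println(v + " -> " + encoded.length + " byte(s), decoded " + readVInt(encoded));
        }
    }
}
```

In this sketch small non-negative values take one or two bytes, while any negative value always takes five, which is more than a fixed four-byte int would need.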

Re: Range query with Lucene7.7.1 on old indexes.

2021-09-01 Thread Uwe Schindler
"), long("20190101115959")) > >No results. > >query = LongPoint.newRangeQuery("xdate", long("2019010100"), >long("20190101115959")) > >No results. > >How to get the results on my old indexes using date range query? > >Can anyone help? > >Thanks -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: lucene 4.10.4 punctuation

2021-08-25 Thread Uwe Schindler
Hi, you should explain to us what exactly you want to do: How do you want to search, what do your documents look like? Why is it important to match on punctuation and what should this matching look like? Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u

RE: Failed to execute Ant run-task command

2021-08-19 Thread Uwe Schindler
Could you please open an issue? Can you also check if it still happens on main branch with Lucene 9.0 and Gradle as build system? - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: xiaoshi > Sent:

RE: NRT readers and overall indexing/querying throughput

2021-08-08 Thread Uwe Schindler
o search performance go down depending on refresh rate. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alexander Lukyanchikov > Sent: Wednesday, August 4, 2021 4:43 AM > To: java-user@lucene.a

RE: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Uwe Schindler
. The posting list of each term can only store internal, numeric lucene doc ids. Those have then to be used to lookup the actual contents from e.g. stored fields (possibility A) or DocValues (possibility B). We can't store UUIDs in the highly compressed posting list. Uwe - Uwe Schindler

RE: Control the number of segments without using forceMerge.

2021-07-05 Thread Uwe Schindler
- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alex K > Sent: Monday, July 5, 2021 4:04 AM > To: java-user@lucene.apache.org > Subject: Control the number of segments without using forceMerge. >

RE: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Uwe Schindler
. If you still need to store it as DocValues field, just add it with both types. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alex K > Sent: Monday, July 5, 2021 2:30 AM > To: java-user@lucen

RE: Changing Term Vectors for Query

2021-06-07 Thread Uwe Schindler
applicable. If you want to have "per document" scoring factors (not per term), you can also use additional DocValues fields with per-document factors and you can use a function query (e.g. using expressions module) to modify the score. Uwe - Uwe Schindler Achterdiek 19,

RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Uwe Schindler
rom FSDirectory > if (dir.getPreload() == false) > dir.setPreload(Constants.PRELOAD_YES); // In-Memory Lucene Index > enabled-> *here setPreload cannot be used* > IndexReader reader = DirectoryReader.open(dir); > IndexSearcher is = new IndexSearcher(reader); > >

RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Uwe Schindler
off heap and are part of usual paging. They are just no longer backed by a file. Lucene does most of the stuff outside heap, live with it! Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: baris.ka...@ora

RE: Lucene Migration query

2020-11-20 Thread Uwe Schindler
Hi, > Currently I am using Lucene 7.3, I want to upgrade to lucene 8.5.1. Should > I do reindexing in this case ? No, you don't need that. > Can I make use of backward codec jar without a reindex? Yes, just add the JAR file to your classpath and it can read the indexes. Updates written to the

Re: best way (performance wise) to search for field without value?

2020-11-13 Thread Uwe Schindler
e groups_allowed field empty when the document >> should >> >> be able to be retrieved by all users, so we need to also select a >document >> if >> >> the 'groups_allowed' is empty. >> >> >> >> What would be the fastest Query construction to do so? >> >> >> >> >> >> Currently I use a TermRangeQuery that basically matches all values >and >> put >> >> that in a MUST_NOT combined with a MatchAllDocsQuery(), but >that >> gets >> >> rather slow when the number of groups is high. >> >> >> >> Thanks! >> >> >> > >> -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

Re: BooleanQuery: BooleanClause.Occur.MUST_NOT seems to require at least one BooleanClause.Occur.MUST

2020-11-06 Thread Uwe Schindler
leanQuery with just >>> a BooleanClause.Occur.MUST (i.e. results will return fine if they >match). >>> >>> Is this by design or is this an issue? >>> >>> Thank You, >>> Nissim Shiman >> >> >> >> -- >> Adrien > > >- >To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: stucked indexing process

2020-10-14 Thread Uwe Schindler
eScheduler to enable SSD or spinning disk default settings in your solrconfig.xml: true Use "true" for spinning disks and "false" for SSDs. This prevents the auto-detection from running. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https:/

Re: Links to classes missing for BMW

2020-10-12 Thread Uwe Schindler
:22:43 PM UTC, baris.ka...@oracle.com wrote: >Hi Uwe, > >Could you please point me to the class documentation? > >Best regards > > >On 10/12/20 12:16 PM, Uwe Schindler wrote: >> BMW support is in Lucene since version 8.0. >> >> Uwe >> >>

Re: Links to classes missing for BMW

2020-10-12 Thread Uwe Schindler
lso" so it implies support for Lucene, too, right? > >Best regards > > > >- >To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >For additional commands, e-mail: java-user-h...@lucene.apache.

Re: Fuzzy Search Scoring Adjustment

2020-09-23 Thread Uwe Schindler
>otherwise function exactly the same would also work, but all of those >are >either final classes or have no public constructor, effectively making >it >impossible to reuse their logic directly, as near as I can tell. > >If anyone has any ideas of how to approach this, it would be very >helpful. > >Thanks, >Kainoa -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2020-09-21 Thread Uwe Schindler
ond question: What's the "closest" replacement for a PhraseQuery with slop? Should I use maxwidth(slop + 1) or maxgaps(slop-1) or maxgaps(slop). I know SpanQuery slops cannot be fully replaced with intervals, but I don't care about those SpanQuery bugs. Uwe - Uwe Schindler Achterdiek 19, D-2835

RE: [VOTE] Lucene logo contest, third time's a charm

2020-09-06 Thread Uwe Schindler
Hi, My votes (binding): A1, D Reason: I want to keep the original Lucene colors, so A1 is the only alternative. I still really like the old one, if it would be better vectorized, so my second choice is D. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https

Re: Tessellate exception in Elasticsearch

2020-06-04 Thread Uwe Schindler
CRS features in recent ES development. My fault. Uwe On June 4, 2020 1:40:51 PM UTC, Uwe Schindler wrote: >Hi, > >Yes. With different projections there is one issue: Elasticsearch only >converts the polygon points to wgs84. But depending on the projection, >the lines between

Re: Tessellate exception in Elasticsearch

2020-06-04 Thread Uwe Schindler
> [41.90573200001381, 44.2310018589] [9.3213479767, >> > -3.20048586995] ]. Possible malformed shape detected. >> > at >> > org.apache.lucene.geo.Tessellator.tessellate(Tessellator.java:114) >> > ~[lucene-sandbox-7.7.3.jar:7.7.3 >> 1a0d2a901dfec93676b0fe8be425101ceb754b85 - >> > noble - 2020-04-21 10:31:55] >> > at >> > >> >org.apache.lucene.document.LatLonShape.createIndexableFields(LatLonShape.java:73) >> > ~[lucene-sandbox-7.7.3.jar:7.7.3 >> 1a0d2a901dfec93676b0fe8be425101ceb754b85 - >> > noble - 2020-04-21 10:31:55] >> > at >> > >> >org.elasticsearch.index.mapper.GeoShapeFieldMapper.indexShape(GeoShapeFieldMapper.java:146) >> > ~[elasticsearch-6.8.9.jar:6.8.9] >> > >> > This is a very basic geometry. Could someone please explain why >this >> shape >> > is invalid? >> > >> > >> > >> > >> > Thanks in advance, >> > >> > Wouter Claeys >> > >> -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Need suggetion in replacing forcemerge(1) with alternative which consumes less space

2020-04-14 Thread Uwe Schindler
it is not feasible for > our > use case, because it takes 3X memory. We are creating indexes for huge data. Don't use forceMerge, especially not to work around some issue that comes from wrong multi-threading code and a basic misunderstanding of IndexReaders and their relationship to IndexWriters

Re: Lucene 8 early termination

2020-01-23 Thread Uwe Schindler
pointer is greatly appreciated. > >Best, >Wei -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

Re: Disk Free decrease in a directory containing only live lucene indexes

2020-01-21 Thread Uwe Schindler
usage >as calculated by the df util grows more rapidly than that calculated by >the >du util. > >When we terminate the application the disk usage calculated with the >two >utils is the same and it is the one calculated with du when the >application >is running. > >Can

Re: Quest about Lucene's IndexSearcher.search(Query query, int n) API's parameter n

2020-01-09 Thread Uwe Schindler
ay be very >inefficient... > >My current idea: use more detailed near-to-far sub geo ranges to >iteratively/incrementally search/filter -> load documents -> manual >sort -> >combine. > >Any suggestions? -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Use custom score in ConstantScoreQuery

2019-12-09 Thread Uwe Schindler
Hi, Just add a BoostQuery with a boost factor of 0.5 around the ConstantScoreQuery. It's just one line more in your code. I don't understand why we would need separate query classes for this. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u
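The one-line wrapping described above could look roughly like the following. This is only a sketch: the class name, the helper name, and the example field and term are made up for illustration, and it assumes a Lucene version (5.4 or later) where BoostQuery is available on the classpath.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class HalfScoreExample {

    // Hypothetical helper: every hit of `inner` gets the same constant score,
    // and the surrounding BoostQuery scales that constant by 0.5.
    static Query halfScore(Query inner) {
        Query constant = new ConstantScoreQuery(inner);
        return new BoostQuery(constant, 0.5f);
    }

    public static void main(String[] args) {
        // "status" / "active" are placeholder field and term values.
        Query q = halfScore(new TermQuery(new Term("status", "active")));
        System.out.println(q);
    }
}
```

Since the boost is just a wrapper, any other factor can be used the same way without a dedicated query class.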

Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler
> cess field2 with boost value 1.0f?
>
> Before, this was being done at index time.
>
> I can see the only way here is the BooleanQuery which combines the first BoostQuery object bq and another one that I need to define for bq2 for field2.
>
> Is there any other

RE: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler
made the whole thing not reliable.
Uwe
> Best regards
>
> On 10/21/19 12:54 PM, Uwe Schindler wrote:
> > Hi,
> >
> > As I said before, that is a misuse of index-time boosting. In addition, in previous versions it did not even work correctly, because of quer

RE: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler
of fields and boost factors. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: baris.ka...@oracle.com > Sent: Monday, October 21, 2019 6:45 PM > To: java-user@lucene.apache.org > Cc: baris.kazar

RE: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler
on the docvalues field. That can be done with the expressions module (using compiled JavaScript) or by another query in Lucene that operates on ValueSource (e.g., FunctionQuery). The first one is easier to use for complex formulas. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de

Re: Index-time boosting: Deprecated setBoost method

2019-10-18 Thread Uwe Schindler
> regards
>
> On 10/18/19 2:57 PM, Uwe Schindler wrote:
>> Sorry I was imprecise. It's a mix of both. The factors are stored per document in the index (this is why I called it index time). During query time the expression uses the index-time values to fold them into the

Re: Index-time boosting: Deprecated setBoost method

2019-10-18 Thread Uwe Schindler
>> ine both.
>>
>>> Maybe it is not needed with MultiFieldQueryParser.
>> You use MultiFieldQueryParser to adjust weights of the fields (e.g. title versus body). The parsed query is then wrapped with an expression that modifies the score per document according to the docvalu

RE: Index-time boosting: Deprecated setBoost method

2019-10-18 Thread Uwe Schindler
t. So you can combine both. > Maybe it is not needed with MultiFieldQueryParser. You use MultiFieldQueryParser to adjust weights of the fields (e.g. title versus body). The parsed query is then wrapped with an expression that modifies the score per document according to the docvalues. U

RE: Index-time boosting: Deprecated setBoost method

2019-10-18 Thread Uwe Schindler
syntax in the expressions module). This allows you to compile a JavaScript function that calculates the final score from the score returned by the inner query, combined with docvalues that were indexed per document. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https
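As a rough sketch of the approach described in this message, a compiled expression can be bound to the inner query's score and to a per-document numeric docvalues field. The class name, the variable name `popularity`, and the formula are assumptions chosen only for illustration, and the code assumes Lucene 7.1 or later with the lucene-expressions and lucene-queries modules on the classpath.

```java
import java.text.ParseException;

import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;

public class DocValuesBoostExample {

    // Hypothetical helper: wraps `inner` so that the final score is the inner
    // score multiplied by a factor derived from a numeric docvalues field.
    static Query boostByDocValue(Query inner, String docValuesField) throws ParseException {
        // Illustrative formula only; "popularity" is a variable name used inside the expression.
        Expression expr = JavascriptCompiler.compile("_score * ln(2 + popularity)");

        SimpleBindings bindings = new SimpleBindings();
        // "_score" resolves to the score produced by the inner query.
        bindings.add("_score", DoubleValuesSource.SCORES);
        // "popularity" resolves to the per-document value indexed as numeric docvalues.
        bindings.add("popularity", DoubleValuesSource.fromLongField(docValuesField));

        return new FunctionScoreQuery(inner, expr.getDoubleValuesSource(bindings));
    }
}
```

The inner query can be whatever a query parser produced, so per-field weights and per-document factors can be combined.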

RE: Split package in Lucene 8.2.0

2019-09-05 Thread Uwe Schindler
- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Philippe Cadé > Sent: Thursday, September 5, 2019 2:11 PM > To: java-user@lucene.apache.org > Subject: Split package in Lucene 8.2.0 > > Dear all

Re: AlphaNumeric analyzer/tokenizer

2019-08-19 Thread Uwe Schindler
>> xt12" etc.
>>
>> Is there something like an Alphanumeric analyzer which would be very similar to SimpleAnalyzer but in addition to letters it would also keep digits in its tokens? I am willing to contribute such an analyzer if one is not available.
>>
>> Thanks and Regards,
>> Abhishek
-- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: AlphaNumeric analyzer/tokenizer

2019-08-16 Thread Uwe Schindler
://lucene.apache.org/core/8_2_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizerFactory.html (the example there is for Apache Solr, but you can use the same parameter names in CustomAnalyzer) Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail
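To illustrate what a pattern-based tokenizer would emit for the alphanumeric case, here is a plain-JDK sketch that does not use Lucene at all. The class name, the regular expression, and the lowercasing step are assumptions made for illustration; in a CustomAnalyzer the same expression would presumably be passed as the `pattern` parameter with `group` set to 0, so that matches rather than separators become tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlphaNumericTokens {

    // Runs of ASCII letters and digits become tokens; everything else separates them.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+");

    // Returns the lowercased alphanumeric tokens of `text`, in order of appearance.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Text12, foo-bar_9!"));
    }
}
```

Unlike SimpleAnalyzer, which splits at non-letters, this keeps digits inside tokens, so a string such as "Text12" stays a single token.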

Re: Lucene 5.2.1 score for MUST_NOT query

2019-08-04 Thread Uwe Schindler
>> f a hit in response to a query that begins with the clause MUST_NOT?
>> Is it 0 or something else?
>> What does it mean?
>> How is it calculated?
>>
>> Thank you in advance. Claude Lepère
> --
> Regards,
>
> Atri
> Apache Concerted
-- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Slowness on Java 11 with Lucene 6

2019-07-29 Thread Uwe Schindler
case (default settings with pure Lucene application). Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: GASPARD-EXT Joel > Sent: Monday, July 29, 2019 5:17 PM > To: java-user@lucene.apache.o

RE: field:* vs field:[* TO *]

2019-04-18 Thread Uwe Schindler
Hi,
> I was pointed to Lucene from the Solr list. I am wondering if the performance of the below two queries is expected to be quite different and would they return the same set of results?
>
> field:*
> field:[* TO *]
From the Lucene side they are identical, but it depends on the

RE: Upper limit on Score

2019-04-18 Thread Uwe Schindler
No, there is no limit. - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Kevin Manuel > Sent: Wednesday, April 17, 2019 7:38 PM > To: java-user@lucene.apache.org > Subject: Upper limit on Score

Re: Noticed performance degrade from lucene-7.5.0 to lucene-8.0.0

2019-04-14 Thread Uwe Schindler
> be executed in 18 to 24 milliseconds now taking 74 to 110 milliseconds.
>
> Any suggestion please?
>
> Regards,
> Khurram
-- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Why does Lucene 7.4.0 commit() Increase Memory Usage x2

2019-04-04 Thread Uwe Schindler
ParallelGC) never gives back any memory to the OS; the same applies to ConcMarkSweepGC. And I assume you are using this one. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler > Sent: Thursda
