Re: why the of advance(int target) function of DocIdSetIterator is defined with uncertain?

2012-04-17 Thread Mikhail Khludnev
? -- Sincerely yours Mikhail Khludnev ge...@yandex.ru http://www.griddynamics.com mkhlud...@griddynamics.com

Re: no concurrent merging?

2016-08-04 Thread Mikhail Khludnev
ene.index.IndexWriter.mergeInit(IndexWriter.java:3792) > > - locked <6d75db> (a org.apache.solr.update.SolrIndexWriter) > > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3646) > > at > > > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) > > at > > > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) > > > > > > - > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > -- Sincerely yours Mikhail Khludnev

Re: Searching in a bitMask

2016-08-27 Thread Mikhail Khludnev
mask&0xf == 0xf ? > -- Sincerely yours Mikhail Khludnev
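A note on the `mask&0xf == 0xf` expression quoted above: in Java, `==` binds more tightly than `&`, so the test needs explicit parentheses when written in code. The following is only a minimal sketch of the in-memory check (the class and method names are hypothetical), not a statement of how such a mask would be indexed or searched in Lucene:

```java
final class BitMaskCheck {
    // Returns true when all four low bits of mask are set.
    // The parentheses are required: without them Java would parse
    // "mask & (0xf == 0xf)", which does not compile for an int mask.
    static boolean lowNibbleSet(int mask) {
        return (mask & 0xf) == 0xf;
    }

    public static void main(String[] args) {
        System.out.println(lowNibbleSet(0x1f)); // low four bits all set
        System.out.println(lowNibbleSet(0x17)); // bit 3 is clear
    }
}
```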

Re: Lucene 6.1: number of hits per document

2016-08-29 Thread Mikhail Khludnev
> Quoted from: > http://lucene.472066.n3.nabble.com/Lucene-6-1-number-of-hits-per-document-tp4293245p4293687.html > > -- Sincerely yours Mikhail Khludnev

Re: BlockJoinQuery with sorting

2016-11-28 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: BlockJoinQuery with sorting

2016-11-26 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: complex disjoint search query

2016-10-12 Thread Mikhail Khludnev
a query to achieve > such expectations? > > Regards, > Valentin > > -- Sincerely yours Mikhail Khludnev

Re: BlockJoin with RAM Directory

2016-11-29 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Mikhail Khludnev
ery can return any documents matching search criteria. > > So, Is there a way to completely disable scoring/ranking altogether? > OR Is there a better solution to it. > > Regards > Rajnish > -- Sincerely yours Mikhail Khludnev

Re: Apply Lucene Query on Bits

2016-12-05 Thread Mikhail Khludnev
Thx > Hendrik > > > -- > Hendrik Saly (salyh, hendrikdev22) > @hendrikdev22 > PGP: 0x22D7F6EC > > -- Sincerely yours Mikhail Khludnev

Re: query parser of SpanNearQuery

2016-12-05 Thread Mikhail Khludnev
Hello, You can check ComplexPhrase and Surround query parsers. On Mon, Dec 5, 2016 at 8:12 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote: > It seems lucene query parser doesn't support SpanNearQuery. > Is there any query parser supports SpanNearQuery? > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser with wildcards

2017-01-02 Thread Mikhail Khludnev
searcher = new IndexSearcher(reader); > ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser("field", > new > StandardAnalyzer()); > TopDocs topDocs; > > Query queryOk = parser.parse("field: (john* peters)"); > topDocs = searcher.search(queryOk, 2); > System.out.println("found " + topDocs.totalHits + " docs"); > > Query queryFail = parser.parse("field: (\"john*\" \"peters\")"); > topDocs = searcher.search(queryFail, 2); // -> throws the above > mentioned exception > System.out.println("found " + topDocs.totalHits + " docs"); > > } > > } > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser with wildcards

2016-12-20 Thread Mikhail Khludnev
> writer.commit(); > > writer.close(); > > > > IndexReader reader = DirectoryReader.open(directory); > > IndexSearcher searcher = new IndexSearcher(reader); > > ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser("field", > > new > > StandardAnalyzer()); > > TopDocs topDocs; > > > > Query queryOk = parser.parse("field: (john* peters)"); > > topDocs = searcher.search(queryOk, 2); > > System.out.println("found " + topDocs.totalHits + " docs"); > > > > Query queryFail = parser.parse("field: (\"john*\" \"peters\")"); > > topDocs = searcher.search(queryFail, 2); // -> throws the above > > mentioned exception > > System.out.println("found " + topDocs.totalHits + " docs"); > > > > } > > > > } > > > -- Sincerely yours Mikhail Khludnev

Re: Document serializable representation

2017-03-30 Thread Mikhail Khludnev
r own. This is what Elasticsearch or Solr are doing: They > accept the document, decide which shard they should be located and transfer > the plain fieldname:value pairs over the network. Each node then creates > Lucene IndexableDocuments out of it and passes to their own IndexWriter. > >

Re: join in lucene

2017-03-16 Thread Mikhail Khludnev
s a different class > 2) by reflection if the getter is class ==entity Loader load document > with the key saved in parent object. > > > *second question:* > *this solution is essentially similar to how it works the query times or > not (so similar performance)?* > -- Sincerely yours Mikhail Khludnev

Re: Analyzing Infix Suggestor Exact Match Boost

2017-03-08 Thread Mikhail Khludnev
urned > first. So how can i get back exact matches first?? > > Thanks. > -- Sincerely yours Mikhail Khludnev

Re: Correction: SpanNearQuery Class issue through spans object (Not through Searcher.search() method)

2017-06-20 Thread Mikhail Khludnev
g Apache lucene 6.5.0 version. Please let me know about this > since I am using this for a critical project? > > Thanks, > Ranganath B. N. > > -- Sincerely yours Mikhail Khludnev

Re: What is the fastest way to loop over all documents in an index?

2017-09-05 Thread Mikhail Khludnev
stest way to loop over all documents in an index? > Is it looping over all possible doc id’s (+filtering out deleted > documents)? > > Thank you very much. > > Best regards > Claude > > -- Sincerely yours Mikhail Khludnev

Re: Tracking that all query terms are matched in one document

2017-12-05 Thread Mikhail Khludnev
egards, > >> > > > Vadim Gindin > >> > > > > >> > > > On Mon, Dec 4, 2017 at 3:18 PM, Michael Sokolov < > msoko...@gmail.com > >> > > >> > > > wrote: > >> > > > > >> > > > > You could combine a Boolean and query with the same terms, as an > >> > > optional > >> > > > > clause. Are you sure about the requirement to multiply the score > >> in > >> > > that > >> > > > > case? > >> > > > > > >> > > > > On Dec 4, 2017 5:13 AM, "Vadim Gindin" <vgin...@detectum.com> > >> wrote: > >> > > > > > >> > > > > > Hi all. > >> > > > > > > >> > > > > > I need to track that all query terms are matched in one > >> document. > >> > > When > >> > > > > all > >> > > > > > terms are matched I need to multiply the score of such > document > >> to > >> > > some > >> > > > > > constant coefficient. > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > > > > -- Sincerely yours Mikhail Khludnev

Re: Tracking that all query terms are matched in one document

2017-12-13 Thread Mikhail Khludnev
ality? > How to avoid this? > > Thanks, > Vadim Gindin > > On Fri, Dec 8, 2017 at 2:01 PM, Vadim Gindin <vgin...@detectum.com> wrote: > > > Thanks for your help. I'll try that. > > > > On Tue, Dec 5, 2017 at 4:18 PM, Mikhail Khludnev <m...@apache.org>

Re: Query in a doc context

2017-12-14 Thread Mikhail Khludnev
ding > information: what terms are matched to what fields and so on. > > > It seems, that BooleanQuery/BooleanScorer is not a good place to accumulate > some information from a child Queries/Scorers. > -- Sincerely yours Mikhail Khludnev

Re: Terminology. LeafReader -> TermEnum -> PostingsEnum

2017-12-14 Thread Mikhail Khludnev
ncipal difference > between these 20 implementations and which of them can be really useful? > > Regards, > Vadim Gindin > -- Sincerely yours Mikhail Khludnev

Re: Explain flag in CustomQuery

2018-06-25 Thread Mikhail Khludnev
t SearchContext will be propagated to a Query, but I didn't > find a way to get it. I only have LeafReaderContext or LeafReader. > Could you advise me? > > Regards, > Vadim Gindin > -- Sincerely yours Mikhail Khludnev

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
numbers and the positions > inside the line. > Many thanks in advance for your help, > Ira > > -- Sincerely yours Mikhail Khludnev

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
the previous Token in a TokenStream, > used in phrase searching. > I am not in phrase searching. > Would you mind to explain how it can help me? > > Thanks, > Ira > > -Original Message- > From: Mikhail Khludnev [mailto:m...@apache.org] > Sent: Tuesd

Re: Query in a doc context

2017-12-30 Thread Mikhail Khludnev
.@donaq.com> > > > wrote: > > > > > > > Apologies if I completely misunderstood but if you are looking to do > a > > > full > > > > doc match, you could duplicate the doc into another field > > that > > > > is a true full text i

Re: Wrong ID in explain() method.

2017-12-29 Thread Mikhail Khludnev
the document id (that was matched in scorer) to a > > collection. When explain(id) is called it checks specified id in this > > collection and outputs "matched"/"not matched". > > > > The questions. > > 0. This document is found by the plugin, but explain(

Re: Lucene API to retrieve matched words

2018-09-06 Thread Mikhail Khludnev
or highlighting, just a list of the words. So if > I > search for 'ski' and I match on 'skier' and 'skiis', I would like to get > back a list that includes 'skier' and 'skiis'. > > Is there an API call that provides this? > > > > Thanks > > Mike > > -- Sincerely yours Mikhail Khludnev

Re: Camel case search with Lucene

2018-10-04 Thread Mikhail Khludnev
e, search "redHotChilly" > instead of "red hot chilly" - you should use your own pattern tokenizer to > divide the query by regex pattern. > > Regards > Vadim Gindin > > On Thu, Oct 4, 2018 at 11:58 AM Gordin, Ira wrote: > > > Hi friends, > > > > How can I implement Camel case search with Lucene? > > > > Thanks, > > Ira > > > > > > > -- Sincerely yours Mikhail Khludnev
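The regex-splitting idea quoted above can be sketched in plain Java. This is an illustration under stated assumptions, not code from the thread: the class name, method name, and the particular pattern are hypothetical choices, and a real setup would apply an equivalent pattern inside the analysis chain (Lucene ships a `PatternTokenizer` for that purpose) at both index and query time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CamelCaseSplitter {
    // Splits at every lower-case-or-digit to upper-case boundary,
    // then lower-cases each part, e.g. "redHotChilly" -> red, hot, chilly.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        for (String part : input.split("(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("redHotChilly"));
    }
}
```

The zero-width lookbehind/lookahead pair keeps every character of the input, so no letters are lost at the split points.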

Re: How to access DocValues inside a customized collector?

2018-09-21 Thread Mikhail Khludnev
way to see directly indexed data (Luke seems obsolete, > Marple does not work with lucene 7.4.0 yet)? > > Thanks very much for helps, Lisheng > -- Sincerely yours Mikhail Khludnev

Re: Question About FST, multiple-column index

2018-09-21 Thread Mikhail Khludnev
e any > Combined Index structure like multiple-column indexes in mysql? I think is > there any solutions to extends to FST which make the FINAL state connect to > another FST? > > > THANKS -- Sincerely yours Mikhail Khludnev

Re: How can I use FunctionScoreQuery to replace CustomScoreQuery?

2019-01-26 Thread Mikhail Khludnev
pe, > but I'm stuck. > > -- Sincerely yours Mikhail Khludnev

Re: position-anchored queries

2019-03-21 Thread Mikhail Khludnev
any subsequent terms in the field? > > -Mike > -- Sincerely yours Mikhail Khludnev

Re: How can I use FunctionScoreQuery to replace CustomScoreQuery?

2019-01-29 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: About custom score using Solr8/Lucene8

2019-05-08 Thread Mikhail Khludnev
example, at least to understand how to start a minimal basic > project? > > Thanks > > -- Sincerely yours Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

2019-07-03 Thread Mikhail Khludnev
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN wrote: > > This returns "id3", which is unexpected. > > Please check ToPBJQ javadoc. It's absolutely expected. -- Sincerely yours Mikhail Khludnev

Re: block min-max values for Sort Field with Top-N query..

2019-07-02 Thread Mikhail Khludnev
> & won't work for multi-sort field queries or out-of-order scoring etc.. > > But, in general will this be a good idea to explore or something that is > best not attempted? > > Any help is much appreciated > > -- > Ravi > -- Sincerely yours Mikhail Khludnev

Re: Adding and Removing Facet Entries

2019-08-28 Thread Mikhail Khludnev
ssentially looking for something similar to `add-distinct` > and `remove` from Solr's atomic updates functionality, just directly in > Lucene. > -- Sincerely yours Mikhail Khludnev

Re: Lucene one to many query

2019-09-21 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

2019-07-06 Thread Mikhail Khludnev
provide an BitSet > https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true > per sub-reader."? If so, given the data above how do I properly create a > parent query? > > > > > > > > > > > > > > On

Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
"~2 > Type of query : ComplexPhraseQuery > > If I change teststr to "\"Foo Bar\"" > I get > Query : "Foo Bar" > Type of query : ComplexPhraseQuery > > If I change teststr to "Foo Bar" > I get > Query : content:foo content:bar > Type of query : BooleanQuery > > > In the first two cases I was expecting the search terms to be switched to > lowercase. > > Were the Foo and Bar left as originally specified because the terms are > inside double quotes? > > How can I specify a search term that I want treated as a Phrase, > but also have the query parser apply the LowerCaseFilter? > > I am hoping to avoid the need to handle this using PhraseQuery, > and continue to use the QueryParser. > > > Thanks in advance for any help you can give me, > David Shifflett > > -- Sincerely yours Mikhail Khludnev

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
sed contains upper or > lower case J and S (in your John Smith case) > > Apologies for the 'content:foo'. > I changed the code snippet to "somefield", and missed changing that part > of the output > > David Shifflett > > > On 10/22/19, 5:51 AM, "Mikhail Khludn

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
s my conditions: > 1) Uses a StandardAnalyzer > 2) Does the actual query.toString() return lowercase J and S > > David Shifflett > > > On 10/22/19, 10:44 AM, "Mikhail Khludnev" wrote: > > On Tue, Oct 22, 2019 at 5:26 PM Shifflett, David [USA] <

Re: How can i specify a custom Analyzer for a Field of Document?

2019-12-09 Thread Mikhail Khludnev
> > I have a document set, most fields to index are only text type, suited for a > StandardAnalyzer or a SmartChineseAnalyzer. But the problem is, i have a > special field which is a KeywordList type, like "A;B;C", which i hope i can > fully control the analyzing step. > > How to do this in Lucene? > -- Sincerely yours Mikhail Khludnev

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

2019-12-27 Thread Mikhail Khludnev
ese doubts. I like to quote this talk https://www.youtube.com/watch?v=T5RmMNDR5XI > > On Fri, Dec 27, 2019 at 5:05 PM, Mikhail Khludnev wrote: > > > Hello, > > It's by design: StringFields are searchable and filled by analysis > output, > > StoredFields are returned input values. >

Re: ComplexPhraseQueryParser performance question

2020-02-13 Thread Mikhail Khludnev
here are no one. > Best regards > > On 2/4/20 11:14 AM, baris.ka...@oracle.com wrote: > > > > Thanks but i thought this class would have a mechanism to fix this issue. > > Thanks > > > >> On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote: > >> >

Re: Autocompletion based on one field in index

2020-03-03 Thread Mikhail Khludnev
y to achieve > this? > > > Regards > Kumaran R > -- Sincerely yours Mikhail Khludnev

Re: How to tell Lucene index search to stop when it takes too long

2020-02-27 Thread Mikhail Khludnev
But i cant specify Top n docs > here, right? > > > The collector is defined here > > > https://lucene.apache.org/core/8_4_1/core/org/apache/lucene/search/Collector.html > > > https://lucene.apache.org/core/8_4_1/core/org/apache/lucene/search/TopDocsCollector.html >

Re: How to tell Lucene index search to stop when it takes too long

2020-02-24 Thread Mikhail Khludnev
> > Is there such an api or plan to implement one? > > > Best regards > > -- Sincerely yours Mikhail Khludnev

Re: Can Lucene be used as Rules Engine?

2020-01-22 Thread Mikhail Khludnev
ed number of Fields to > query on. Even if there are fixed number of fields, the query has to check > for each field to match at least one word. > > Is it possible to handle this requirement using Lucene? or should I go for > other options? > > I am new to Lucene, any help would be appreciated. > > > > Thanks, > > Kart > > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser performance question

2020-02-04 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread Mikhail Khludnev
> > > I use SmartChineseAnalyzer to do the indexing, and add a document with > a > > > TextField whose value is a long sentence, when analyzed, will get 18 > > > terms. > > > > > > & then i use the same value to construct a PhraseQuery, setting slop to > > 2, > > > and adding the 18 terms consecutively... > > > > > > I expect the search api to find this document, but it returns empty. > > > > > > Where am i wrong? > > > > > > > > > -- > > Adrien > > > -- Sincerely yours Mikhail Khludnev

Re: Question abount combining InvertedIndex and SortField

2019-12-31 Thread Mikhail Khludnev
emory footprint by storing only top candidate results in a binary heap. IIRC it's described in this classic paper http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf -- Sincerely yours Mikhail Khludnev
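The binary-heap idea mentioned above can be sketched with the JDK's `PriorityQueue`. This is a generic illustration of bounded top-N selection under stated assumptions (hypothetical class and method names, plain float scores), not Lucene's own collector code: a min-heap of at most `n` entries keeps the weakest retained candidate at the root, so memory stays proportional to `n` rather than to the number of hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNHeap {
    // Returns the n largest scores in descending order,
    // holding at most n candidates in memory at any time.
    static List<Float> topN(float[] scores, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap: smallest kept score at the root
        for (float score : scores) {
            if (heap.size() < n) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();      // evict the weakest retained candidate
                heap.add(score);
            }
        }
        List<Float> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(new float[] {0.2f, 1.5f, 0.7f, 3.1f, 0.9f}, 3));
    }
}
```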

Re: Needs advice on auto-keyword-correction mode custom query

2020-01-06 Thread Mikhail Khludnev
r composable? Lucene's > "LeafContext" concept is really very confusing me... > -- Sincerely yours Mikhail Khludnev

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

2019-12-27 Thread Mikhail Khludnev
String term > = byteRef.utf8ToString(); terms.add(term); } > } catch (IOException e) { e.printStackTrace(); > log.error(e.getMessage(), e); }* > > To my surprise, terms seems to return only the STORED value, which is the > original value form, but i expect they should be the terms i put in each > StringField! > > Is this a design miss or impl. limit? > -- Sincerely yours Mikhail Khludnev

Re: About custom score using Solr8/Lucene8

2020-07-02 Thread Mikhail Khludnev
-- > Vincenzo D'Amore > -- Sincerely yours Mikhail Khludnev

Re: Retrieving query-time join fromQuery hits

2020-06-08 Thread Mikhail Khludnev
frei > > On Wed, Jun 3, 2020 at 9:59 PM Mikhail Khludnev wrote: > > > Hi, Stefan. > > Have you considered faceting/aggregation over `from` field? > > > > On Tue, May 12, 2020 at 7:23 PM Stefan Onofrei > > wrote: > > > > > Hi, > > >

Re: Retrieving query-time join fromQuery hits

2020-06-03 Thread Mikhail Khludnev
i > > [1] > > https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html > [2] > > https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html > [3] https://issues.apache.org/jira/browse/LUCENE-3602 > -- Sincerely yours Mikhail Khludnev

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-09-01 Thread Mikhail Khludnev
wech...@wyona.com> wrote: >> >>> Hi Together >>> >>> You might be interesed in this paper / article >>> >>> https://arxiv.org/abs/2308.14963 >>> >>> Thanks >>> >>> Michael >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> -- Sincerely yours Mikhail Khludnev

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Mikhail Khludnev
g of special chars like - etc > > > On Wed, Sep 20, 2023 at 16:39 Mikhail Khludnev wrote: > > > Hello, > > Check the whitespace tokenizer. > > > > On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar > > wrote: > > > > > Hi, > > > >

Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Mikhail Khludnev
> > On the implementation front, I am using a set of filters like > lowerCaseFilter, EnglishPossessiveFilter etc in addition to base tokenizer > StandardTokenizer. > > Per my analysis, StandardTokenizer strips off the % sign and hence the > behavior. Has someone faced similar requirement? Any help/guidance is highly > appreciated. > -- Sincerely yours Mikhail Khludnev
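The whitespace-tokenizer suggestion made elsewhere in this thread can be illustrated without Lucene. The sketch below is an assumption-laden stand-in (hypothetical class and method names, plain `String.split`), showing only the idea that splitting purely on whitespace leaves a trailing `%` attached to its number; it is not the implementation of Lucene's `WhitespaceTokenizer`, and it lower-cases tokens only to mimic the lower-case filter mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WhitespaceSplitSketch {
    // Splits on runs of whitespace only, so punctuation such as '%'
    // stays attached to the neighbouring characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Save 50% Today"));
    }
}
```

One trade-off worth keeping in mind: a whitespace-only split also keeps other punctuation (commas, hyphens, trailing periods) attached to tokens, which is the concern raised in the follow-up message about special characters.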

Re: Reindexing leaving behind 0 live doc segments

2023-08-29 Thread Mikhail Khludnev
rld = iw.getPooledInstance(sci, true); > segmentReader = rld.getReader(IOContext.READ); > > //process all live docs similar to above using the segmentReader. > > rld.release(segmentReader); > iw.release(rld); > }finally{ >if (iwRef != null) { >iwRef.decref(); > } > } > > Help would be much appreciated! > > Thanks, > Rahul > -- Sincerely yours Mikhail Khludnev

Re: Question about Benchmark

2022-05-17 Thread Mikhail Khludnev
isting index for search? Also, is there a way to configure the > benchmark to use multiple threads for indexing (looks to me that it’s a > single-threaded indexing)? > > --Regards, > Balmukund > -- Sincerely yours Mikhail Khludnev

Re: Filter and FilteredQuery replacements

2022-07-12 Thread Mikhail Khludnev
index, with > the bits turned on for those documents we want to be included in search > results. > > If this has already been answered in a forum post, I apologize. Or if > there's a Lucene specific forum somewhere I could look at, if you could > kindly point me there, I would appreciate it. > > Any help/insight is greatly appreciated. > > Thanks, > Scott Robey > -- Sincerely yours Mikhail Khludnev

Re: Lucene Disable scoring

2022-07-11 Thread Mikhail Khludnev
ad of function calls can cause delay. > As a result I'm looking for a trick to ignore the function call and have > all no scoring on my whole query > > Is it possible to ignore this step? > > thanks a million > -- Sincerely yours Mikhail Khludnev

Re: Unclear on what position means

2022-07-21 Thread Mikhail Khludnev
outside of > Lucene? > > Kendall > > -- Sincerely yours Mikhail Khludnev

Re: Help to understand the per-field formats in Lucene

2022-10-25 Thread Mikhail Khludnev
studied the "KnnVectors" a little. > The "PerFieldKnnVectorsFormat.FieldsWriter" actually uses the > "Lucene94HnswVectorsFormat". > But why do we have this kind of structure? > > Thanks & Regards > > MyCoy > -- Sincerely yours Mikhail Khludnev

Re: Lucene Suggester APIs question

2022-08-14 Thread Mikhail Khludnev
question about lucene suggester APIs. If I build multiple FSTs > using a suggester, is there a way to merge two generated FSTs? > > -- > > Nitish Jain > -- Sincerely yours Mikhail Khludnev

Re: Multi-segments and HNSW

2022-11-02 Thread Mikhail Khludnev
ct on the retrieving quality and performance. > > I'm wondering if there is any best practice, e.g. how many docs should be > in a single graph? > Or does anyone have some production experience to share? > > Thanks & Regards > MyCoy > -- Sincerely yours Mikhail Khludnev

Re: Question for SynonymQuery

2022-12-28 Thread Mikhail Khludnev
in those cases, since to > support multi-term synonyms it needs to accept a list of Query, which would > make it behave like a BooleanQuery. Also how scoring works with multi-term > is another problem. > > Thanks & Regards! > -- Sincerely yours Mikhail Khludnev

Re: Question for SynonymQuery

2023-01-01 Thread Mikhail Khludnev
s I understand SynonymWeight > will > > > consider all terms as exactly the same while BooleanQuery will favor > the > > > documents with more matched terms. > > > - Is it worth it to support multi-term synonyms in SynonymQuery? My > > feeling > > > is that it's better to just use BooleanQuery in those cases, since to > > > support multi-term synonyms it needs to accept a list of Query, which > > would > > > make it behave like a BooleanQuery. Also how scoring works with > > multi-term > > > is another problem. > > > > > > Thanks & Regards! > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2023-01-02 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help of example of Lucene use.

2023-01-04 Thread Mikhail Khludnev
> > Currently I badly need some examples of using TokenStream, > tokenAttributes, *Filter. > I need to replace the uses of "Token". > > Could somebody please help me with it? > > Regards > Rajib > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Thread Mikhail Khludnev
39 > ) at production scale and discovered really bad performance during certain > conditions which I attribute to this unnecessary synching. I suspect this > may have impacted others as well > https://stackoverflow.com/questions/42960569/indexing-taking-long-time-when-using-opennlp-lemmatizer-with-solr > > Many thanks, > > Luke Kot-Zaniewski > > > -- Sincerely yours Mikhail Khludnev

Re: What is the corresponding class for org.apache.lucene.codecs.memory.DirectDocValuesFormat in Lucene9

2023-01-30 Thread Mikhail Khludnev
ut the "DirectPostingsFormat" is still in Lucene9. > > Could anyone help me to understand how to replace the DirectDocValuesFormat > in Lucene9? > > Thanks > Regards > MyCoy > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2023-01-27 Thread Mikhail Khludnev
t turns out I used FlattenGraphFilter and cause the PositionLength to be > > all 1 and resulted in the behavior above =) > > > > A side note is that we don't need to use WORD_SEPARATOR in the synonym > > file. SynonymMap.Parser.analyze would tokenize and append the separat

Re: Question for SynonymQuery

2023-01-27 Thread Mikhail Khludnev
s. > > Regards, > Anh Dung Bui > > On Mon, Jan 2, 2023 at 8:07 Mikhail Khludnev wrote: > > > Hello Anh, > > I was intrigued by your question. And I managed it to work somehow. > > see > > > > > https://github.com/mkhludnev/likely/blob/eval-mulyw-syns/s

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-29 Thread Mikhail Khludnev
nce code " Fields fields = reader.fields();" in > your reference link. > > But, there is no "reader.fields()" in 8.11.2. > > Could you please suggest someway to extract all the Terms with an > IndexReader or some alternative ways? > > Regards > Rajib > > --

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-18 Thread Mikhail Khludnev
{ > //Some internal function to process the doc. > forEach.process(termDocs.doc()); > } > > } > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Efficient sort on SortedDocValues

2022-11-07 Thread Mikhail Khludnev
ve dozens > of such fields in our index, thus there isn't any one field that can be > used to sort the index. So I guess my question is whether what I am trying to > achieve is possible? I tried to look through the Solr codebase, but so far > couldn't come up with anything. Code example is here > https://pastebin.com/i05E2wZy . I am using 9.4.1. Thanks in advance. > > Andrei > > -- Sincerely yours Mikhail Khludnev

Re: Offset-Based Analysis

2023-02-21 Thread Mikhail Khludnev
s sense does a similar solution already exist? If > it doesn’t exist yet would it be something that would be of interest to the > community? > Any thoughts on this would be much appreciated. > > Thanks, > Luke -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Offset-Based Analysis

2023-02-22 Thread Mikhail Khludnev
tokenization for those wishing to > tokenize > > outside of their search engine. > > > > Does this approach even make any sense or have any pitfalls I am failing > > to see? Assuming it makes sense does a similar solution already exist? If > > it doesn’t exist yet

Re: Lucene Hunpell Spell checker

2023-02-19 Thread Mikhail Khludnev
a bunch of the languages, just presented 2 examples. > > Feel free to propose any changes, comments, fixes :) > > Thanks a lot in advance, > > Thanos > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Mikhail Khludnev
the highlighter.getBestTextFragments() > method marks all the occurrences of "note" and "extra" in the content too. > This we don't want. > > I can't see how to separate that part of the query out in the highlighter > methods, and I wonder what best practice would be here. I'm probably being > naive in using a single query for the whole job. Do I need to run a query > for category/volume, and then a subquery on text and title, and just use > the > subquery in the highlighter? If that's the approach, is there a nice simple > explanation somewhere you could point me to? Because I'm a simple user who > has never done anything beyond using the simple QueryParser for everything. > > > > cheers > > T > > > > > > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-03-03 Thread Mikhail Khludnev
> sequence (aka input stream). This is especially important as Lucene > > index files use checksums since around Lucene 5. > > > > Uwe > > > > On 06.02.2023 at 11:57, Saha, Rajib wrote: > >> Hi Mikhail, > >> > >> Thanks for all your suggesti

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
for urgent or sensitive issues > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
f it's possible I may control which part of the text > shall be stored during the index process? In other words, is it possible to > strip the header when storing the text into the field? > > Best regards, > > Guan > > -Original Message- > From: Mikhail Khludnev >

Re: Can an analyzer access other field's data during index time?

2023-04-26 Thread Mikhail Khludnev
utshell, I will need two parts to make this work: > > 1. a custom tokenizer/filter; > 2. a custom field; > > Let me know if there is any caveat... > > And thank you so much for guiding me through! > > Guan > > -Original Message- > From: Mikhail Khlu

Re: Can an analyzer access other field's data during index time?

2023-04-25 Thread Mikhail Khludnev
However, all 3 lines would still be stored in the field if index=true and > stored=true... > > I wonder if I could only store line 2 and 3 in the field in such a > scenario? > > Many thanks, > > Guan > > -Original Message- > From: Mikhail Khludnev > Sent: Monda

Re: Run time error in IndexWriter.addDocument

2023-04-03 Thread Mikhail Khludnev
"stempel-*.jar"? > > Regards > Rajib > > -Original Message- > From: Mikhail Khludnev > Sent: 03 April 2023 14:05 > To: java-user@lucene.apache.org > Subject: Re: Run time error in IndexWriter.addDocument > > Hi > > It seems like some

Re: Run time error in IndexWriter.addDocument

2023-04-03 Thread Mikhail Khludnev
va:1757) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400) > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-31 Thread Mikhail Khludnev
ndexWriter.optimize() > > Is there any similar concept in 8.11? If so, can you please help with APIs > org.apache.lucene.index.IndexWriter#addIndexes(org.apache.lucene.store.Directory...) But it kicks off a merge underneath. Should be fine. === > > Re

Re: Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread Mikhail Khludnev
on those terms or > proceed with this document without affecting its boolean score. > > What is the best way to achieve this? > -- Sincerely yours Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

2023-07-09 Thread Mikhail Khludnev
to get the matches in > the form of terms with properties like frequency and positions. > How can I achieve this? > > Thanks in advance! > Ned > > -- Sincerely yours Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
by the analyzer or indexer. > > I've found the MatchesIterator interface and FilterMatchesIterator class > but was not able to use it. > > Thank you! > Ned > -- Sincerely yours Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
oreDocs[i].doc, > "fieldName", query);` method exposed. I'm using Lucene core 8.11.2 and > currently I cannot upgrade to 9.0.0 or later. > > Any ideas? Which API version are you referring to? > > Thanks. > Ned > ____ > From: Mikhail Khl

Re: What is the approximate processing mechanism for field length?

2023-08-10 Thread Mikhail Khludnev
"keywords" field has 78 > tokens. I think its field_length (dl) is 78, but Lucene handled it as > 76 (approximate), as described in the function explainTF(Explanation freq, long > norm). > Thank you very much for reading, and I look forward to your > answer! > > > Koo > Drive development engineer -- Sincerely yours Mikhail Khludnev
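
One plausible explanation for the 78 versus 76 difference is that the field length is not stored exactly: to my understanding, the length is packed into a single norm byte that keeps only about four significant bits, so longer fields are rounded down. Below is a minimal, self-contained sketch of such an encoding. It assumes the scheme mirrors what Lucene's SmallFloat.intToByte4 / byte4ToInt do; the class LengthNormSketch and its method names are hypothetical and exist only for illustration, not as Lucene API.

```java
// Illustrative re-implementation of a lossy one-byte length encoding.
// Assumption: this mirrors the 4-significant-bit scheme used for norms;
// all names below are hypothetical, not part of Lucene.
final class LengthNormSketch {

    // Encode a non-negative value, keeping only its 4 most significant bits.
    static int longToInt4(long value) {
        int numBits = 64 - Long.numberOfLeadingZeros(value);
        if (numBits < 4) {
            return (int) value; // values below 8 are stored exactly
        }
        int shift = numBits - 4;
        // Keep the top 4 bits, then drop the leading one (it is implicit).
        int encoded = ((int) (value >>> shift)) & 0x07;
        // Store shift + 1, because a stored shift of 0 marks exact values.
        return encoded | ((shift + 1) << 3);
    }

    // Inverse of longToInt4; the discarded low bits come back as zeros.
    static long int4ToLong(int encoded) {
        long bits = encoded & 0x07;
        int shift = (encoded >>> 3) - 1;
        return shift == -1 ? bits : (bits | 0x08) << shift;
    }

    // Codes left over in one byte after covering the whole int range;
    // this many of the smallest lengths are stored without any loss.
    static final int NUM_FREE_VALUES = 255 - longToInt4(Integer.MAX_VALUE);

    // Map a field length to a code in the range 0..255.
    static int encode(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        if (length < NUM_FREE_VALUES) {
            return length;
        }
        return NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES);
    }

    // Map a code back to the (possibly rounded-down) field length.
    static int decode(int code) {
        if (code < NUM_FREE_VALUES) {
            return code;
        }
        return (int) (NUM_FREE_VALUES + int4ToLong(code - NUM_FREE_VALUES));
    }

    // The length that scoring would see after the value is stored and read back.
    static int roundTrip(int length) {
        return decode(encode(length));
    }
}
```

Under these assumptions, LengthNormSketch.roundTrip(78) yields 76, which matches the value reported in the explain output, while short lengths such as 10 come back unchanged.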

Re: NumericRangeQuery in Lucene 5.5.5: replacing the deprecated setBoost while keeping the NumericRange type?

2023-11-25 Thread Mikhail Khludnev
e preserving the > NumericRangeQuery type? > BoostQuery doesn't allow this and I haven't found a way. > > Thanks for your help. > > Claude Lepère > -- Sincerely yours Mikhail Khludnev

Re: Regarding extracting Token as String from TokenStream.

2024-01-25 Thread Mikhail Khludnev
mation from my side. > > Thanks In Advance. > > Regards > Rajib > > -- Sincerely yours Mikhail Khludnev

Re: Need suggestion for a Lucene upgrade scenario

2024-01-30 Thread Mikhail Khludnev
ValueTermAttribute.toString(); > > //How to get startOffset & endOffset as in Lucene 2.4 > > //Do some calculation based on startOffset & endOffset > } > > Please let me know if any further information is required from > my side. > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev
