Re: why the of advance(int target) function of DocIdSetIterator is defined with uncertain?

2012-04-17 Thread Mikhail Khludnev
orer: docIds: [1, 2, 6] > TermScorer: docIds: [2, 4] > after first call advance(5) > currentDoc=6 > only first scorer is now in the heap, scorerDocQueue.size()==1 > then call advance(6) > because scorerDocQueue.size() < minimumNrMatchers, it just return > NO_MORE_DOCS > > My question is why the advance(int target) method is defined like this? > for the reason of efficient or any other reasons? > > -- Sincerely yours Mikhail Khludnev ge...@yandex.ru <http://www.griddynamics.com>
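
For readers following the advance(5) / advance(6) trace above, here is a minimal sketch of why a second call with a target equal to the current document can skip it. ToyDocIdIterator and advanceIfNeeded are hypothetical names made up for this illustration; this is not Lucene's DocIdSetIterator, whose documented contract leaves advance(target) undefined when target is not beyond the current doc. The only thing borrowed from Lucene is the NO_MORE_DOCS sentinel value (Integer.MAX_VALUE).

```java
// Toy stand-in for a doc-ID iterator; NOT Lucene's DocIdSetIterator.
final class ToyDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

    private final int[] docs; // doc IDs, sorted ascending
    private int idx = -1;     // -1 means "not positioned yet"

    ToyDocIdIterator(int... sortedDocs) {
        this.docs = sortedDocs.clone();
    }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    // Always steps forward at least once, then stops at the first doc >= target.
    // Calling it with target <= docID() therefore skips the current doc.
    int advance(int target) {
        do {
            idx++;
        } while (idx < docs.length && docs[idx] < target);
        if (idx > docs.length) idx = docs.length; // stay exhausted on repeated calls
        return docID();
    }

    // Hypothetical caller-side guard: only advance when the target is really ahead.
    static int advanceIfNeeded(ToyDocIdIterator it, int target) {
        int current = it.docID();
        return current >= target ? current : it.advance(target);
    }

    public static void main(String[] args) {
        ToyDocIdIterator it = new ToyDocIdIterator(1, 2, 6);
        System.out.println(it.advance(5)); // 6
        System.out.println(it.advance(6)); // 2147483647, i.e. NO_MORE_DOCS
    }
}
```

Skipping the "is the target behind me?" check inside advance keeps the hot loop cheap, which is one plausible reading of why the contract leaves that case to the caller; the guard shows how a caller could stay inside the contract.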

Re: no concurrent merging?

2016-08-04 Thread Mikhail Khludnev
apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3792) > > - locked <6d75db> (a org.apache.solr.update.SolrIndexWriter) > > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3646) > > at > > > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588) > > at > > > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626) > > > > > > - > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > -- Sincerely yours Mikhail Khludnev

Re: Searching in a bitMask

2016-08-27 Thread Mikhail Khludnev
; bitmask&0xf == 0xf ? > -- Sincerely yours Mikhail Khludnev
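
A note on the expression quoted above, assuming it is meant as a Java-side check that the low four bits are all set: in Java, == binds tighter than &, so the comparison needs explicit parentheses (with an int on the left, the unparenthesized form does not compile). The class and method names below are made up for this sketch and it does not touch any Lucene API.

```java
final class BitMaskCheck {
    // True when every bit of mask is set in value.
    // The parentheses matter: without them, == would be evaluated before &.
    static boolean hasAllBits(int value, int mask) {
        return (value & mask) == mask;
    }

    // True when at least one bit of mask is set in value.
    static boolean hasAnyBit(int value, int mask) {
        return (value & mask) != 0;
    }

    public static void main(String[] args) {
        System.out.println(hasAllBits(0x1f, 0xf)); // true
        System.out.println(hasAllBits(0x17, 0xf)); // false
    }
}
```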

Re: Lucene 6.1: number of hits per document

2016-08-29 Thread Mikhail Khludnev
ment- > tp4293245p4293687.html > > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Lucene-6-1-number-of-hits-per-document-tp4293245p4293755.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > -- Sincerely yours Mikhail Khludnev

Re: complex disjoint search query

2016-10-12 Thread Mikhail Khludnev
ry to achieve > such expectations? > > Regards, > Valentin > > -- Sincerely yours Mikhail Khludnev

Re: BlockJoinQuery with sorting

2016-11-26 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: BlockJoinQuery with sorting

2016-11-28 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: BlockJoin with RAM Directory

2016-11-29 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: Apply Lucene Query on Bits

2016-12-05 Thread Mikhail Khludnev
> -- > Hendrik Saly (salyh, hendrikdev22) > @hendrikdev22 > PGP: 0x22D7F6EC > > -- Sincerely yours Mikhail Khludnev

Re: query parser of SpanNearQuery

2016-12-05 Thread Mikhail Khludnev
Hello, You can check ComplexPhrase and Surround query parsers. On Mon, Dec 5, 2016 at 8:12 AM, Yonghui Zhao wrote: > It seems lucene query parser doesn't support SpanNearQuery. > Is there any query parser supports SpanNearQuery? > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser with wildcards

2016-12-20 Thread Mikhail Khludnev
> IndexSearcher searcher = new IndexSearcher(reader); > > ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser("field", > > new > > StandardAnalyzer()); > > TopDocs topDocs; > > > > Query queryOk = parser.parse("field: (john* peters)"); > > topDocs = searcher.search(queryOk, 2); > > System.out.println("found " + topDocs.totalHits + " docs"); > > > > Query queryFail = parser.parse("field: (\"john*\" \"peters\")"); > > topDocs = searcher.search(queryFail, 2); // -> throws the above > > mentioned exception > > System.out.println("found " + topDocs.totalHits + " docs"); > > > > } > > > > } > > > > > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser with wildcards

2017-01-02 Thread Mikhail Khludnev
IndexSearcher(reader); > ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser("field", > new > StandardAnalyzer()); > TopDocs topDocs; > > Query queryOk = parser.parse("field: (john* peters)"); > topDocs = searcher.search(queryOk, 2); > System.out.println("found " + topDocs.totalHits + " docs"); > > Query queryFail = parser.parse("field: (\"john*\" \"peters\")"); > topDocs = searcher.search(queryFail, 2); // -> throws the above > mentioned exception > System.out.println("found " + topDocs.totalHits + " docs"); > > } > > } > -- Sincerely yours Mikhail Khludnev

Re: Disabling Lucene Scoring/Ranking

2017-01-09 Thread Mikhail Khludnev
ching search criteria. > > So, Is there a way to completely disable scoring/ranking altogether? > OR Is there a better solution to it. > > Regards > Rajnish > -- Sincerely yours Mikhail Khludnev

Re: Analyzing Infix Suggestor Exact Match Boost

2017-03-08 Thread Mikhail Khludnev
how can i get back exact matches first?? > > Thanks. > -- Sincerely yours Mikhail Khludnev

Re: join in lucene

2017-03-16 Thread Mikhail Khludnev
s a different class > 2) by reflection if the getter is class ==entity Loader load document > with the key saved in parent object. > > > *second question:* > *this solution is essentially similar to how it works the query times or > not (so similar performance)?* > -- Sincerely yours Mikhail Khludnev

Re: Document serializable representation

2017-03-30 Thread Mikhail Khludnev
g: They > accept the document, decide which shard they should be located and transfer > the plain fieldname:value pairs over the network. Each node then creates > Lucene IndexableDocuments out of it and passes to their own IndexWriter. > > --- > Denis Bazhenov > > > > > > -- Sincerely yours Mikhail Khludnev

Re: Correction: SpanNearQuery Class issue through spans object (Not through Searcher.search() method)

2017-06-20 Thread Mikhail Khludnev
Please let me know about this > since I am using this for a critical project? > > Thanks, > Ranganath B. N. > > -- Sincerely yours Mikhail Khludnev

Re: What is the fastest way to loop over all documents in an index?

2017-09-05 Thread Mikhail Khludnev
way to loop over all documents in an index? > Is it looping over all possible doc id’s (+filtering out deleted > documents)? > > Thank you very much. > > Best regards > Claude > > -- Sincerely yours Mikhail Khludnev

Re: Tracking that all query terms are matched in one document

2017-12-05 Thread Mikhail Khludnev
> > > >> > > > Yes, I'm sure. Could you explain your proposal in more detail? > >> > > > > >> > > > Regards, > >> > > > Vadim Gindin > >> > > > > >> > > > On Mon, Dec 4, 2017 at 3:18 P

Re: Tracking that all query terms are matched in one document

2017-12-13 Thread Mikhail Khludnev
How to avoid this? > > Thanks, > Vadim Gindin > > On Fri, Dec 8, 2017 at 2:01 PM, Vadim Gindin wrote: > > > Thank's for your help. I'll try that. > > > > On Tue, Dec 5, 2017 at 4:18 PM, Mikhail Khludnev > wrote: > > > >> Vadim,

Re: Query in a doc context

2017-12-14 Thread Mikhail Khludnev
: what terms are matched to what fields and so on. > > > It seems, that BooleanQuery/BooleanScorer is not a good place to accumulate > some information from a child Queries/Scorers. > -- Sincerely yours Mikhail Khludnev

Re: Terminology. LeafReader -> TermEnum -> PostingsEnum

2017-12-14 Thread Mikhail Khludnev
ference > between these 20 implementations and which of them can be really useful? > > Regards, > Vadim Gindin > -- Sincerely yours Mikhail Khludnev

Re: Wrong ID in explain() method.

2017-12-29 Thread Mikhail Khludnev
ion. When explain(id) is called it checks specified id in this > > collection and outputs "matched"/"not matched". > > > > The questions. > > 0. This document is founded by the plugin, but explain(id) method takes > > the wrong ID. Why? It happens in the real installation, but in the test > > case - it works fine. > > 1. ID=342 and others come to explain(id) method. Note, it is not a > > document id - it is ID of the nested object (category). Why does it > happen? > > 2. I have a test case, based on ESIntegTestCase. It works fine with this > > document. But this document is not founded in the real index. > > > > Regards, > > Vadim Gindin > > > -- Sincerely yours Mikhail Khludnev

Re: Query in a doc context

2017-12-30 Thread Mikhail Khludnev
> > Apologies if I completely misundetstood but if you are looking to do > a > > > full > > > > doc match, you could duplicate duplicated the doc into another field > > that > > > > is a true full text index of the document. > > > > >

Re: Explain flag in CustomQuery

2018-06-25 Thread Mikhail Khludnev
ted that SearchContext will be propagated to a Query, but I didn't > found the way how to get. I only have LeafReaderContext or LeafReader. > Could you advice me? > > Regards, > Vadim Gindin > -- Sincerely yours Mikhail Khludnev

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
e I will get the 'a' positions in TokenStream. > Additional question how I can get the line numbers and the positions > inside the line. > Many thanks in advance for your help, > Ira > > -- Sincerely yours Mikhail Khludnev

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
lative to the previous Token in a TokenStream, > used in phrase searching. > I am not in phrase searching. > Would you mind to explain how it can help me? > > Thanks, > Ira > > -Original Message- > From: Mikhail Khludnev [mailto:m...@apache.org] > S

Re: Lucene API to retrieve matched words

2018-09-06 Thread Mikhail Khludnev
highlighting, just a list of the words. So if > I > search for 'ski' and I match on 'skier' and 'skiis', I would like to get > back a list that includes 'skier' and 'skiis'. > > Is there an API call that provides this? > > > > Thanks > > Mike > > -- Sincerely yours Mikhail Khludnev

Re: How to access DocValues inside a customized collector?

2018-09-21 Thread Mikhail Khludnev
ave a way to see directly indexed data (Luke seems obsolete, > Marple does not work with lucene 7.4.0 yet)? > > Thanks very much for helps, Lisheng > -- Sincerely yours Mikhail Khludnev

Re: Question About FST, multiple-column index

2018-09-21 Thread Mikhail Khludnev
there any > Combined Index structure like multiple-column indexes in mysql? I think is > there any solutions to extends to FST which make the FINAL state connect to > another FST? > > > THANKS -- Sincerely yours Mikhail Khludnev

Re: Camel case search with Lucene

2018-10-04 Thread Mikhail Khludnev
e, search "redHotChilly" > instead of "red hot chilly" - you should use own pattern tokenizer to > divide the query by regex pattern. > > Regards > Vadim Gindin > > On Thu, Oct 4, 2018 at 11:58 AM Gordin, Ira wrote: > > > Hi friends, > > > > How can I implement Camel case search with Lucene? > > > > Thanks, > > Ira > > > > > > > -- Sincerely yours Mikhail Khludnev
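
The regex-based splitting suggested in the quoted reply can be sketched in plain Java as follows. This only illustrates the pattern itself, using java.util.regex via String.split; it is not a Lucene Tokenizer, and the class name and the exact boundary regex are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CamelCaseSplitter {
    // Zero-width boundary: a lowercase letter or digit followed by an uppercase letter.
    private static final String BOUNDARY = "(?<=[a-z0-9])(?=[A-Z])";

    // Splits a camel-case word into lowercase parts, e.g. for building a multi-term query.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        if (input == null || input.isEmpty()) return parts;
        for (String part : input.split(BOUNDARY)) {
            parts.add(part.toLowerCase(Locale.ROOT));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("redHotChilly")); // [red, hot, chilly]
    }
}
```

The same boundary expression could presumably be handed to a pattern-based tokenizer at index or query time, but how that is wired into an analyzer depends on the Lucene version in use.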

Re: How can I use FunctionScoreQuery to replace CustomScoreQuery?

2019-01-26 Thread Mikhail Khludnev
e query type, > but I'm stuck. > > -- > Sent from: > http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html > > -- Sincerely yours Mikhail Khludnev

Re: How can I use FunctionScoreQuery to replace CustomScoreQuery?

2019-01-29 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: position-anchored queries

2019-03-21 Thread Mikhail Khludnev
are not any subsequent terms in the field? > > -Mike > -- Sincerely yours Mikhail Khludnev

Re: About custom score using Solr8/Lucene8

2019-05-08 Thread Mikhail Khludnev
example, at least to understand how to start a minimal basic > project? > > Thanks > > -- Sincerely yours Mikhail Khludnev

Re: block min-max values for Sort Field with Top-N query..

2019-07-02 Thread Mikhail Khludnev
& won't work for multi-sort field queries or out-of-order scoring etc.. > > But, in general will this be a good idea to explore or something that is > best not attempted? > > Any help is much appreciated > > -- > Ravi > -- Sincerely yours Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

2019-07-03 Thread Mikhail Khludnev
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN wrote: > > This returns "id3", which is unexpected. > > Please check ToPBJQ javadoc. It's absolutely expected. -- Sincerely yours Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

2019-07-05 Thread Mikhail Khludnev
lter must provide an BitSet > https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true > per sub-reader."? If so, given the data above how do I properly create a > parent query? > > > > > > > > > > > > &g

Re: Adding and Removing Facet Entries

2019-08-28 Thread Mikhail Khludnev
I'm essentially looking for something similar to `add-distinct` > and `remove` from Solr's atomic updates functionality, just directly in > Lucene. > -- Sincerely yours Mikhail Khludnev

Re: Lucene one to many query

2019-09-21 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
"~2 > Type of query : ComplexPhraseQuery > > If I change teststr to "\"Foo Bar\"" > I get > Query : "Foo Bar" > Type of query : ComplexPhraseQuery > > If I change teststr to "Foo Bar" > I get > Query : content:foo content:bar > Type of query : BooleanQuery > > > In the first two cases I was expecting the search terms to be switched to > lowercase. > > Were the Foo and Bar left as originally specified because the terms are > inside double quotes? > > How can I specify a search term that I want treated as a Phrase, > but also have the query parser apply the LowerCaseFilter? > > I am hoping to avoid the need to handle this using PhraseQuery, > and continue to use the QueryParser. > > > Thanks in advance for any help you can give me, > David Shifflett > > -- Sincerely yours Mikhail Khludnev

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
ing used contains upper or > lower case J and S (in you John Smith case) > > Apologizes on the 'content:foo'. > I changed the code snippet to "somefield", and missed changing that part > of the output > > David Shifflett > > > On 10/22/19, 5:51 AM, "

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-22 Thread Mikhail Khludnev
s my conditions: > 1) Uses a StandardAnalyzer > 2) Does the actual query.toString() return lowercase J and S > > David Shifflett > > > On 10/22/19, 10:44 AM, "Mikhail Khludnev" wrote: > > On Tue, Oct 22, 2019 at 5:26 PM Shifflett, David [USA] <

Re: How can i specify a custom Analyzer for a Field of Document?

2019-12-09 Thread Mikhail Khludnev
> I have a document set, most fields to index is only text type, suited for a > StandAnalyzer or a SmartChineseAnalyzer. But the problem is, i have a > special field which is a KeywordList type, like "A;B;C", which i hope i can > fully control the analyzing step. > > How to do this in Lucene? > -- Sincerely yours Mikhail Khludnev

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

2019-12-27 Thread Mikhail Khludnev
ll){String term > = byteRef.utf8ToString();terms.add(term);} > } catch (IOException e) {e.printStackTrace(); > log.error(e.getMessage(), e);}* > > To my supprise, terms seems only returning the STORED value, which is the > original value form, but i expect they should be the terms i put in each > StringField! > > Is this a design miss or impl. limit? > -- Sincerely yours Mikhail Khludnev

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

2019-12-27 Thread Mikhail Khludnev
ese doubts. I like to quote this talk https://www.youtube.com/watch?v=T5RmMNDR5XI > > Mikhail Khludnev 于2019年12月27日周五 下午5:05写道: > > > Hello, > > It's by design: StringFields are searchable and filled by analysis > output, > > StoredFields are returned input value

Re: Question abount combining InvertedIndex and SortField

2019-12-31 Thread Mikhail Khludnev
o reduce memory footprint by storing only top candidate results in a binary heap. IIRC it's described in this classic paper http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf -- Sincerely yours Mikhail Khludnev
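
The heap idea mentioned above can be illustrated with a small standalone sketch: keep only the N best candidates in a min-heap, so memory stays proportional to N rather than to the number of matching documents. This uses java.util.PriorityQueue on plain ints; it is not how Lucene's collectors are actually implemented, and TopNHeap is a name invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNHeap {
    // Returns the n largest values in descending order, holding at most n values at a time.
    static List<Integer> topN(int[] values, int n) {
        if (n <= 0) return new ArrayList<>();
        PriorityQueue<Integer> heap = new PriorityQueue<>(n); // min-heap: smallest kept value on top
        for (int v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll(); // evict the weakest candidate kept so far
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(new int[] {5, 1, 9, 3, 7}, 3)); // [9, 7, 5]
    }
}
```

Each candidate costs at most O(log N) heap work, and a candidate that cannot beat the current weakest entry is rejected after a single comparison.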

Re: Needs advice on auto-keyword-correction mode custom query

2020-01-06 Thread Mikhail Khludnev
, How can Lucene's Query API become high-order composable? Lucene's > "LeafContext" concept is really very confusing me... > -- Sincerely yours Mikhail Khludnev

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread Mikhail Khludnev
> > > > I use SmartChineseAnalyzer to do the indexing, and add a document with > a > > > TextField whose value is a long sentence, when anaylized, will get 18 > > > terms. > > > > > > & then i use the same value to construct a PhraseQuery, setting slop to > > 2, > > > and adding the 18 terms concequently... > > > > > > I expect the search api to find this document, but it returns empty. > > > > > > Where am i wrong? > > > > > > > > > -- > > Adrien > > > -- Sincerely yours Mikhail Khludnev

Re: Can Lucene be used as Rules Engine?

2020-01-22 Thread Mikhail Khludnev
't use fixed number of Fields to > query on. Even if there are fixed number of fields, the query has to check > for each field to match at least one word. > > Is it possible to handle this requirement using Lucene? or should I go for > other options? > > I am new to Lucene, any help would be appreciated. > > > > Thanks, > > Kart > > -- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser performance question

2020-02-04 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev

Re: ComplexPhraseQueryParser performance question

2020-02-13 Thread Mikhail Khludnev
t; There are no one. > Best regards > > On 2/4/20 11:14 AM, baris.ka...@oracle.com wrote: > > > > Thanks but i thought this class would have a mechanism to fix this issue. > > Thanks > > > >> On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote: > >> &g

Re: How to tell Lucene index search to stop when it takes too long

2020-02-24 Thread Mikhail Khludnev
> Is there such an api or plan to implement one? > > > Best regards > > -- Sincerely yours Mikhail Khludnev

Re: How to tell Lucene index search to stop when it takes too long

2020-02-27 Thread Mikhail Khludnev
But i cant specify Top n docs > here, right? > > > The collector is defined here > > > https://lucene.apache.org/core/8_4_1/core/org/apache/lucene/search/Collector.html > > > https://lucene.apache.org/core/8_4_1/core/org/apache/lucene/search/TopDocsCollector.html &

Re: Autocompletion based on one field in index

2020-03-03 Thread Mikhail Khludnev
to achieve > this? > > > Regards > Kumaran R > -- Sincerely yours Mikhail Khludnev

Re: Retrieving query-time join fromQuery hits

2020-06-03 Thread Mikhail Khludnev
i > > [1] > > https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html > [2] > > https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html > [3] https://issues.apache.org/jira/browse/LUCENE-3602 > -- Sincerely yours Mikhail Khludnev

Re: Retrieving query-time join fromQuery hits

2020-06-08 Thread Mikhail Khludnev
nks, > Stefan Onofrei > > On Wed, Jun 3, 2020 at 9:59 PM Mikhail Khludnev wrote: > > > Hi, Stefan. > > Have you considered faceting/aggregation over `from` field? > > > > On Tue, May 12, 2020 at 7:23 PM Stefan Onofrei > > wrote: > > > > >

Re: About custom score using Solr8/Lucene8

2020-07-02 Thread Mikhail Khludnev
> -- > Vincenzo D'Amore > -- Sincerely yours Mikhail Khludnev

Re: Question about Benchmark

2022-05-16 Thread Mikhail Khludnev
xisting index for search? Also, is there a way to configure the > benchmark to use multiple threads for indexing (looks to me that it’s a > single-threaded indexing)? > > --Regards, > Balmukund > -- Sincerely yours Mikhail Khludnev

Re: Lucene Disable scoring

2022-07-11 Thread Mikhail Khludnev
verhead of function calls can cause delay. > As a result I'm looking for a trick to ignore the function call and have > all no scoring on my whole query > > Is it possible to ignore this step? > > thanks a million > -- Sincerely yours Mikhail Khludnev

Re: Filter and FilteredQuery replacements

2022-07-11 Thread Mikhail Khludnev
> instances representing all of the Lucene Doc IDs in the index, with > the bits turned on for those documents we want to be included in search > results. > > If this has already been answered in a forum post, I apologize. Or if > there's a Lucene specific forum somewhere I could look at, if you could > kindly point me there, I would appreciate it. > > Any help/insight is greatly appreciated. > > Thanks, > Scott Robey > -- Sincerely yours Mikhail Khludnev

Re: Unclear on what position means

2022-07-21 Thread Mikhail Khludnev
ment, outside of > Lucene? > > Kendall > > -- Sincerely yours Mikhail Khludnev

Re: Lucene Suggester APIs question

2022-08-14 Thread Mikhail Khludnev
question about lucene suggester APIs. If I build multiple FSTs > using a suggester, is there a way to merge two generated FSTs? > > -- > > Nitish Jain > -- Sincerely yours Mikhail Khludnev

Re: Help to understand the per-field formats in Lucene

2022-10-25 Thread Mikhail Khludnev
example, I've studied the "KnnVectors" a little. > The "PerFieldKnnVectorsFormat.FieldsWriter" acutally uses the > "Lucene94HnswVectorsFormat". > But why do we have this kind of structures? > > Thanks & Regards > > MyCoy > -- Sincerely yours Mikhail Khludnev

Re: Multi-segments and HNSW

2022-11-02 Thread Mikhail Khludnev
> real impact on the retrieving quality and performance. > > I'm wondering if there is any best practice, e.g. how many docs should be > in a single graph? > Or does anyone have some production experience to share? > > Thanks & Regards > MyCoy > -- Sincerely yours Mikhail Khludnev

Re: Efficient sort on SortedDocValues

2022-11-07 Thread Mikhail Khludnev
We may have dozens > of such fields in our index, thus there isn't any one field that can be > used to sort the index. So I guess my question if what I am trying to > achieve is possible? I tried to look though Solr codebase, but so far > couldn't come up with anything. Code example is here > https://pastebin.com/i05E2wZy . I am using 9.4.1. Thanks in advance. > > Andrei > > -- Sincerely yours Mikhail Khludnev

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Thread Mikhail Khludnev
tectorOp.java#L39 > ) at production scale and discovered really bad performance during certain > conditions which I attribute to this unnecessary synching. I suspect this > may have impacted others as well > https://stackoverflow.com/questions/42960569/indexing-taking-long-time-when-using-opennlp-lemmatizer-with-solr > > Many thanks, > > Luke Kot-Zaniewski > > > -- Sincerely yours Mikhail Khludnev

Re: Question for SynonymQuery

2022-12-28 Thread Mikhail Khludnev
se BooleanQuery in those cases, since to > support multi-term synonyms it needs to accept a list of Query, which would > make it behave like a BooleanQuery. Also how scoring works with multi-term > is another problem. > > Thanks & Regards! > -- Sincerely yours Mikhail Khludnev

Re: Question for SynonymQuery

2023-01-01 Thread Mikhail Khludnev
are computed? As I understand SynonymWeight > will > > > consider all terms as exactly the same while BooleanQuery will favor > the > > > documents with more matched terms. > > > - Is it worth it to support multi-term synonyms in SynonymQuery? My > > feeling > > > is that it's better to just use BooleanQuery in those cases, since to > > > support multi-term synonyms it needs to accept a list of Query, which > > would > > > make it behave like a BooleanQuery. Also how scoring works with > > multi-term > > > is another problem. > > > > > > Thanks & Regards! > > > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2023-01-02 Thread Mikhail Khludnev
-- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help of example of Lucene use.

2023-01-04 Thread Mikhail Khludnev
> > Currently I am badly required of some examples of using TokenStream, > tokenAttributes, *Filter. > I need to replace the uses of "Token". > > Could somebody please help me in it? > > Regards > Rajib > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-18 Thread Mikhail Khludnev
{ > //Some internal function to process the doc. > forEach.process(termDocs.doc()); > } > > } > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Question for SynonymQuery

2023-01-27 Thread Mikhail Khludnev
urns out I used FlattenGraphFilter and cause the PositionLength to be > > all 1 and resulted in the behavior above =) > > > > A side note is that we don't need to use WORD_SEPARATOR in the synonym > > file. SynonymMap.Parser.analyze would tokenize and append the separato

Re: Question for SynonymQuery

2023-01-27 Thread Mikhail Khludnev
> us. > > Regards, > Anh Dung Bui > > On Mon, Jan 2, 2023 at 8:07 Mikhail Khludnev wrote: > > > Hello Anh, > > I was intrigued by your question. And I managed it to work somehow. > > see > > > > > https://github.com/mkhludnev/likely/blob/eval-mulyw-s

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-29 Thread Mikhail Khludnev
nce code " Fields fields = reader.fields();" in > your reference link. > > But, there is no "reader.fields()" in 8.11.2. > > Could you please suggest someway to extract all the Terms with an > IndexReader or some alternative ways? > > Regards > Rajib > > --

Re: What is the corresponding class for org.apache.lucene.codecs.memory.DirectDocValuesFormat in Lucene9

2023-01-30 Thread Mikhail Khludnev
ne9. > > But the "DirectPostingFormat" is still in Lucene9. > > Could anyone help me to understand how to replace the DirectDocValueFormat > in Lucene9? > > Thanks > Regards > MyCoy > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-01-31 Thread Mikhail Khludnev
ndexWriter.optimize() > > Is there any similar concept in 8.11? If so, can you please help with APIs > org.apache.lucene.index.IndexWriter#addIndexes(org.apache.lucene.store.Directory...) But it kicks merge underneath. Should be fine. === > > Regard

Re: Lucene Hunpell Spell checker

2023-02-19 Thread Mikhail Khludnev
> happens for a bunch of the languages, just presented 2 examples. > > Feel free to propose any changes, comments fixes :) > > Thank's a lot in advance, > > Thanos > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Mikhail Khludnev
gory/volume, but unfortunately the highlighter.getBestTextFragments() > method marks all the occurrences of "note" and "extra" in the content too. > This we don't want. > > I can't see how to separate that part of the query out in the highlighter > methods, and I wonder what best practice would be here. I'm probably being > naive in using a single query for the whole job. Do I need to run a query > for category/volume, and then a subquery on text and title, and just use > the > subquery in the highlighter? If that's the approach, is there a nice simple > explanation somewhere you could point me to? Because I'm a simple user who > has never done anything beyond using the simple QueryParser for everything. > > > > cheers > > T > > > > > > > > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Offset-Based Analysis

2023-02-21 Thread Mikhail Khludnev
sense does a similar solution already exist? If > it doesn’t exist yet would it be something that would be of interest to the > community? > Any thoughts on this would be much appreciated. > > Thanks, > Luke -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Offset-Based Analysis

2023-02-22 Thread Mikhail Khludnev
en offset based tokenization for those wishing to > tokenize > > outside of their search engine. > > > > Does this approach even make any sense or have any pitfalls I am failing > > to see? Assuming it makes sense does a similar solution already exist? If > > it do

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-03-03 Thread Mikhail Khludnev
ocked(FSDirectory fsdir) > >>> IndexReader.unlock(Directory directory) > >>> > >>> In 8.11, are IndexReader and IndexWritter synchronized enough > internally > >>> for not using the APIs? > >>> > >> org.apache.lucene.store.BaseDirectory#

Re: Run time error in IndexWriter.addDocument

2023-04-03 Thread Mikhail Khludnev
va:1757) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400) > > Regards > Rajib > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Run time error in IndexWriter.addDocument

2023-04-03 Thread Mikhail Khludnev
"stempel-*.jar"? > > Regards > Rajib > > -Original Message- > From: Mikhail Khludnev > Sent: 03 April 2023 14:05 > To: java-user@lucene.apache.org > Subject: Re: Run time error in IndexWriter.addDocument > > Hi > > It seems like some

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
ld not > be used for urgent or sensitive issues > -- Sincerely yours Mikhail Khludnev https://t.me/MUST_SEARCH A caveat: Cyrillic!

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
's possible I may control which part of the text > shall be stored during the index process? In other words, is it possible to > strip the header when storing the text into the field? > > Best regards, > > Guan > > -Original Message- > From: Mikhail Khludnev

Re: Can an analyzer access other field's data during index time?

2023-04-25 Thread Mikhail Khludnev
> However, all 3 lines would still be stored in the field if index=true and > stored=true... > > I wonder if I could only store line 2 and 3 in the field in such a > scenario? > > Many thanks, > > Guan > > -Original Message- > From: Mikhail Khludnev > Sent

Re: Can an analyzer access other field's data during index time?

2023-04-26 Thread Mikhail Khludnev
lue() method. > > In a nutshell, I will need two parts to make this work: > > 1. a custom tokenizer/filter; > 2. a custom field; > > Let me know if there is any caveat... > > And thank you so much for guiding me through! > > Guan > > -Original Messag

Re: retrieving search matches with their frequency and positions

2023-07-09 Thread Mikhail Khludnev
other words, I'd like to get the matches in > the form of terms with properties like frequency and positions. > How can I achieve this? > > Thanks in advance! > Ned > > -- Sincerely yours Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
d by the analyzer or indexer. > > I've found the MatchesIterator interface and FilterMatchesIterator class > but was not able to use it. > > Thank you! > Ned > -- Sincerely yours Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
cs.scoreDocs[i].doc, > "fieldName", query);` method exposed. I'm using lucene core 8.11.2 and > currently I cannot upgrade to 9.0.0 or later. > > Any ideas? Which API version are you referring to? > > Thanks. > Ned > ____ > From: M

Re: Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread Mikhail Khludnev
statistics on those terms or > proceed with this document without affecting it boolean score. > > What is the best way to achieve this? > -- Sincerely yours Mikhail Khludnev

Re: What is the approximate processing mechanism for field length?

2023-08-10 Thread Mikhail Khludnev
ple, "keywords" field has 78 > tokens. I think its field length (dl) is 78, but Lucene handles it as > 76 (approximate), as described in the function explainTF(Explanation freq, long > norm). > Thank you very much for reading, and I look forward to your > answer! > > > Koo > Drive development engineer -- Sincerely yours Mikhail Khludnev
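
The 78 versus 76 discrepancy in the question above is consistent with field lengths being squeezed into a single norm byte. The following is a minimal sketch of such a lossy encoding, written from memory of Lucene's SmallFloat.intToByte4 / byte4ToInt and not copied from the Lucene sources, so treat the details as an assumption. The class name NormLengthSketch and its method names are hypothetical.

```java
// Hypothetical sketch: a lossy one-byte length encoding, modeled on my
// recollection of Lucene's SmallFloat.intToByte4 / byte4ToInt (assumption).
class NormLengthSketch {

    // Lengths below this threshold are stored exactly; larger ones are rounded.
    static final int NUM_FREE_VALUES = 255 - longToInt4(Integer.MAX_VALUE);

    // Float-like encoding: keep the 4 most significant bits plus a shift.
    static int longToInt4(long i) {
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return (int) i;                  // small values are kept as-is
        }
        int shift = numBits - 4;
        int encoded = (int) (i >>> shift);   // 4 most significant bits
        encoded &= 0x07;                     // the leading 1 bit is implicit
        encoded |= (shift + 1) << 3;         // 0 is reserved for small values
        return encoded;
    }

    // Inverse of longToInt4; the dropped low-order bits come back as zeros.
    static long int4ToLong(int i) {
        long bits = i & 0x07;
        int shift = (i >>> 3) - 1;
        return shift == -1 ? bits : (bits | 0x08) << shift;
    }

    // Encode a field length into a value in the range 0..255.
    static int encode(int length) {
        if (length < NUM_FREE_VALUES) {
            return length;
        }
        return NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES);
    }

    // Decode the one-byte value back into an approximate length.
    static int decode(int code) {
        if (code < NUM_FREE_VALUES) {
            return code;
        }
        return (int) (NUM_FREE_VALUES + int4ToLong(code - NUM_FREE_VALUES));
    }

    // Length as the scorer would see it after an encode/decode round trip.
    static int roundTrip(int length) {
        return decode(encode(length));
    }

    public static void main(String[] args) {
        System.out.println("78 -> " + roundTrip(78)); // prints "78 -> 76"
    }
}
```

Under this scheme short lengths survive unchanged, while longer ones are rounded down to the nearest representable value, which would explain a stored length of 76 for a field with 78 tokens.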

Re: Reindexing leaving behind 0 live doc segments

2023-08-28 Thread Mikhail Khludnev
rld = iw.getPooledInstance(sci, true); > segmentReader = rld.getReader(IOContext.READ); > > //process all live docs similar to above using the segmentReader. > > rld.release(segmentReader); > iw.release(rld); > }finally{ >if (iwRef != null) { >iwRef.decref(); > } > } > > Help would be much appreciated! > > Thanks, > Rahul > -- Sincerely yours Mikhail Khludnev

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-09-01 Thread Mikhail Khludnev
hael Wechner < >> michael.wech...@wyona.com> wrote: >> >>> Hi Together >>> >>> You might be interested in this paper / article >>> >>> https://arxiv.org/abs/2308.14963 >>> >>> Thanks >>> >>> Michael >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> -- Sincerely yours Mikhail Khludnev

Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Mikhail Khludnev
> > On the implementation front, I am using a set of filters like > lowerCaseFilter, EnglishPossessiveFilter etc in addition to base tokenizer > StandardTokenizer. > > Per my analysis, StandardTokenizer strips off the % sign and hence the > behavior. Has someone faced a similar requirement? Any help/guidance is highly > appreciated. > -- Sincerely yours Mikhail Khludnev

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Mikhail Khludnev
nizing of special chars like - etc > > > On Wed, Sep 20, 2023 at 16:39 Mikhail Khludnev wrote: > > > Hello, > > Check the whitespace tokenizer. > > > > On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar > > wrote: > > > > > Hi, > > >

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Mikhail Khludnev
> with the code above? > I can do this, but want to make sure that I don’t update it in a wrong > way. > > > > -- Sincerely yours Mikhail Khludnev

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Mikhail Khludnev
terms of a document field independent of a query? > > Can you maybe give a code example? > > Thanks > > Michael > > > > On 12.11.23 at 18:46, Mikhail Khludnev wrote: > > Hello, > > This is what highlighters do. There are two options: > > - index

Re: Stored field already compressed

2023-11-14 Thread Mikhail Khludnev
essing it again. > This seems wasteful. Is there a solution to this? Or would I have to > implement my own Codec or some such? I started digging down that route and > it doesn’t look pretty. 😊 > > > > Tony > > -- Sincerely yours Mikhail Khludnev
