?
--
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru
http://www.griddynamics.com
mkhlud...@griddynamics.com
ene.index.IndexWriter.mergeInit(IndexWriter.java:3792)
> > - locked <6d75db> (a org.apache.solr.update.SolrIndexWriter)
> > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3646)
> > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
> > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
--
Sincerely yours
Mikhail Khludnev
mask&0xf == 0xf ?
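If the expression above is meant as Java, note that `==` binds tighter than `&`, so `mask&0xf == 0xf` parses as `mask & (0xf == 0xf)` and does not compile for an `int` mask. A minimal sketch of the check with explicit parentheses, assuming the intent is "are all four low-order bits set?" (the class and method names are illustrative):

```java
final class LowBits {
    /** Returns true when all four low-order bits (0xf) of mask are set. */
    static boolean allLowBitsSet(int mask) {
        // Parentheses are required: == has higher precedence than &.
        return (mask & 0xf) == 0xf;
    }

    public static void main(String[] args) {
        System.out.println(allLowBitsSet(0x1f)); // true
        System.out.println(allLowBitsSet(0x0e)); // false
    }
}
```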
>
--
Sincerely yours
Mikhail Khludnev
> Quoted from:
> http://lucene.472066.n3.nabble.com/Lucene-6-1-number-of-hits-per-document-tp4293245p4293687.html
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Lucene-6-1-number-of-hits-per-document-tp4293245p4293755.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
--
Sincerely yours
Mikhail Khludnev
--
Sincerely yours
Mikhail Khludnev
--
Sincerely yours
Mikhail Khludnev
a query to achieve
> such expectations?
>
> Regards,
> Valentin
>
--
Sincerely yours
Mikhail Khludnev
--
Sincerely yours
Mikhail Khludnev
ery can return any documents matching search criteria.
>
> So, is there a way to completely disable scoring/ranking altogether?
> Or is there a better solution?
>
> Regards
> Rajnish
>
--
Sincerely yours
Mikhail Khludnev
Thx
> Hendrik
>
>
> --
> Hendrik Saly (salyh, hendrikdev22)
> @hendrikdev22
> PGP: 0x22D7F6EC
>
--
Sincerely yours
Mikhail Khludnev
Hello,
You can check ComplexPhrase and Surround query parsers.
On Mon, Dec 5, 2016 at 8:12 AM, Yonghui Zhao <zhaoyong...@gmail.com> wrote:
> It seems the Lucene query parser doesn't support SpanNearQuery.
> Is there any query parser that supports SpanNearQuery?
>
--
Sincerely yours
Mikhail Khludnev
searcher = new IndexSearcher(reader);
> ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser("field", new StandardAnalyzer());
> TopDocs topDocs;
>
> Query queryOk = parser.parse("field: (john* peters)");
> topDocs = searcher.search(queryOk, 2);
> System.out.println("found " + topDocs.totalHits + " docs");
>
> Query queryFail = parser.parse("field: (\"john*\" \"peters\")");
> topDocs = searcher.search(queryFail, 2); // -> throws the above-mentioned exception
> System.out.println("found " + topDocs.totalHits + " docs");
>
> }
>
> }
>
--
Sincerely yours
Mikhail Khludnev
> writer.commit();
> > writer.close();
> >
> > IndexReader reader = DirectoryReader.open(directory);
> > IndexSearcher searcher = new IndexSearcher(reader);
> > ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser("field", new StandardAnalyzer());
> > TopDocs topDocs;
> >
> > Query queryOk = parser.parse("field: (john* peters)");
> > topDocs = searcher.search(queryOk, 2);
> > System.out.println("found " + topDocs.totalHits + " docs");
> >
> > Query queryFail = parser.parse("field: (\"john*\" \"peters\")");
> > topDocs = searcher.search(queryFail, 2); // -> throws the above-mentioned exception
> > System.out.println("found " + topDocs.totalHits + " docs");
> >
> > }
> >
> > }
> >
>
--
Sincerely yours
Mikhail Khludnev
r own. This is what Elasticsearch and Solr do: they
> accept the document, decide on which shard it should be located, and transfer
> the plain fieldname:value pairs over the network. Each node then creates
> Lucene documents out of them and passes them to its own IndexWriter.
>
>
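The routing step described above can be sketched with plain Java collections. This is a simplified model, not Elasticsearch's or Solr's actual code: the hash-mod formula, the `ShardRouter` class and the `Map<String, String>` document shape are assumptions for illustration (the real systems use their own hash functions, MurmurHash3-based as far as I know, rather than `String.hashCode()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.numShards = numShards;
    }

    /** Chooses a shard from the document id (illustrative hash-mod routing). */
    int shardFor(String docId) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Groups plain fieldname:value documents by target shard, before they are sent over the network. */
    List<List<Map<String, String>>> route(List<Map<String, String>> docs, String idField) {
        List<List<Map<String, String>>> perShard = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            perShard.add(new ArrayList<>());
        }
        for (Map<String, String> doc : docs) {
            String id = Objects.requireNonNull(doc.get(idField), "document has no id field");
            perShard.get(shardFor(id)).add(doc);
        }
        return perShard;
    }
}
```

Each per-shard list would then go to the node owning that shard, which turns the pairs into Lucene documents for its local IndexWriter.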
s a different class
> 2) by reflection if the getter is class ==entity Loader load document
> with the key saved in parent object.
>
>
> *Second question:*
> *is this solution essentially similar to how it works at query time or
> not (i.e. similar performance)?*
>
--
Sincerely yours
Mikhail Khludnev
urned
> first. So how can I get exact matches back first?
>
> Thanks.
>
--
Sincerely yours
Mikhail Khludnev
g Apache Lucene version 6.5.0. Please let me know about this,
> since I am using it for a critical project.
>
> Thanks,
> Ranganath B. N.
>
>
--
Sincerely yours
Mikhail Khludnev
stest way to loop over all documents in an index?
> Is it looping over all possible doc ids (+ filtering out deleted
> documents)?
>
> Thank you very much.
>
> Best regards
> Claude
>
>
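As far as I know, per segment that is the usual approach: each Lucene `LeafReader` exposes `maxDoc()` and `getLiveDocs()`, where a `null` result means the segment has no deletions. The sketch below models one segment with `java.util.BitSet` standing in for Lucene's `Bits`; it is a stdlib illustration of the loop, not Lucene code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntConsumer;

final class LiveDocsLoop {
    /**
     * Visits every live doc id in [0, maxDoc). A null liveDocs models a segment
     * without deletions; otherwise a set bit means the document is live.
     */
    static void forEachLiveDoc(int maxDoc, BitSet liveDocs, IntConsumer visitor) {
        for (int docId = 0; docId < maxDoc; docId++) {
            if (liveDocs == null || liveDocs.get(docId)) {
                visitor.accept(docId);
            }
        }
    }

    /** Collects the live doc ids, mostly to make the loop easy to check. */
    static List<Integer> liveDocIds(int maxDoc, BitSet liveDocs) {
        List<Integer> ids = new ArrayList<>();
        forEachLiveDoc(maxDoc, liveDocs, docId -> ids.add(docId));
        return ids;
    }
}
```

For a whole index the same loop would run once per leaf, adding the leaf's `docBase` if global ids are needed.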
--
Sincerely yours
Mikhail Khludnev
egards,
> >> > > > Vadim Gindin
> >> > > >
> >> > > > On Mon, Dec 4, 2017 at 3:18 PM, Michael Sokolov <msoko...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > You could combine a Boolean AND query with the same terms, as an
> >> > > > > optional clause. Are you sure about the requirement to multiply the
> >> > > > > score in that case?
> >> > > > >
> >> > > > > On Dec 4, 2017 5:13 AM, "Vadim Gindin" <vgin...@detectum.com> wrote:
> >> > > > >
> >> > > > > > Hi all.
> >> > > > > >
> >> > > > > > I need to track that all query terms are matched in one document.
> >> > > > > > When all terms are matched I need to multiply the score of such a
> >> > > > > > document by some constant coefficient.
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
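A toy model of Michael Sokolov's suggestion, assuming that a BooleanQuery score is the sum of its matching clause scores and that the AND clause's term sub-queries score the same as the OR clause's. Under those assumptions an optional AND clause boosted by `b` adds `b * base` only when every term matches, which is the same as multiplying the score by `1 + b`. The class, methods and per-term score map are illustrative, not Lucene API.

```java
import java.util.List;
import java.util.Map;

final class AllTermsBoost {
    /** OR semantics: sum the scores of the query terms present in the document. */
    static double orScore(Map<String, Double> termScores, List<String> queryTerms) {
        double sum = 0;
        for (String term : queryTerms) {
            sum += termScores.getOrDefault(term, 0.0);
        }
        return sum;
    }

    static boolean allMatched(Map<String, Double> termScores, List<String> queryTerms) {
        return termScores.keySet().containsAll(queryTerms);
    }

    /** OR clause plus an optional AND clause over the same terms, boosted by andBoost. */
    static double withOptionalAndClause(Map<String, Double> termScores, List<String> queryTerms, double andBoost) {
        double base = orScore(termScores, queryTerms);
        // The AND clause matches only when every term is present; it then re-scores the same terms.
        return allMatched(termScores, queryTerms) ? base + andBoost * base : base;
    }

    /** The original requirement: multiply by a constant when every term matched. */
    static double multiplied(Map<String, Double> termScores, List<String> queryTerms, double factor) {
        double base = orScore(termScores, queryTerms);
        return allMatched(termScores, queryTerms) ? base * factor : base;
    }
}
```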
--
Sincerely yours
Mikhail Khludnev
ality?
> How to avoid this?
>
> Thanks,
> Vadim Gindin
>
> On Fri, Dec 8, 2017 at 2:01 PM, Vadim Gindin <vgin...@detectum.com> wrote:
>
> > Thanks for your help. I'll try that.
> >
> > On Tue, Dec 5, 2017 at 4:18 PM, Mikhail Khludnev <m...@apache.org>
ding
> information: what terms are matched to what fields and so on.
>
>
> It seems that BooleanQuery/BooleanScorer is not a good place to accumulate
> information from child Queries/Scorers.
>
--
Sincerely yours
Mikhail Khludnev
ncipal difference
> between these 20 implementations and which of them can be really useful?
>
> Regards,
> Vadim Gindin
>
--
Sincerely yours
Mikhail Khludnev
t SearchContext will be propagated to a Query, but I didn't
> find a way to get it. I only have LeafReaderContext or LeafReader.
> Could you advise me?
>
> Regards,
> Vadim Gindin
>
--
Sincerely yours
Mikhail Khludnev
numbers and the positions
> inside the line.
> Many thanks in advance for your help,
> Ira
>
>
--
Sincerely yours
Mikhail Khludnev
the previous Token in a TokenStream,
> used in phrase searching.
> I am not doing phrase searching.
> Would you mind explaining how it can help me?
>
> Thanks,
> Ira
>
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: Tuesd
.@donaq.com>
> > > wrote:
> > >
> > > > Apologies if I completely misunderstood, but if you are looking to do a
> > > > full doc match, you could duplicate the doc into another field that
> > > > is a true full text i
the document id (that was matched in scorer) to a
> > collection. When explain(id) is called it checks specified id in this
> > collection and outputs "matched"/"not matched".
> >
> > The questions.
> > 0. This document is found by the plugin, but explain(
or highlighting, just a list of the words. So if
> I search for 'ski' and I match on 'skier' and 'skiis', I would like to get
> back a list that includes 'skier' and 'skiis'.
>
> Is there an API call that provides this?
>
>
>
> Thanks
>
> Mike
>
>
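If those matches come from a prefix or wildcard expansion (an assumption; with a stemming analyzer they would not), the expanded words are the dictionary terms sharing the prefix. A stdlib sketch of that seek-then-scan over a sorted term dictionary, with a `TreeSet` standing in for a field's terms. In Lucene the analogous steps would be `TermsEnum.seekCeil` followed by `next()` while the prefix still matches, but the class below is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class PrefixTerms {
    /** Returns, in sorted order, the dictionary terms that start with prefix. */
    static List<String> expand(NavigableSet<String> dictionary, String prefix) {
        List<String> out = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward while the prefix still matches.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }
}
```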
--
Sincerely yours
Mikhail Khludnev
e, search "redHotChilly"
> instead of "red hot chilly", you should use your own pattern tokenizer to
> split the query by a regex pattern.
>
> Regards
> Vadim Gindin
>
> On Thu, Oct 4, 2018 at 11:58 AM Gordin, Ira wrote:
>
> > Hi friends,
> >
> > How can I implement Camel case search with Lucene?
> >
> > Thanks,
> > Ira
> >
> >
> >
>
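A minimal sketch of the regex idea with `String.split`, assuming ASCII camel case: the zero-width pattern splits between a lower-case letter or digit and the following upper-case letter, then lower-cases each part. Acronyms such as `XMLParser` are not split by this pattern. In Lucene the same pattern could feed a pattern-based tokenizer, or `WordDelimiterGraphFilter` could split on case changes; the class here is plain Java for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CamelCaseSplitter {
    // Zero-width split point between a lower-case letter or digit and an upper-case letter.
    private static final String BOUNDARY = "(?<=[a-z0-9])(?=[A-Z])";

    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        for (String part : text.split(BOUNDARY)) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }
}
```

Applying the same splitting at index and query time lets "redHotChilly" and "red hot chilly" produce the same terms.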
--
Sincerely yours
Mikhail Khludnev
way to see directly indexed data (Luke seems obsolete,
> Marple does not work with Lucene 7.4.0 yet)?
>
> Thanks very much for the help, Lisheng
>
--
Sincerely yours
Mikhail Khludnev
e any
> combined index structure like multiple-column indexes in MySQL? I wonder if
> there is any solution that extends an FST so that its FINAL state connects to
> another FST?
>
>
> THANKS
--
Sincerely yours
Mikhail Khludnev
pe,
> but I'm stuck.
>
--
Sincerely yours
Mikhail Khludnev
any subsequent terms in the field?
>
> -Mike
>
--
Sincerely yours
Mikhail Khludnev
--
Sincerely yours
Mikhail Khludnev
example, at least to understand how to start a minimal basic
> project?
>
> Thanks
--
Sincerely yours
Mikhail Khludnev
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN wrote:
>
> This returns "id3", which is unexpected.
>
Please check the ToParentBlockJoinQuery (ToPBJQ) javadoc. It's absolutely expected.
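As a reading aid for that javadoc: ToParentBlockJoinQuery relies on each block being indexed with its children first and the parent last, and it maps a matching child to the next document flagged by the parent filter. The stdlib model below mimics that mapping with `BitSet.nextSetBit`; it is an illustration, not Lucene code, and the doc ids are made up. It also shows how indexing a parent before its children makes those children join to the following block's parent, one plausible way to get a surprising id such as "id3".

```java
import java.util.BitSet;
import java.util.SortedSet;
import java.util.TreeSet;

final class BlockJoinModel {
    /**
     * Maps matching child doc ids to parent doc ids the way a block join does:
     * a child's parent is the next doc id flagged in parentBits.
     */
    static SortedSet<Integer> parentsOf(BitSet parentBits, int[] childHits) {
        SortedSet<Integer> parents = new TreeSet<>();
        for (int child : childHits) {
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) { // -1 would mean no parent follows this child
                parents.add(parent);
            }
        }
        return parents;
    }
}
```

With parents at docs 2 and 5 (children 0-1 and 3-4), child hits 0 and 4 yield parents 2 and 5. With parents at docs 0 and 3 (parent indexed first), child hit 1 yields parent 3.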
--
Sincerely yours
Mikhail Khludnev
; & won't work for multi-sort field queries or out-of-order scoring etc..
>
> But, in general will this be a good idea to explore or something that is
> best not attempted?
>
> Any help is much appreciated
>
> --
> Ravi
>
--
Sincerely yours
Mikhail Khludnev
ssentially looking for something similar to `add-distinct`
> and `remove` from Solr's atomic updates functionality, just directly in
> Lucene.
>
--
Sincerely yours
Mikhail Khludnev
>
--
Sincerely yours
Mikhail Khludnev
provide a BitSet
> https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true
> per sub-reader."? If so, given the data above how do I properly create a
> parent query?
> > > >
> > > >
> > > > > > On
"~2
> Type of query : ComplexPhraseQuery
>
> If I change teststr to "\"Foo Bar\""
> I get
> Query : "Foo Bar"
> Type of query : ComplexPhraseQuery
>
> If I change teststr to "Foo Bar"
> I get
> Query : content:foo content:bar
> Type of query : BooleanQuery
>
>
> In the first two cases I was expecting the search terms to be switched to
> lowercase.
>
> Were the Foo and Bar left as originally specified because the terms are
> inside double quotes?
>
> How can I specify a search term that I want treated as a Phrase,
> but also have the query parser apply the LowerCaseFilter?
>
> I am hoping to avoid the need to handle this using PhraseQuery,
> and continue to use the QueryParser.
>
>
> Thanks in advance for any help you can give me,
> David Shifflett
>
>
--
Sincerely yours
Mikhail Khludnev
sed contains upper or
> lower case J and S (in your John Smith case)
>
> Apologies for the 'content:foo'.
> I changed the code snippet to "somefield", and missed changing that part
> of the output
>
> David Shifflett
>
>
> On 10/22/19, 5:51 AM, "Mikhail Khludn
s my conditions:
> 1) Uses a StandardAnalyzer
> 2) Does the actual query.toString() return lowercase J and S
>
> David Shifflett
>
>
> On 10/22/19, 10:44 AM, "Mikhail Khludnev" wrote:
>
> On Tue, Oct 22, 2019 at 5:26 PM Shifflett, David [USA] <
>
> I have a document set; most fields to index are plain text, suited for a
> StandardAnalyzer or a SmartChineseAnalyzer. But the problem is that I have a
> special field of a KeywordList type, like "A;B;C", whose analysis I would like
> to fully control.
>
> How to do this in Lucene?
>
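One way to take full control of that field, sketched under assumptions: split the raw value yourself and index each keyword as its own exact value (for example one `StringField` per keyword under the same field name), while the text fields keep their analyzers. A `PerFieldAnalyzerWrapper` with a custom analyzer for that field would be another route. The splitting step in plain Java, with a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordListSplitter {
    /** Splits "A;B;C" into exact keywords, trimming blanks and dropping empty entries. */
    static List<String> split(String value) {
        List<String> keywords = new ArrayList<>();
        for (String raw : value.split(";")) {
            String keyword = raw.trim();
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }
}
```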
--
Sincerely yours
Mikhail Khludnev
ese doubts. I like to quote
this talk https://www.youtube.com/watch?v=T5RmMNDR5XI
>
> On Fri, Dec 27, 2019 at 5:05 PM, Mikhail Khludnev wrote:
>
> > Hello,
> > It's by design: StringFields are searchable and filled by analysis output,
> > StoredFields are returned input values.
here are no one.
> Best regards
>
> On 2/4/20 11:14 AM, baris.ka...@oracle.com wrote:
> >
> > Thanks, but I thought this class would have a mechanism to fix this issue.
> > Thanks
> >
> >> On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote:
> >>
>
y to achieve
> this?
>
>
> Regards
> Kumaran R
>
--
Sincerely yours
Mikhail Khludnev
But I can't specify the top n docs
> here, right?
>
>
> The collector is defined here
>
>
> https://lucene.apache.org/core/8_4_1/core/org/apache/lucene/search/Collector.html
>
>
> https://lucene.apache.org/core/8_4_1/core/org/apache/lucene/search/TopDocsCollector.html
> Is there such an api or plan to implement one?
>
>
> Best regards
>
>
>
--
Sincerely yours
Mikhail Khludnev
ed number of Fields to
> query on. Even if there is a fixed number of fields, the query has to check
> that each field matches at least one word.
>
> Is it possible to handle this requirement using Lucene, or should I go for
> other options?
>
> I am new to Lucene, any help would be appreciated.
>
>
>
> Thanks,
>
> Kart
>
>
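In query terms the requirement reads as: one required (MUST) clause per field, where each clause is itself a disjunction (SHOULD clauses) of the words on that field. The stdlib model below checks that predicate against a document given as field-to-token sets; the class and data shape are illustrative, and in Lucene the structure would be built with `BooleanQuery.Builder` and `BooleanClause.Occur`.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;

final class EveryFieldMatches {
    /** True when every listed field contains at least one of the query words. */
    static boolean matches(Map<String, Set<String>> doc, Collection<String> fields, Collection<String> words) {
        for (String field : fields) {
            Set<String> tokens = doc.getOrDefault(field, Set.of());
            boolean anyWord = false;
            for (String word : words) { // inner disjunction (SHOULD)
                if (tokens.contains(word)) {
                    anyWord = true;
                    break;
                }
            }
            if (!anyWord) { // outer conjunction (MUST) fails for this field
                return false;
            }
        }
        return true;
    }
}
```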
--
Sincerely yours
Mikhail Khludnev
--
Sincerely yours
Mikhail Khludnev
> > > I use SmartChineseAnalyzer to do the indexing, and add a document with a
> > > TextField whose value is a long sentence which, when analyzed, yields 18
> > > terms.
> > >
> > > Then I use the same value to construct a PhraseQuery, setting slop to 2,
> > > and adding the 18 terms in sequence...
> > >
> > > I expect the search API to find this document, but it returns nothing.
> > >
> > > Where am I wrong?
> > >
> >
> >
> > --
> > Adrien
> >
>
--
Sincerely yours
Mikhail Khludnev
emory footprint by storing only the top candidate results in a binary heap.
IIRC it's described in this classic paper
http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf
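The idea in a few lines of plain Java: keep a min-heap capped at `n` entries, so the weakest retained candidate sits at the head and is evicted when a better one arrives, giving O(N log n) time and O(n) memory for N candidates. Lucene's own top-docs collection uses a bounded priority queue in the same spirit; the class below is only a stdlib illustration with a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopN {
    /** Keeps only the n highest scores using a min-heap of size n, returned best first. */
    static List<Double> topN(double[] scores, int n) {
        if (n <= 0) {
            return List.of();
        }
        PriorityQueue<Double> heap = new PriorityQueue<>(n); // smallest retained score at the head
        for (double s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the current weakest candidate
                heap.add(s);
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```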
--
Sincerely yours
Mikhail Khludnev
r composable? Lucene's
> "LeafContext" concept is really very confusing me...
>
--
Sincerely yours
Mikhail Khludnev
String term
> = byteRef.utf8ToString();terms.add(term);}
> } catch (IOException e) {e.printStackTrace();
> log.error(e.getMessage(), e);}*
>
> To my surprise, the terms seem to return only the STORED value, which is the
> original form of the value, but I expected them to be the terms I put in each
> StringField!
>
> Is this a design flaw or an implementation limitation?
>
--
Sincerely yours
Mikhail Khludnev
>
> --
> Vincenzo D'Amore
>
--
Sincerely yours
Mikhail Khludnev
frei
>
> On Wed, Jun 3, 2020 at 9:59 PM Mikhail Khludnev wrote:
>
> > Hi, Stefan.
> > Have you considered faceting/aggregation over `from` field?
> >
> > On Tue, May 12, 2020 at 7:23 PM Stefan Onofrei
> > wrote:
> >
> > > Hi,
> > >
i
>
> [1]
>
> https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html
> [2]
>
> https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html
> [3] https://issues.apache.org/jira/browse/LUCENE-3602
>
--
Sincerely yours
Mikhail Khludnev
wech...@wyona.com> wrote:
>>
>>> Hi Together
>>>
>>> You might be interested in this paper / article
>>>
>>> https://arxiv.org/abs/2308.14963
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
--
Sincerely yours
Mikhail Khludnev
g of special chars like - etc
>
>
> On Wed, Sep 20, 2023 at 16:39 Mikhail Khludnev wrote:
>
> > Hello,
> > Check the whitespace tokenizer.
> >
> > On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar
> > wrote:
> >
> > > Hi,
> > >
>
> On the implementation front, I am using a set of filters like
> LowerCaseFilter, EnglishPossessiveFilter, etc. in addition to the base tokenizer,
> StandardTokenizer.
>
> Per my analysis, StandardTokenizer strips off the % sign, hence the
> behavior. Has anyone faced a similar requirement? Any help/guidance is highly
> appreciated.
>
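To see why a whitespace tokenizer helps here, compare two toy tokenizers in plain Java. The second is only a crude stand-in for StandardTokenizer (which implements Unicode word-break rules, not this regex), but it shows the same effect: the `%` is dropped, while whitespace splitting keeps `50%` as one token. Filters such as LowerCaseFilter would still run after either tokenizer. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class PercentTokens {
    /** Splits on whitespace only, so "50%" survives as a single token. */
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Keeps only runs of letters and digits, so punctuation such as '%' is dropped. */
    static List<String> alphanumericTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```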
--
Sincerely yours
Mikhail Khludnev
rld = iw.getPooledInstance(sci, true);
> segmentReader = rld.getReader(IOContext.READ);
>
> //process all live docs similar to above using the segmentReader.
>
> rld.release(segmentReader);
> iw.release(rld);
> } finally {
>   if (iwRef != null) {
>     iwRef.decref();
>   }
> }
>
> Help would be much appreciated!
>
> Thanks,
> Rahul
>
--
Sincerely yours
Mikhail Khludnev
isting index for search? Also, is there a way to configure the
> benchmark to use multiple threads for indexing (it looks to me like
> indexing is single-threaded)?
>
> --Regards,
> Balmukund
>
--
Sincerely yours
Mikhail Khludnev
index, with
> the bits turned on for those documents we want to be included in search
> results.
>
> If this has already been answered in a forum post, I apologize. Or if
> there's a Lucene specific forum somewhere I could look at, if you could
> kindly point me there, I would appreciate it.
>
> Any help/insight is greatly appreciated.
>
> Thanks,
> Scott Robey
>
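The core operation is a bitwise AND between the matching documents and the allow-list. A stdlib sketch with `java.util.BitSet`, assuming global doc ids; in Lucene doc ids are per segment, so a global bitset would have to be offset by each `LeafReaderContext.docBase`, and the filter would typically be applied as a non-scoring FILTER clause. The class name is hypothetical.

```java
import java.util.BitSet;

final class AllowListFilter {
    /** Returns the hits whose bit is also set in the allow-list; the inputs are not modified. */
    static BitSet restrict(BitSet hits, BitSet allowed) {
        BitSet result = (BitSet) hits.clone();
        result.and(allowed);
        return result;
    }

    /** Translates a global allow-list into segment-local ids for a segment starting at docBase. */
    static BitSet sliceForSegment(BitSet globalAllowed, int docBase, int maxDoc) {
        // BitSet.get(from, to) returns a new set whose bit 0 corresponds to bit 'from'.
        return globalAllowed.get(docBase, docBase + maxDoc);
    }
}
```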
--
Sincerely yours
Mikhail Khludnev
ad of function calls can cause delay.
> As a result I'm looking for a trick to ignore the function call and have
> no scoring at all on my whole query.
>
> Is it possible to ignore this step?
>
> thanks a million
>
--
Sincerely yours
Mikhail Khludnev
outside of
> Lucene?
>
> Kendall
>
>
--
Sincerely yours
Mikhail Khludnev
studied the "KnnVectors" a little.
> The "PerFieldKnnVectorsFormat.FieldsWriter" actually uses the
> "Lucene94HnswVectorsFormat".
> But why do we have this kind of structures?
>
> Thanks & Regards
>
> MyCoy
>
--
Sincerely yours
Mikhail Khludnev
question about lucene suggester APIs. If I build multiple FSTs
> using a suggester, is there a way to merge two generated FSTs?
>
> --
>
> Nitish Jain
>
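As far as I know there is no call that merges two already built FSTs; a common workaround is to merge the suggester inputs and build a new FST from them, since FST construction expects its inputs in sorted order. A stdlib sketch of the input merge, keeping the higher weight for terms present in both sources (that policy, the class name and the term-to-weight shape are assumptions):

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class SuggestInputMerge {
    /** Unions two term -> weight inputs into one sorted map, keeping the larger weight on collisions. */
    static SortedMap<String, Long> merge(Map<String, Long> first, Map<String, Long> second) {
        SortedMap<String, Long> merged = new TreeMap<>(first);
        second.forEach((term, weight) -> merged.merge(term, weight, Long::max));
        return merged;
    }
}
```

`TreeMap` orders keys by UTF-16 `String.compareTo`, which can differ from Lucene's UTF-8 byte order for characters outside the Basic Multilingual Plane, so the suggester's own sorting should still be relied on when rebuilding.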
--
Sincerely yours
Mikhail Khludnev
ct on the retrieval quality and performance.
>
> I'm wondering if there is any best practice, e.g. how many docs should be
> in a single graph?
> Or does anyone have some production experience to share?
>
> Thanks & Regards
> MyCoy
>
--
Sincerely yours
Mikhail Khludnev
in those cases, since to
> support multi-term synonyms it needs to accept a list of Query, which would
> make it behave like a BooleanQuery. Also how scoring works with multi-term
> is another problem.
>
> Thanks & Regards!
>
--
Sincerely yours
Mikhail Khludnev
s I understand SynonymWeight
> > > will consider all terms as exactly the same while BooleanQuery will favor
> > > the documents with more matched terms.
> > > - Is it worth it to support multi-term synonyms in SynonymQuery? My
> > > feeling is that it's better to just use BooleanQuery in those cases, since
> > > to support multi-term synonyms it needs to accept a list of Query, which
> > > would make it behave like a BooleanQuery. Also how scoring works with
> > > multi-term is another problem.
> > >
> > > Thanks & Regards!
> > >
> >
> >
>
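A toy comparison of the two scoring behaviours described above, assuming a BM25-like saturating term-frequency curve and ignoring IDF and length normalization. The synonym-style score sums the frequencies of all synonyms and scores them as one term; the boolean-style score scores each synonym separately and adds the results, which rewards documents matching more distinct synonyms. This is not Lucene's implementation; `K1`, the class and the method names are illustrative.

```java
import java.util.Collection;
import java.util.Map;

final class SynonymVsBoolean {
    private static final double K1 = 1.2; // BM25-style saturation constant (illustrative)

    /** Saturating term-frequency contribution, shaped like BM25's tf component. */
    static double tfScore(double freq) {
        return freq / (freq + K1);
    }

    /** SynonymQuery-like: all synonyms count as one term, so their frequencies are summed first. */
    static double synonymScore(Map<String, Integer> freqs, Collection<String> synonyms) {
        double total = 0;
        for (String s : synonyms) {
            total += freqs.getOrDefault(s, 0);
        }
        return tfScore(total);
    }

    /** BooleanQuery-like: each matching synonym scores separately and the scores are added. */
    static double booleanScore(Map<String, Integer> freqs, Collection<String> synonyms) {
        double sum = 0;
        for (String s : synonyms) {
            int f = freqs.getOrDefault(s, 0);
            if (f > 0) {
                sum += tfScore(f);
            }
        }
        return sum;
    }
}
```

For a document with "tv" and "television" once each, versus one with "tv" twice, the synonym-style scores are equal while the boolean-style score is higher for the first document.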
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
> Currently I badly need some examples of using TokenStream,
> tokenAttributes, *Filter.
> I need to replace the uses of "Token".
>
> Could somebody please help me in it?
>
> Regards
> Rajib
>
>
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
39
> ) at production scale and discovered really bad performance during certain
> conditions which I attribute to this unnecessary synching. I suspect this
> may have impacted others as well
> https://stackoverflow.com/questions/42960569/indexing-taking-long-time-when-using-opennlp-lemmatizer-with-solr
> > Many thanks,
> > Luke Kot-Zaniewski
> >
>
--
Sincerely yours
Mikhail Khludnev
ut the "DirectPostingFormat" is still in Lucene9.
>
> Could anyone help me to understand how to replace the DirectDocValueFormat
> in Lucene9?
>
> Thanks
> Regards
> MyCoy
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
t turns out I used FlattenGraphFilter, which caused the PositionLength to be
> > all 1 and resulted in the behavior above =)
> >
> > A side note is that we don't need to use WORD_SEPARATOR in the synonym
> > file. SynonymMap.Parser.analyze would tokenize and append the separat
s.
>
> Regards,
> Anh Dung Bui
>
> On Mon, Jan 2, 2023 at 8:07 Mikhail Khludnev wrote:
>
> > Hello Anh,
> > I was intrigued by your question, and I managed to make it work somehow.
> > See
> >
> >
> https://github.com/mkhludnev/likely/blob/eval-mulyw-syns/s
nce code " Fields fields = reader.fields();" in
> your reference link.
>
> But there is no "reader.fields()" in 8.11.2.
>
> Could you please suggest some way to extract all the terms with an
> IndexReader, or some alternative way?
>
> Regards
> Rajib
>
> --
{
> //Some internal function to process the doc.
> forEach.process(termDocs.doc());
> }
>
> }
>
> Regards
> Rajib
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
ve dozens
> of such fields in our index, thus there isn't any one field that can be
> used to sort the index. So I guess my question is whether what I am trying to
> achieve is possible. I tried to look through the Solr codebase, but so far
> couldn't come up with anything. A code example is here:
> https://pastebin.com/i05E2wZy . I am using 9.4.1. Thanks in advance.
>
> Andrei
>
>
--
Sincerely yours
Mikhail Khludnev
s sense does a similar solution already exist? If
> it doesn’t exist yet would it be something that would be of interest to the
> community?
> Any thoughts on this would be much appreciated.
>
> Thanks,
> Luke
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
tokenization for those wishing to
> tokenize
> > outside of their search engine.
> >
> > Does this approach even make any sense or have any pitfalls I am failing
> > to see? Assuming it makes sense does a similar solution already exist? If
> > it doesn’t exist yet
a bunch of the languages, just presented 2 examples.
> > Feel free to propose any changes, comments, or fixes :)
> > Thanks a lot in advance,
> > Thanos
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
the highlighter.getBestTextFragments()
> method marks all the occurrences of "note" and "extra" in the content too.
> This we don't want.
>
> I can't see how to separate that part of the query out in the highlighter
> methods, and I wonder what best practice would be here. I'm probably being
> naive in using a single query for the whole job. Do I need to run a query
> for category/volume, and then a subquery on text and title, and just use the
> subquery in the highlighter? If that's the approach, is there a nice simple
> explanation somewhere you could point me to? Because I'm a simple user who
> has never done anything beyond using the simple QueryParser for everything.
>
>
>
> cheers
>
> T
>
>
>
>
>
>
>
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
> > sequence (aka input stream). This is especially important as Lucene
> > index files use checksums since around Lucene 5.
> >
> > Uwe
> >
> > On 06.02.2023 at 11:57, Saha, Rajib wrote:
> >> Hi Mikhail,
> >>
> >> Thanks for all your suggesti
for urgent or sensitive issues
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
f it's possible I may control which part of the text
> shall be stored during the index process? In other words, is it possible to
> strip the header when storing the text into the field?
>
> Best regards,
>
> Guan
>
> -Original Message-
> From: Mikhail Khludnev
>
utshell, I will need two parts to make this work:
>
> 1. a custom tokenizer/filter;
> 2. a custom field;
>
> Let me know if there is any caveat...
>
> And thank you so much for guiding me through!
>
> Guan
>
> -Original Message-
> From: Mikhail Khlu
However, all 3 lines would still be stored in the field if index=true and
> stored=true...
>
> I wonder if I could store only lines 2 and 3 in the field in such a
> scenario?
>
> Many thanks,
>
> Guan
>
> -Original Message-
> From: Mikhail Khludnev
> Sent: Monda
"stempel-*.jar"?
>
> Regards
> Rajib
>
> -Original Message-
> From: Mikhail Khludnev
> Sent: 03 April 2023 14:05
> To: java-user@lucene.apache.org
> Subject: Re: Run time error in IndexWriter.addDocument
>
> Hi
>
> It seems like some
va:1757)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400)
>
> Regards
> Rajib
>
--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
ndexWriter.optimize()
>
> Is there any similar concept in 8.11? If so, can you please help with APIs
>
org.apache.lucene.index.IndexWriter#addIndexes(org.apache.lucene.store.Directory...)
But it kicks off a merge underneath. Should be fine.
===
>
> Re
on those terms or
> proceed with this document without affecting its boolean score.
>
> What is the best way to achieve this?
>
--
Sincerely yours
Mikhail Khludnev
to get the matches in
> the form of terms with properties like frequency and positions.
> How can I achieve this?
>
> Thanks in advance!
> Ned
>
>
--
Sincerely yours
Mikhail Khludnev
by the analyzer or indexer.
>
> I've found the MatchesIterator interface and FilterMatchesIterator class
> but was not able to use it.
>
> Thank you!
> Ned
>
--
Sincerely yours
Mikhail Khludnev
oreDocs[i].doc,
> "fieldName", query);` method exposed. I'm using lucene core 8.11.2 and
> currently I cannot upgrade to 9.0.0 or later.
>
> Any ideas? Which API version are you referring to?
>
> Thanks.
> Ned
> ____
> From: Mikhail Khl
"keywords" field has 78
> tokens. I think its field length (dl) is 78, but Lucene handled it as
> 76 (approximate), as described in the function explainTF(Explanation freq, long
> norm).
> Thank you very much for reading, and I look forward to your
> answer!
>
>
> Koo
> Drive development engineer
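If I read BM25Similarity correctly, the field length is stored per document as a one-byte norm through `SmallFloat.intToByte4`, and the explanation decodes it back, which is why it says "approximate". The stdlib sketch below re-implements that 4-bit-mantissa encoding for illustration; the constants mirror my understanding of `SmallFloat` and should be checked against the Lucene source. With these constants, lengths below 24 round-trip exactly, and a length of 78 encodes to 53 and decodes to 76.

```java
final class LengthNormModel {
    // Mirrors, for illustration, the 4-bit float encoding of SmallFloat.intToByte4 / byte4ToInt.
    private static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);
    // Lengths below this value are stored exactly; larger ones lose precision.
    private static final int NUM_FREE_VALUES = 255 - MAX_INT4;

    private static int longToInt4(long i) {
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return (int) i; // small values fit exactly
        }
        int shift = numBits - 4;
        int encoded = (int) (i >>> shift); // keep the 4 most significant bits
        encoded &= 0x07;                   // drop the leading 1, which is implicit
        encoded |= (shift + 1) << 3;       // store the exponent above the 3 mantissa bits
        return encoded;
    }

    private static long int4ToLong(int i) {
        long bits = i & 0x07;
        int shift = (i >>> 3) - 1;
        if (shift == -1) {
            return bits; // value was stored exactly
        }
        return (bits | 0x08) << shift; // restore the implicit leading 1, then scale
    }

    /** Encodes a field length into a value in [0, 255]. */
    static int encode(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        return length < NUM_FREE_VALUES ? length : NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES);
    }

    /** Decodes a value produced by encode; the result never exceeds the original length. */
    static int decode(int encoded) {
        if (encoded < NUM_FREE_VALUES) {
            return encoded;
        }
        return (int) Math.min(NUM_FREE_VALUES + int4ToLong(encoded - NUM_FREE_VALUES), Integer.MAX_VALUE);
    }
}
```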
--
Sincerely yours
Mikhail Khludnev
e preserving the
> NumericRangeQuery type?
> BoostQuery doesn't allow this and I haven't found a way.
>
> Thanks for your help.
>
> Claude Lepère
>
--
Sincerely yours
Mikhail Khludnev
mation from my side.
>
> Thanks In Advance.
>
> Regards
> Rajib
>
>
--
Sincerely yours
Mikhail Khludnev
ValueTermAttribute.toString();
>
> //How to get startOffset & endOffset as like in Lucene 2.4
>
> //Do some calculation based on startOffset & endOffset
> }
>
> Please let me know if any further information is required from
> my side.
>
> Regards
> Rajib
>
--
Sincerely yours
Mikhail Khludnev