date:20210614

Re: Potential bug

2021-06-14 Thread Baris Kazar

I was clear about what I wanted to do with the Lucene experiments in this
thread (see the last part of the first paragraph below).

Best regards

From: Baris Kazar
Sent: Monday, June 14, 2021 10:28:47 AM
To: Atri Sharma; java-user@lucene.apache.org; a.benede...@sease.io; Baris Kazar
Subject: Re: Potential bug

Dear Folks,

I have a lot of experience in performance tuning and parallel processing: 17+7
years. So when you say "you don't know what you ask for", that does not sound
good at all; besides, I was clear on that.

Alessandro, I appreciate the apology, and I would like to apologize if I hurt
anyone's feelings; I never mean to hurt anybody's feelings. I still think I was
not aggressive, but I need to re-explain what was wrong with the email:

I was not trying to be aggressive with my responses.
I have been writing on this forum for a long time and have never received an
email like yours.

I revised your email for this list because, with my expertise, I don't think I
should get a comment like the X Y problem example.

Moreover, code can have bugs, and "raising the alarm" is not a good word choice
here. I am not here to find problems with Lucene; we are all here to use Lucene
and make it better.

I appreciate the work the committers are doing as volunteers; there is no doubt
about that. Lucene 8.y.z is much better thanks to your work. Kudos for that
success.

What I am looking for here is that we keep the tone neutral. Yes, respect is
fundamental; that is what I have been saying in my last emails.

Would you please look at my revised email?
I think the email should have been composed that way.

I would like to focus on my question please.
I hope we keep the tone neutral and professional.
Thanks for understanding.

Best regards

From: Atri Sharma 
Sent: Monday, June 14, 2021 8:46 AM
To: java-user@lucene.apache.org
Cc: Baris Kazar
Subject: Re: Potential bug

+1 to Adrien.

Let's keep the tone neutral.

On Mon, 14 Jun 2021, 16:00 Adrien Grand <jpou...@gmail.com> wrote:
Baris, you called out an insult from Alessandro and your replies suggest
anger, but I couldn't see an insult from Alessandro actually.

+1 to Alessandro's call to make the tone softer on this discussion.

On Mon, Jun 14, 2021 at 11:28 AM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi Baris,
> first of all apologies for having misspelled your name, definitely, it was
> not meant as an insult.
> Secondly, your tone is not acceptable on this mailing list (or anywhere
> else).
> You must remember that we, committers, are operating on a volunteering
> basis, contributing code and helping people in our free time purely driven
> by passion.
> Respect is fundamental, we are not here to be treated aggressively.
>
> Regards
>
> --
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Fri, 11 Jun 2021 at 17:10, <baris.ka...@oracle.com> wrote:
>
> > Let me guide to a professional answer to the below email:
> >
> >
> > Hi Baris,
> >
> > Since You mentioned You did all the performance study on your
> > application and still believe that
> >
> > the bottleneck is the fuzzy search api from Lucene, it would be best to
> > time the application for:
> >
> >   * matching phase (identifying candidates from the corpus of documents)
> >   * or in the ranking phase (scoring them by relevance)?
> >
> > Maybe this will help speedup further.
> >
> > Also, what do You mean by "what is the user needs to to limit te search
> > process" ? can you elaborate?
> >
> > Cheers
> >
> >
> >
> > My answer would be :
> >
> > i cant access the Lucene code so how can time these two cases please?
> >
> > i mean by that sentence that when i see the hits are good i would like
> > to limit the number of hits.
> >
> >
> >
> > this is more like a professional conversation please. Thanks.
> >
> > Best regards
> >
> >
> > On 6/11/21 11:57 AM, Alessandro Benedetti wrote:
> > > Hi Bazir,
> > > this feels like an X Y problem [1 <
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > >].
> > > Can you express what is your original user requirement?
> > > Most of the time, at the cost of indexing time/space you may get
> quicker
> > > query times.
> > > Also, you should identify where are you wasting most of your time, in
> the
> > > matching phase (identifying candidates from the corpus of documents) or
> > in
> > > the ranking phase (scoring them by relevance)?
> > >
> > > TopScoreDocCollector is quite a solid class, there's a ton to study,
> > > analyze and experiment before raising the alarm of a bug :)
> > >
> > > Also didn't understand this :
> > > "what if the user needs to limit the search process?"
> > >

Re: Potential bug

2021-06-14 Thread Baris Kazar

Dear Folks,

I have a lot of experience in performance tuning and parallel processing: 17+7
years. So when you say "you don't know what you ask for", that does not sound
good at all; besides, I was clear on that.

Alessandro, I appreciate the apology, and I would like to apologize if I hurt
anyone's feelings; I never mean to hurt anybody's feelings. I still think I was
not aggressive, but I need to re-explain what was wrong with the email:

I was not trying to be aggressive with my responses.
I have been writing on this forum for a long time and have never received an
email like yours.

I revised your email for this list because, with my expertise, I don't think I
should get a comment like the X Y problem example.

Moreover, code can have bugs, and "raising the alarm" is not a good word choice
here. I am not here to find problems with Lucene; we are all here to use Lucene
and make it better.

I appreciate the work the committers are doing as volunteers; there is no doubt
about that. Lucene 8.y.z is much better thanks to your work. Kudos for that
success.

What I am looking for here is that we keep the tone neutral. Yes, respect is
fundamental; that is what I have been saying in my last emails.

Would you please look at my revised email?
I think the email should have been composed that way.

I would like to focus on my question please.
I hope we keep the tone neutral and professional.
Thanks for understanding.

Best regards

From: Atri Sharma 
Sent: Monday, June 14, 2021 8:46 AM
To: java-user@lucene.apache.org
Cc: Baris Kazar
Subject: Re: Potential bug

+1 to Adrien.

Let's keep the tone neutral.

On Mon, 14 Jun 2021, 16:00 Adrien Grand <jpou...@gmail.com> wrote:
Baris, you called out an insult from Alessandro and your replies suggest
anger, but I couldn't see an insult from Alessandro actually.

+1 to Alessandro's call to make the tone softer on this discussion.

On Mon, Jun 14, 2021 at 11:28 AM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi Baris,
> first of all apologies for having misspelled your name, definitely, it was
> not meant as an insult.
> Secondly, your tone is not acceptable on this mailing list (or anywhere
> else).
> You must remember that we, committers, are operating on a volunteering
> basis, contributing code and helping people in our free time purely driven
> by passion.
> Respect is fundamental, we are not here to be treated aggressively.
>
> Regards
>
> --
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Fri, 11 Jun 2021 at 17:10, <baris.ka...@oracle.com> wrote:
>
> > Let me guide to a professional answer to the below email:
> >
> >
> > Hi Baris,
> >
> > Since You mentioned You did all the performance study on your
> > application and still believe that
> >
> > the bottleneck is the fuzzy search api from Lucene, it would be best to
> > time the application for:
> >
> >   * matching phase (identifying candidates from the corpus of documents)
> >   * or in the ranking phase (scoring them by relevance)?
> >
> > Maybe this will help speedup further.
> >
> > Also, what do You mean by "what is the user needs to to limit te search
> > process" ? can you elaborate?
> >
> > Cheers
> >
> >
> >
> > My answer would be :
> >
> > i cant access the Lucene code so how can time these two cases please?
> >
> > i mean by that sentence that when i see the hits are good i would like
> > to limit the number of hits.
> >
> >
> >
> > this is more like a professional conversation please. Thanks.
> >
> > Best regards
> >
> >
> > On 6/11/21 11:57 AM, Alessandro Benedetti wrote:
> > > Hi Bazir,
> > > this feels like an X Y problem [1 <
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > >].
> > > Can you express what is your original user requirement?
> > > Most of the time, at the cost of indexing time/space you may get
> quicker
> > > query times.
> > > Also, you should identify where are you wasting most of your time, in
> the
> > > matching phase (identifying candidates from the corpus of documents) or
> > in
> > > the ranking phase (scoring them by relevance)?
> > >
> > > TopScoreDocCollector is quite a solid class, there's a ton to study,
> > > analyze and experiment before raising the alarm of a bug :)
> > >
> > > Also didn't understand this :
> > > "what if the user needs to limit the search process?"
> > > Can you elaborate?
> > >
> > > Cheers
> > >
> > >
> > >
> > > [1]
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > > --
> > > Alessandro Benedetti
> > > Apache Lucene/Solr Committer
> > > Director, R&D Software

Re: Potential bug

2021-06-14 Thread Atri Sharma

+1 to Adrien.

Let's keep the tone neutral.

On Mon, 14 Jun 2021, 16:00 Adrien Grand,  wrote:

> Baris, you called out an insult from Alessandro and your replies suggest
> anger, but I couldn't see an insult from Alessandro actually.
>
> +1 to Alessandro's call to make the tone softer on this discussion.
>
> On Mon, Jun 14, 2021 at 11:28 AM Alessandro Benedetti <
> a.benede...@sease.io>
> wrote:
>
> > Hi Baris,
> > first of all apologies for having misspelled your name, definitely, it
> was
> > not meant as an insult.
> > Secondly, your tone is not acceptable on this mailing list (or anywhere
> > else).
> > You must remember that we, committers, are operating on a volunteering
> > basis, contributing code and helping people in our free time purely
> driven
> > by passion.
> > Respect is fundamental, we are not here to be treated aggressively.
> >
> > Regards
> >
> > --
> > Alessandro Benedetti
> > Apache Lucene/Solr Committer
> > Director, R&D Software Engineer, Search Consultant
> >
> > www.sease.io
> >
> >
> > On Fri, 11 Jun 2021 at 17:10,  wrote:
> >
> > > Let me guide to a professional answer to the below email:
> > >
> > >
> > > Hi Baris,
> > >
> > > Since You mentioned You did all the performance study on your
> > > application and still believe that
> > >
> > > the bottleneck is the fuzzy search api from Lucene, it would be best to
> > > time the application for:
> > >
> > >   * matching phase (identifying candidates from the corpus of
> documents)
> > >   * or in the ranking phase (scoring them by relevance)?
> > >
> > > Maybe this will help speedup further.
> > >
> > > Also, what do You mean by "what is the user needs to to limit te search
> > > process" ? can you elaborate?
> > >
> > > Cheers
> > >
> > >
> > >
> > > My answer would be :
> > >
> > > i cant access the Lucene code so how can time these two cases please?
> > >
> > > i mean by that sentence that when i see the hits are good i would like
> > > to limit the number of hits.
> > >
> > >
> > >
> > > this is more like a professional conversation please. Thanks.
> > >
> > > Best regards
> > >
> > >
> > > On 6/11/21 11:57 AM, Alessandro Benedetti wrote:
> > > > Hi Bazir,
> > > > this feels like an X Y problem [1 <
> > >
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > > >].
> > > > Can you express what is your original user requirement?
> > > > Most of the time, at the cost of indexing time/space you may get
> > quicker
> > > > query times.
> > > > Also, you should identify where are you wasting most of your time, in
> > the
> > > > matching phase (identifying candidates from the corpus of documents)
> or
> > > in
> > > > the ranking phase (scoring them by relevance)?
> > > >
> > > > TopScoreDocCollector is quite a solid class, there's a ton to study,
> > > > analyze and experiment before raising the alarm of a bug :)
> > > >
> > > > Also didn't understand this :
> > > > "what if the user needs to limit the search process?"
> > > > Can you elaborate?
> > > >
> > > > Cheers
> > > >
> > > >
> > > >
> > > > [1]
> > >
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > > > --
> > > > Alessandro Benedetti
> > > > Apache Lucene/Solr Committer
> > > > Director, R&D Software Engineer, Search Consultant
> > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__http://www.sease.io__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq07hrsXPw$
> > > >
> > > >
> > > > On Wed, 9 Jun 2021 at 19:08,  wrote:
> > > >
> > > >> Yes, i did those and i believe i am at the best level of performance
> > now
> > > >> and it is not bad at all but i want to make it much better.
> > > >>
> > > >> i see like a linear drop in timings when i go lower number of words
> > but
> > > >> let me do that quick study again.
> > > >>
> > > >> Fuzzy search  is always expensive but that seems to suit best to my
> > > needs.
> > > >>
> > > >>
> > > >> Thanks Diego for these great questions and i already explored them.
> > But
> > > >> thanks again.
> > > >>
> > > >> Best regards
> > > >>
> > > >>
> > > >> On 6/9/21 2:04 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> > > >>> I have never used fuzzy search but from the documentation it seems
> > very
> > > >> expensive, and if you do it on 10 terms and 1M documents it seems
> very
> > > very
> > > >> very expensive.
> > > >>> Are you using the default 'fuzzyness' parameter? (0.5) - It might
> end
> > > up
> > > >> exploring a lot of documents, did you try to play with that
> parameter?
> > > >>> Have you tried to see how the performance change if you do not use
> > > fuzzy
> > > >> (just to see if is fuzzy the introduce the slow down)?
> > > >>> Or what happens to performance if you do fuzzy with 1, 2, 5 terms
> > > >> instead of 10?
> > > >>>
> > > >>> From: java-user@lucene.ap

Re: Handling Archive Data Using Lucene 7.6

2021-06-14 Thread Adrien Grand

Hi Rashmi,

This upgrade skips 3 major versions, so the simplest path will be to reindex
your content.


On Fri, Jun 11, 2021 at 10:40 AM Rashmi Bisanal
 wrote:

> Hi Lucene Support Team ,
>
>
>
> Objective : Upgrade Lucene 3.6 to 7.6
>
>
>
> Description : We have huge data against version Lucene 3.6 .All of this
> data needs to upgraded to version Lucene 7.6 without any changes .
> Requesting your support on how to proceed with this ?
>
>
>
>
>
> Regards
>
> Rashmi
>
>
> 
> Disclaimer: This message and the information contained herein is
> proprietary and confidential and subject to the Tech Mahindra policy
> statement, you may review the policy at
> http://www.techmahindra.com/Disclaimer.html externally
> http://tim.techmahindra.com/tim/disclaimer.html internally within
> TechMahindra.
> 
>
>


-- 
Adrien

Re: Potential bug

2021-06-14 Thread Adrien Grand

Baris, you called out an insult from Alessandro and your replies suggest
anger, but I couldn't see an insult from Alessandro actually.

+1 to Alessandro's call to make the tone softer on this discussion.

On Mon, Jun 14, 2021 at 11:28 AM Alessandro Benedetti 
wrote:

> Hi Baris,
> first of all apologies for having misspelled your name, definitely, it was
> not meant as an insult.
> Secondly, your tone is not acceptable on this mailing list (or anywhere
> else).
> You must remember that we, committers, are operating on a volunteering
> basis, contributing code and helping people in our free time purely driven
> by passion.
> Respect is fundamental, we are not here to be treated aggressively.
>
> Regards
>
> --
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Fri, 11 Jun 2021 at 17:10,  wrote:
>
> > Let me guide to a professional answer to the below email:
> >
> >
> > Hi Baris,
> >
> > Since You mentioned You did all the performance study on your
> > application and still believe that
> >
> > the bottleneck is the fuzzy search api from Lucene, it would be best to
> > time the application for:
> >
> >   * matching phase (identifying candidates from the corpus of documents)
> >   * or in the ranking phase (scoring them by relevance)?
> >
> > Maybe this will help speedup further.
> >
> > Also, what do You mean by "what is the user needs to to limit te search
> > process" ? can you elaborate?
> >
> > Cheers
> >
> >
> >
> > My answer would be :
> >
> > i cant access the Lucene code so how can time these two cases please?
> >
> > i mean by that sentence that when i see the hits are good i would like
> > to limit the number of hits.
> >
> >
> >
> > this is more like a professional conversation please. Thanks.
> >
> > Best regards
> >
> >
> > On 6/11/21 11:57 AM, Alessandro Benedetti wrote:
> > > Hi Bazir,
> > > this feels like an X Y problem [1 <
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > >].
> > > Can you express what is your original user requirement?
> > > Most of the time, at the cost of indexing time/space you may get
> quicker
> > > query times.
> > > Also, you should identify where are you wasting most of your time, in
> the
> > > matching phase (identifying candidates from the corpus of documents) or
> > in
> > > the ranking phase (scoring them by relevance)?
> > >
> > > TopScoreDocCollector is quite a solid class, there's a ton to study,
> > > analyze and experiment before raising the alarm of a bug :)
> > >
> > > Also didn't understand this :
> > > "what if the user needs to limit the search process?"
> > > Can you elaborate?
> > >
> > > Cheers
> > >
> > >
> > >
> > > [1]
> >
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > > --
> > > Alessandro Benedetti
> > > Apache Lucene/Solr Committer
> > > Director, R&D Software Engineer, Search Consultant
> > >
> > >
> >
> https://urldefense.com/v3/__http://www.sease.io__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq07hrsXPw$
> > >
> > >
> > > On Wed, 9 Jun 2021 at 19:08,  wrote:
> > >
> > >> Yes, i did those and i believe i am at the best level of performance
> now
> > >> and it is not bad at all but i want to make it much better.
> > >>
> > >> i see like a linear drop in timings when i go lower number of words
> but
> > >> let me do that quick study again.
> > >>
> > >> Fuzzy search  is always expensive but that seems to suit best to my
> > needs.
> > >>
> > >>
> > >> Thanks Diego for these great questions and i already explored them.
> But
> > >> thanks again.
> > >>
> > >> Best regards
> > >>
> > >>
> > >> On 6/9/21 2:04 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> > >>> I have never used fuzzy search but from the documentation it seems
> very
> > >> expensive, and if you do it on 10 terms and 1M documents it seems very
> > very
> > >> very expensive.
> > >>> Are you using the default 'fuzzyness' parameter? (0.5) - It might end
> > up
> > >> exploring a lot of documents, did you try to play with that parameter?
> > >>> Have you tried to see how the performance change if you do not use
> > fuzzy
> > >> (just to see if is fuzzy the introduce the slow down)?
> > >>> Or what happens to performance if you do fuzzy with 1, 2, 5 terms
> > >> instead of 10?
> > >>>
> > >>> From: java-user@lucene.apache.org At: 06/09/21 18:56:31To:
> > >> java-user@lucene.apache.org,  baris.ka...@oracle.com
> > >>> Subject: Re: Potential bug
> > >>>
> > >>> i cant reveal those details i am very sorry. but it is more than 1
> > >> million.
> > >>> let me tell that i have a lot of code that processes results from
> > lucene
> > >>> but the bottle neck is lucene fuzzy search.
> > >>>
> > >>> Best regards
> > >>>
> > >>>
> > >>> On 6/9/

Re: Potential bug

2021-06-14 Thread Alessandro Benedetti

Hi Baris,
first of all apologies for having misspelled your name, definitely, it was
not meant as an insult.
Secondly, your tone is not acceptable on this mailing list (or anywhere
else).
You must remember that we, committers, are operating on a volunteering
basis, contributing code and helping people in our free time purely driven
by passion.
Respect is fundamental, we are not here to be treated aggressively.

Regards

--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Fri, 11 Jun 2021 at 17:10,  wrote:

> Let me guide to a professional answer to the below email:
>
>
> Hi Baris,
>
> Since You mentioned You did all the performance study on your
> application and still believe that
>
> the bottleneck is the fuzzy search api from Lucene, it would be best to
> time the application for:
>
>   * matching phase (identifying candidates from the corpus of documents)
>   * or in the ranking phase (scoring them by relevance)?
>
> Maybe this will help speedup further.
>
> Also, what do You mean by "what is the user needs to to limit te search
> process" ? can you elaborate?
>
> Cheers
>
>
>
> My answer would be :
>
> i cant access the Lucene code so how can time these two cases please?
>
> i mean by that sentence that when i see the hits are good i would like
> to limit the number of hits.
>
>
>
> this is more like a professional conversation please. Thanks.
>
> Best regards
>
>
> On 6/11/21 11:57 AM, Alessandro Benedetti wrote:
> > Hi Bazir,
> > this feels like an X Y problem [1 <
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> >].
> > Can you express what is your original user requirement?
> > Most of the time, at the cost of indexing time/space you may get quicker
> > query times.
> > Also, you should identify where are you wasting most of your time, in the
> > matching phase (identifying candidates from the corpus of documents) or
> in
> > the ranking phase (scoring them by relevance)?
> >
> > TopScoreDocCollector is quite a solid class, there's a ton to study,
> > analyze and experiment before raising the alarm of a bug :)
> >
> > Also didn't understand this :
> > "what if the user needs to limit the search process?"
> > Can you elaborate?
> >
> > Cheers
> >
> >
> >
> > [1]
> https://urldefense.com/v3/__https://xyproblem.info__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq2Yo0eBzg$
> > --
> > Alessandro Benedetti
> > Apache Lucene/Solr Committer
> > Director, R&D Software Engineer, Search Consultant
> >
> >
> https://urldefense.com/v3/__http://www.sease.io__;!!GqivPVa7Brio!IrgovQa8yo6rznUAykFBDcTgg_ixlPdRqBgWx6UAfWeZTlJ99CVYsv69Tq07hrsXPw$
> >
> >
> > On Wed, 9 Jun 2021 at 19:08,  wrote:
> >
> >> Yes, i did those and i believe i am at the best level of performance now
> >> and it is not bad at all but i want to make it much better.
> >>
> >> i see like a linear drop in timings when i go lower number of words but
> >> let me do that quick study again.
> >>
> >> Fuzzy search  is always expensive but that seems to suit best to my
> needs.
> >>
> >>
> >> Thanks Diego for these great questions and i already explored them. But
> >> thanks again.
> >>
> >> Best regards
> >>
> >>
> >> On 6/9/21 2:04 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> >>> I have never used fuzzy search but from the documentation it seems very
> >> expensive, and if you do it on 10 terms and 1M documents it seems very
> very
> >> very expensive.
> >>> Are you using the default 'fuzzyness' parameter? (0.5) - It might end
> up
> >> exploring a lot of documents, did you try to play with that parameter?
> >>> Have you tried to see how the performance change if you do not use
> fuzzy
> >> (just to see if is fuzzy the introduce the slow down)?
> >>> Or what happens to performance if you do fuzzy with 1, 2, 5 terms
> >> instead of 10?
> >>>
> >>> From: java-user@lucene.apache.org At: 06/09/21 18:56:31To:
> >> java-user@lucene.apache.org,  baris.ka...@oracle.com
> >>> Subject: Re: Potential bug
> >>>
> >>> i cant reveal those details i am very sorry. but it is more than 1
> >> million.
> >>> let me tell that i have a lot of code that processes results from
> lucene
> >>> but the bottle neck is lucene fuzzy search.
> >>>
> >>> Best regards
> >>>
> >>>
> >>> On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> >>>> How many documents do you have in the index?
> >>>> and can you show an example of query?
> >>>>
> >>>>
> >>>> From: java-user@lucene.apache.org At: 06/09/21 18:33:25To:
> >>> java-user@lucene.apache.org,  baris.ka...@oracle.com
> >>>> Subject: Re: Potential bug
> >>>>
> >>>> i have only two fields one string the other is a number (stored as
> >>>> string), i guess you cant go simpler than this.
> >>>>
> >>>> i retreieve the hits and my major bottleneck is lucene fuzzy search.
> >>>>
> >>>>
>
