Re: 8.5 release

2020-03-06 Thread Ishan Chattopadhyaya
Alan*, not Alas.

On Sat, 7 Mar, 2020, 5:44 pm Ishan Chattopadhyaya, <
ichattopadhy...@gmail.com> wrote:

> Alas, I've left a comment for you in
> https://issues.apache.org/jira/browse/LUCENE-9170. I leave it up to
> your judgement whether it continues to be a blocker or can be
> prioritized down.
>
> On Fri, Mar 6, 2020 at 10:59 PM Chris Hostetter
>  wrote:
> >
> >
> > : I’ve created a branch for the 8.5 release (`branch_8_5`) and pushed it
> > : to the apache repository.  We’re now at feature freeze, so only bug
> > : fixes should be pushed to the branch.
> >
> > I'm a little confused where folks should put stuff in CHANGES.txt right
> > now if it's *NOT* something ready to backport all the way to branch_8_5.
> >
> > specifically: branch_8x now has an 8.6 section in CHANGES.txt, but master
> > doesn't ... so if i have a feature (or low priority bug fix that I don't
> > want to rush into 8.5 w/o more review) I'm not sure where/how to record it
> > w/o having weird merge conflicts (or causing weird merge conflicts later)
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
>


Re: 8.5 release

2020-03-06 Thread Ishan Chattopadhyaya
Alas, I've left a comment for you in
https://issues.apache.org/jira/browse/LUCENE-9170. I leave it up to
your judgement whether it continues to be a blocker or can be
prioritized down.

On Fri, Mar 6, 2020 at 10:59 PM Chris Hostetter
 wrote:
>
>
> : I’ve created a branch for the 8.5 release (`branch_8_5`) and pushed it
> : to the apache repository.  We’re now at feature freeze, so only bug
> : fixes should be pushed to the branch.
>
> I'm a little confused where folks should put stuff in CHANGES.txt right
> now if it's *NOT* something ready to backport all the way to branch_8_5.
>
> specifically: branch_8x now has an 8.6 section in CHANGES.txt, but master
> doesn't ... so if i have a feature (or low priority bug fix that I don't
> want to rush into 8.5 w/o more review) I'm not sure where/how to record it
> w/o having weird merge conflicts (or causing weird merge conflicts later)
>
>
> -Hoss
> http://www.lucidworks.com/
>



Re: Inconsistent query results in Lucene 8.1.0

2020-03-06 Thread David Smiley
Hi Phil,

Please start new threads (emails) for new problems instead of replying to
an existing one.  The problem in the existing thread does not result in an
error; yours does, so I think they are entirely dissimilar.  Also,
you'll need to dig deeper to learn what the particular error was and report
that.  Go to Solr's logs.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Mar 6, 2020 at 2:01 PM Staley, Phil R - DCF <
phil.sta...@wisconsin.gov> wrote:

> We recently upgraded our Drupal 8 sites to SOLR 8.3.1.  We are now
> getting reports of certain patterns of search terms resulting in an error
> that reads, “The website encountered an unexpected error. Please try again
> later.”
>
>
>
> Below is a list of example terms that always result in this error and a
> similar list that works fine.  The problem pattern seems to be a search
> term that contains 2 or 3 characters followed by a space, followed by
> additional text.
>
>
>
> To confirm that the problem is version 8 of SOLR, I have updated our local
> and UAT sites with the latest Drupal updates that did include an update to
> the Search API Solr module and tested the terms below under SOLR 7.7.2,
> 8.3.1, and 8.4.1.  Under version 7.7.2 everything works fine. Under either
> of the version 8 releases, the problem returns.
>
>
>
> Thoughts?
>
>
>
> Search terms that result in error
>
> • w-2 agency directory
>
> • agency w-2 directory
>
> • w-2 agency
>
> • w-2 directory
>
> • w2 agency directory
>
> • w2 agency
>
> • w2 directory
>
>
>
> Search terms that do not result in error
>
> • w-22 agency directory
> • agency directory w-2
> • agency w-2directory
> • agencyw-2 directory
> • w-2
> • w2
> • agency directory
> • agency
> • directory
> • -2 agency directory
> • 2 agency directory
> • w-2agency directory
> • w2agency directory
>
>
>
>
>
> *From:* Michele Palmia 
> *Sent:* Friday, March 6, 2020 9:50 AM
> *To:* dev@lucene.apache.org
> *Subject:* Re: Inconsistent query results in Lucene 8.1.0
>
>
>
> Hi all,
>
>
>
> I looked into this today. I can reproduce it and I believe it's a bug.
>
> This is caused by the following working together:
> - LUCENE-7386 Flatten nested disjunctions
> - LUCENE-7925 Deduplicate SHOULD and MUST clauses in BooleanQuery
>
>
>
> Blended term queries modify the df/ttf of their terms to make sure all
> terms produce identical scores. In this case, two blended term queries
> contain a few terms each, only some of which overlap. The two queries
> calculate different df/ttf for their terms respectively, since the two sets
> are different. During the rewrite process,
>
>    1. the two Blended queries get rewritten as Boolean queries
>       themselves, with each (modified) TermQuery as a SHOULD clause
>    2. the nested Boolean queries get flattened, since they are nested
>       disjunctions
>    3. the Term queries (some of which are actually Boost queries) are
>       deduplicated, with one of the two TermQuery and its modified TermStates
>       being picked at random (the randomness is due to the HashSet underlying
>       Lucene's MultiSet).
>
> I haven't managed to create a failing test yet, I'll share it when I have
> one ready.
>
> If anybody has suggestions or pointers on how this should be fixed, I'm
> also happy to provide a patch - I'm just a bit clueless what the right
> thing to do would be here: I have a feeling (2.) should not happen for
> (rewritten) Blended Queries?
>
>
>
> Cheers,
>
> Michele
>
>
>
>
>
> On Tue, Mar 3, 2020 at 7:55 PM Fiona Hasanaj  wrote:
>
> Hello,
>
>
>
> I’m Fiona with Basis Technology. We’re investigating what we believe to be
> a bug involving inconsistent query results. We have binary searched this
> issue and found that it specifically appears when flattening nested
> disjunctions was introduced with the merge of LUCENE-7386.
> In order to reproduce the issue, I 

RE: Inconsistent query results in Lucene 8.1.0

2020-03-06 Thread Staley, Phil R - DCF
We recently upgraded our Drupal 8 sites to SOLR 8.3.1.  We are now getting 
reports of certain patterns of search terms resulting in an error that reads, 
“The website encountered an unexpected error. Please try again later.”



Below is a list of example terms that always result in this error and a similar 
list that works fine.  The problem pattern seems to be a search term that 
contains 2 or 3 characters followed by a space, followed by additional text.



To confirm that the problem is version 8 of SOLR, I have updated our local and 
UAT sites with the latest Drupal updates that did include an update to the 
Search API Solr module and tested the terms below under SOLR 7.7.2, 8.3.1, and 
8.4.1.  Under version 7.7.2 everything works fine. Under either of the version 
8 releases, the problem returns.



Thoughts?



Search terms that result in error

• w-2 agency directory

• agency w-2 directory

• w-2 agency

• w-2 directory

• w2 agency directory

• w2 agency

• w2 directory



Search terms that do not result in error

• w-22 agency directory
• agency directory w-2
• agency w-2directory
• agencyw-2 directory
• w-2
• w2
• agency directory
• agency
• directory
• -2 agency directory
• 2 agency directory
• w-2agency directory
• w2agency directory


From: Michele Palmia 
Sent: Friday, March 6, 2020 9:50 AM
To: dev@lucene.apache.org
Subject: Re: Inconsistent query results in Lucene 8.1.0

Hi all,

I looked into this today. I can reproduce it and I believe it's a bug.
This is caused by the following working together:
- LUCENE-7386 Flatten nested disjunctions
- LUCENE-7925 Deduplicate SHOULD and MUST clauses in BooleanQuery

Blended term queries modify the df/ttf of their terms to make sure all terms 
produce identical scores. In this case, two blended term queries contain a few 
terms each, only some of which overlap. The two queries calculate different 
df/ttf for their terms respectively, since the two sets are different. During 
the rewrite process,

  1.  the two Blended queries get rewritten as Boolean queries themselves, with 
each (modified) TermQuery as a SHOULD clause
  2.  the nested Boolean queries get flattened, since they are nested 
disjunctions
  3.  the Term queries (some of which are actually Boost queries) are 
deduplicated, with one of the two TermQuery and its modified TermStates being 
picked at random (the randomness is due to the HashSet underlying Lucene's 
MultiSet).
I haven't managed to create a failing test yet, I'll share it when I have one 
ready.
If anybody has suggestions or pointers on how this should be fixed, I'm also 
happy to provide a patch - I'm just a bit clueless what the right thing to do 
would be here: I have a feeling (2.) should not happen for (rewritten) Blended 
Queries?

Cheers,
Michele


On Tue, Mar 3, 2020 at 7:55 PM Fiona Hasanaj wrote:
Hello,

I’m Fiona with Basis Technology. We’re investigating what we believe to be a 
bug involving inconsistent query results. We have binary searched this issue 
and found that it specifically appears when flattening nested disjunctions was 
introduced with the merge of LUCENE-7386.
In order to reproduce the issue, I have attached a Lucene index built in 
Lucene 8.1.0 as names_index.tar.gz and if you run the attached Java class 
(LuceneSearchIndex.java) multiple times against Lucene 8.0.0 you'll see the 
max_score is the same between runs whereas if you run it against Lucene 8.1.0 
you'll see inconsistent max_score between runs (try a max of 10 runs and you 
should be able to see that sometimes it returns max_score of 1.8651859 and 
sometimes 2.1415303).

From debugging in Lucene 8.1.0, the query against the name index before 
flattening its nested disjunctions looks like below:



(((bt_rni_name_encoded_1:ALFR)^0.75 bt_rni_name_encoded_1:ALTR 
(bt_rni_name_encoded_1:ANTR)^0.75 

Re: Inconsistent query results in Lucene 8.1.0

2020-03-06 Thread Atri Sharma
> the two Blended queries get rewritten as Boolean queries themselves, with 
> each (modified) TermQuery as a SHOULD clause
> the nested Boolean queries get flattened, since they are nested disjunctions
> the Term queries (some of which are actually Boost queries) are deduplicated, 
> with one of the two TermQuery and its modified TermStates being picked at 
> random (the randomness is due to the HashSet underlying Lucene's MultiSet).

This seems a bit worrisome in itself -- the data structure supporting
the implementation should not affect the selection.

-- 
Regards,

Atri
Apache Concerted




Re: Inconsistent query results in Lucene 8.1.0

2020-03-06 Thread Michael Sokolov
So - I think you should open an issue. Can you determine whether
flattening on its own would result in a bug? If not, then perhaps
focus on the merging (deduplication) and whether it properly respects
boosting?
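
On the boosting half of that question, the query strings quoted below suggest that the boosts of duplicate clauses are summed when they are merged (ALTR goes from 1 and 0.75 to 1.75). The following is a minimal sketch of that arithmetic in plain Python, not Lucene code: `merge_boosts` is a hypothetical helper, and the term/boost pairs are copied from the first field of the pre-flattening query. The thread prints LTR^1.333 where this sketch gives 1.332, presumably because the printed 0.666 is itself a shortened value; that is a guess.

```python
from collections import defaultdict


def merge_boosts(clauses):
    """Collapse duplicate terms into one entry and sum their boosts.

    `clauses` is a list of (term, boost) pairs; this is a toy model of the
    deduplication step, not the Lucene implementation.
    """
    total = defaultdict(float)
    for term, boost in clauses:
        total[term] += boost
    return dict(total)


# Clauses for bt_rni_name_encoded_1, in the order they appear in the
# pre-flattening query (a term without an explicit boost counts as 1.0).
field_1 = [
    ("ALFR", 0.75), ("ALTR", 1.0), ("ANTR", 0.75), ("LTR", 0.666),
    ("ALTR", 0.75), ("FLTMR", 0.75), ("FLTRN", 0.75), ("FLTS", 0.75),
    ("FTR", 0.666), ("LTR", 0.666),
]

merged = merge_boosts(field_1)
merged_reversed = merge_boosts(list(reversed(field_1)))

print(merged["ALTR"])             # summed boost for the duplicated term
print(merged == merged_reversed)  # summed boosts do not depend on clause order
```

In this model the summed boost is the same whichever copy is seen first, so if the scores differ between runs, the boost sum alone would not explain it; the per-term statistics that survive the merge are the more likely suspect.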

On Fri, Mar 6, 2020 at 10:50 AM Michele Palmia  wrote:
>
> Hi all,
>
> I looked into this today. I can reproduce it and I believe it's a bug.
> This is caused by the following working together:
> - LUCENE-7386 Flatten nested disjunctions
> - LUCENE-7925 Deduplicate SHOULD and MUST clauses in BooleanQuery
>
> Blended term queries modify the df/ttf of their terms to make sure all terms 
> produce identical scores. In this case, two blended term queries contain a 
> few terms each, only some of which overlap. The two queries calculate 
> different df/ttf for their terms respectively, since the two sets are 
> different. During the rewrite process,
>
> 1. the two Blended queries get rewritten as Boolean queries themselves, with 
>    each (modified) TermQuery as a SHOULD clause
> 2. the nested Boolean queries get flattened, since they are nested disjunctions
> 3. the Term queries (some of which are actually Boost queries) are deduplicated, 
>    with one of the two TermQuery and its modified TermStates being picked at 
>    random (the randomness is due to the HashSet underlying Lucene's MultiSet).
>
> I haven't managed to create a failing test yet, I'll share it when I have one 
> ready.
> If anybody has suggestions or pointers on how this should be fixed, I'm also 
> happy to provide a patch - I'm just a bit clueless what the right thing to do 
> would be here: I have a feeling (2.) should not happen for (rewritten) 
> Blended Queries?
>
> Cheers,
> Michele
>
>
> On Tue, Mar 3, 2020 at 7:55 PM Fiona Hasanaj  wrote:
>>
>> Hello,
>>
>> I’m Fiona with Basis Technology. We’re investigating what we believe to be a 
>> bug involving inconsistent query results. We have binary searched this issue 
>> and found that it specifically appears when flattening nested disjunctions 
>> was introduced with the merge of LUCENE-7386. In order to reproduce the 
>> issue, I have attached a Lucene index built in Lucene 8.1.0 as 
>> names_index.tar.gz and if you run the attached Java class 
>> (LuceneSearchIndex.java) multiple times against Lucene 8.0.0 you'll see the 
>> max_score is the same between runs whereas if you run it against Lucene 
>> 8.1.0 you'll see inconsistent max_score between runs (try a max of 10 runs 
>> and you should be able to see that sometimes it returns max_score of 
>> 1.8651859 and sometimes 2.1415303).
>>
>> From debugging in Lucene 8.1.0, the query against the name index before 
>> flattening its nested disjunctions looks like below:
>>
>> (((bt_rni_name_encoded_1:ALFR)^0.75 bt_rni_name_encoded_1:ALTR 
>> (bt_rni_name_encoded_1:ANTR)^0.75 (bt_rni_name_encoded_1:LTR)^0.666) 
>> ((bt_rni_name_encoded_1:ALTR)^0.75 (bt_rni_name_encoded_1:FLTMR)^0.75 
>> (bt_rni_name_encoded_1:FLTRN)^0.75 (bt_rni_name_encoded_1:FLTS)^0.75 
>> (bt_rni_name_encoded_1:FTR)^0.666 
>> (bt_rni_name_encoded_1:LTR)^0.666)) | 
>> (((bt_rni_name_encoded_2:FLTR)^0.75) (bt_rni_name_encoded_2:FLTR 
>> (bt_rni_name_encoded_2:FLTRN)^0.75))
>>
>>
>> The term that's causing the difference in the final score is 
>> bt_rni_name_encoded_1:ALTR and, as we can see in the above query, it shows 
>> up twice, nested under different clauses: in the first clause where it 
>> occurs its docFreq is 3, and in the second clause its docFreq is 2. This 
>> happens in Lucene 8.0.0 as well; is a term being read with different 
>> docFreq values expected behaviour?
>>
>> After flattening the nested disjunctions (part of query rewrite process), 
>> the query looks like below:
>>
>> ((bt_rni_name_encoded_1:FTR)^0.666 (bt_rni_name_encoded_1:FLTRN)^0.75 
>> (bt_rni_name_encoded_1:FLTMR)^0.75 (bt_rni_name_encoded_1:ALFR)^0.75 
>> (bt_rni_name_encoded_1:FLTS)^0.75 (bt_rni_name_encoded_1:ANTR)^0.75 
>> (bt_rni_name_encoded_1:LTR)^1.333 (bt_rni_name_encoded_1:ALTR)^1.75) | 
>> ((bt_rni_name_encoded_2:FLTRN)^0.75 (bt_rni_name_encoded_2:FLTR)^1.75)
>>
>>
>> As you can see, bt_rni_name_encoded_1:ALTR shows only once, but the weight 
>> has been summed up from the original query. This is the version of the query 
>> that actually gets used, and the docFreq here for the 
>> bt_rni_name_encoded_1:ALTR term sometimes shows as 3 and sometimes as 2 
>> between runs, and the final score changes accordingly. Is this "coin toss" 
>> pick of docFreq for the same term expected behaviour?
>>
>> Looks like the issue stems from one of the behaviours observed and 
>> highlighted in bold.
>>
>> Looking forward to hearing back from you.
>>
>>

Re: 8.5 release

2020-03-06 Thread Chris Hostetter

: I’ve created a branch for the 8.5 release (`branch_8_5`) and pushed it 
: to the apache repository.  We’re now at feature freeze, so only bug 
: fixes should be pushed to the branch.

I'm a little confused where folks should put stuff in CHANGES.txt right 
now if it's *NOT* something ready to backport all the way to branch_8_5.

specifically: branch_8x now has an 8.6 section in CHANGES.txt, but master 
doesn't ... so if i have a feature (or low priority bug fix that I don't 
want to rush into 8.5 w/o more review) I'm not sure where/how to record it 
w/o having weird merge conflicts (or causing weird merge conflicts later)


-Hoss
http://www.lucidworks.com/


Re: CHANGES.txt and issue categorization

2020-03-06 Thread Jason Gerlowski
> Furthermore the message itself is often not code reviewed but should be.

+1 to that point especially.  I know on a few things I've worked on
that once you get down in the weeds of the implementation, tests, etc.
... coming up with a good high-level "why might a user care" sentence
can be tough.  We should push each other more on getting that
peer-reviewed for the sake of users who have to dig through
CHANGES.txt.

On Thu, Mar 5, 2020 at 5:17 PM Houston Putman  wrote:
>
> +1 to move the entries.
>
> I would suggest that we document this organization somewhere though, so that 
> future developers can adhere to the guidelines without finding this thread. I 
> don't have a strong opinion on where this would go, maybe a legend at the top 
> of CHANGES.txt or in the developer docs.
>
> I do agree that the JIRA categories should be aligned at some point, as that 
> would likely help a lot.
>
> - Houston
>
> On Thu, Mar 5, 2020 at 5:00 PM Bruno Roustant  
> wrote:
>>
>> +1 to move these entries. And I agree with the categories definitions.
>>
>> Le mer. 4 mars 2020 à 10:24, Adrien Grand  a écrit :
>>>
>>> +1 to move these entries.
>>>
>>> On Wed, Mar 4, 2020 at 4:27 AM David Smiley  
>>> wrote:

>>>> I'll simply move these items around tomorrow this time, unless I hear 
>>>> feedback to the contrary.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Mon, Mar 2, 2020 at 1:07 PM David Smiley wrote:
>
> I'd like us to reflect on how we categorize issues in CHANGES.txt.  We 
> have these categories:
> (Lucene) 'API Changes', 'New Features', 'Improvements', 'Optimizations', 
> 'Bug Fixes', 'Other'
> (Solr) 'New Features', 'Improvements', 'Optimizations', 'Bug Fixes', 
> 'Other Changes'
> (I lifted these from dev-tools/scripts/addVersion.py line 215)
>
> In particular, I'm often surprised at how some of us categorize New 
> Features or Improvements that should better be categorized as something 
> else.  I think the root cause of these problems may be that we don't have 
> JIRA categories that directly align.  Furthermore, our dev practices will 
> typically result in a CHANGES.txt entry being added out of band from the 
> code-review process, and thus with no peer review of its placement.  
> Furthermore the message itself is often not code reviewed but should be.  
> Perhaps we can simply get in the habit of adding a JIRA comment (or GH 
> code review) stating what we propose the category & issue summary should be.
>
> Here is my attempt at a definition for _some_ of these categories.  I 
> don't pretend to think we all agree 100% but it's up for discussion:
> 
> * New Features:  A user-visible new capability.  Usually opt-in.
>
> * Improvements:  A user-visible improvement to an existing capability 
> that expands its ability or improves its behavior.  
> Not a refactoring, not an optimization.
>
> * Optimizations: Something is now more efficient.  Usually automatic (not 
> opt-in).
>
> * Other:  Anything else: Refactorings, tests, build, docs, etc.  And 
> adding log statements.
> 
>
> I recommend the following changes to Lucene 8.5:
>
> These are "Improvements" that I think are better categorized as 
> "Optimizations"
> * LUCENE-9211: Add compression for Binary doc value fields. (Mark Harwood)
> * LUCENE-4702: Better compression of terms dictionaries. (Adrien Grand)
> * LUCENE-9228: Sort dvUpdates in the term order before applying if they 
> all update a
>   single field to the same value. This optimization can reduce the flush 
> time by around
>   20% for the docValues update user cases. (Nhat Nguyen, Adrien Grand, 
> Simon Willnauer)
> * LUCENE-9245: Reduce AutomatonTermsEnum memory usage. (Bruno Roustant, 
> Robert Muir)
> * LUCENE-9237: Faster UniformSplit intersect TermsEnum. (Bruno Roustant)
>
> These "Improvements" I think are better categorized as "Other":
> * LUCENE-9109: Backport some changes from master (except StackWalker) to 
> improve
>   TestSecurityManager (Uwe Schindler)
> * LUCENE-9110: Backport refactored stack analysis in tests to use 
> generalized
>   LuceneTestCase methods (Uwe Schindler)
> * LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract 
> class called LatLonGeometry. Queries are
>   executed with input objects that extend such interface. (Ignacio Vera)
> * LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class 
> called XYGeometry. Queries are
>   executed with input objects that extend such interface. (Ignacio Vera)
>
> Maybe this "Other" item should be  "Optimization"? (not sure):
> * LUCENE-9068: FuzzyQuery builds its Automaton up-front (Alan Woodward, 
> Mike Drob)
>

Re: Inconsistent query results in Lucene 8.1.0

2020-03-06 Thread Michele Palmia
Hi all,

I looked into this today. I can reproduce it and I believe it's a bug.
This is caused by the following working together:
- LUCENE-7386 Flatten nested disjunctions
- LUCENE-7925 Deduplicate SHOULD and MUST clauses in BooleanQuery

Blended term queries modify the df/ttf of their terms to make sure all
terms produce identical scores. In this case, two blended term queries
contain a few terms each, only some of which overlap. The two queries
calculate different df/ttf for their terms respectively, since the two sets
are different. During the rewrite process,

   1. the two Blended queries get rewritten as Boolean queries themselves,
   with each (modified) TermQuery as a SHOULD clause
   2. the nested Boolean queries get flattened, since they are nested
   disjunctions
   3. the Term queries (some of which are actually Boost queries) are
   deduplicated, with one of the two TermQuery and its modified TermStates
   being picked at random (the randomness is due to the HashSet underlying
   Lucene's MultiSet).
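
The three steps can be modelled with a small sketch in plain Python. This is not Lucene code: `TermClause`, `rewrite_blended`, `flatten` and `deduplicate` are hypothetical names, the sketch assumes that clause equality looks at the term only and ignores the attached statistics, and it assumes each blended query assigns one docFreq to all of its terms (3 and 2 are the values reported for ALTR in the original message). The order in which the two groups are visited is varied by hand here to stand in for the set iteration order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TermClause:
    """A SHOULD clause on a single term.

    Equality and hashing use `term` only; `doc_freq` stands in for the
    modified statistics a blended query attaches (assumption of this sketch).
    """
    term: str
    doc_freq: int = field(compare=False)


def rewrite_blended(terms, blended_doc_freq):
    # Step 1: a blended query becomes a disjunction whose clauses all carry
    # that query's own adjusted docFreq.
    return [TermClause(t, blended_doc_freq) for t in terms]


def flatten(nested):
    # Step 2: nested disjunctions are merged into one flat clause list.
    return [clause for group in nested for clause in group]


def deduplicate(clauses):
    # Step 3: equal clauses collapse to one; the copy seen first is kept,
    # together with whatever docFreq it happens to carry.
    kept = {}
    for clause in clauses:
        kept.setdefault(clause, clause)
    return {c.term: c.doc_freq for c in kept.values()}


first = rewrite_blended(["ALFR", "ALTR", "ANTR", "LTR"], 3)
second = rewrite_blended(["ALTR", "FLTMR", "FLTRN", "FLTS", "FTR", "LTR"], 2)

one_order = deduplicate(flatten([first, second]))
other_order = deduplicate(flatten([second, first]))

# Same set of terms either way, but the surviving docFreq for ALTR differs.
print(one_order["ALTR"], other_order["ALTR"])
```

Under these assumptions the flattened query contains the same terms in both cases, yet the statistics attached to the overlapping terms depend on which copy was encountered first, which matches the run-to-run difference described above.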

I haven't managed to create a failing test yet; I'll share it when I have
one ready.
If anybody has suggestions or pointers on how this should be fixed, I'm
also happy to provide a patch - I'm just a bit clueless what the right
thing to do would be here: I have a feeling (2.) should not happen for
(rewritten) Blended Queries?

Cheers,
Michele


On Tue, Mar 3, 2020 at 7:55 PM Fiona Hasanaj  wrote:

> Hello,
>
> I’m Fiona with Basis Technology. We’re investigating what we believe to be
> a bug involving inconsistent query results. We have binary searched this
> issue and found that it specifically appears when flattening nested
> disjunctions was introduced with the merge of LUCENE-7386. In order to
> reproduce the issue, I have attached a Lucene index built in Lucene 8.1.0
> as names_index.tar.gz and if you run the attached Java class
> (LuceneSearchIndex.java) multiple times against Lucene 8.0.0 you'll see the
> max_score is the same between runs whereas if you run it against Lucene
> 8.1.0 you'll see inconsistent max_score between runs (try a max of 10 runs
> and you should be able to see that sometimes it returns max_score of
> 1.8651859 and sometimes 2.1415303).
>
> From debugging in Lucene 8.1.0, the query against the name index before
> flattening its nested disjunctions looks like below:
>
> (((bt_rni_name_encoded_1:ALFR)^0.75 bt_rni_name_encoded_1:ALTR 
> (bt_rni_name_encoded_1:ANTR)^0.75 (bt_rni_name_encoded_1:LTR)^0.666) 
> ((bt_rni_name_encoded_1:ALTR)^0.75 (bt_rni_name_encoded_1:FLTMR)^0.75 
> (bt_rni_name_encoded_1:FLTRN)^0.75 (bt_rni_name_encoded_1:FLTS)^0.75 
> (bt_rni_name_encoded_1:FTR)^0.666 (bt_rni_name_encoded_1:LTR)^0.666)) 
> | (((bt_rni_name_encoded_2:FLTR)^0.75) (bt_rni_name_encoded_2:FLTR 
> (bt_rni_name_encoded_2:FLTRN)^0.75))
>
>
> The term that's causing the difference in the final score is
> bt_rni_name_encoded_1:ALTR and, as we can see in the above query, it shows
> up twice, nested under different clauses: in the first clause where it
> occurs its docFreq is 3, and in the second clause its docFreq is 2. This
> happens in Lucene 8.0.0 as well; *is a term being read with different
> docFreq values expected behaviour?*
>
> After flattening the nested disjunctions (part of query rewrite process),
> the query looks like below:
>
> ((bt_rni_name_encoded_1:FTR)^0.666 (bt_rni_name_encoded_1:FLTRN)^0.75 
> (bt_rni_name_encoded_1:FLTMR)^0.75 (bt_rni_name_encoded_1:ALFR)^0.75 
> (bt_rni_name_encoded_1:FLTS)^0.75 (bt_rni_name_encoded_1:ANTR)^0.75 
> (bt_rni_name_encoded_1:LTR)^1.333 (bt_rni_name_encoded_1:ALTR)^1.75) | 
> ((bt_rni_name_encoded_2:FLTRN)^0.75 (bt_rni_name_encoded_2:FLTR)^1.75)
>
>
> As you can see, bt_rni_name_encoded_1:ALTR shows only once, but the weight
> has been summed up from the original query. This is the version of the
> query that actually gets used, and the docFreq here for the
> bt_rni_name_encoded_1:ALTR term sometimes shows as 3 and sometimes as 2
> between runs, and the final score changes accordingly. *Is this "coin
> toss" pick of docFreq for the same term expected behaviour?*
>
> Looks like the issue stems from one of the behaviours observed and
> highlighted in bold.
>
> Looking forward to hearing back from you.
>
>


Re: Welcome Nhat Nguyen to the PMC

2020-03-06 Thread Joel Bernstein
Welcome Nhat!

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Mar 5, 2020 at 2:03 PM Nhat Nguyen 
wrote:

> Thank you very much for the warm welcome!
>
> On Wed, Mar 4, 2020 at 10:59 AM Jan Høydahl  wrote:
>
>> Welcome Nhat!
>>
>> Jan
>>
>> > 3. mar. 2020 kl. 17:34 skrev Adrien Grand :
>> >
>> > I am pleased to announce that Nhat Nguyen has accepted the PMC's
>> > invitation to join.
>> >
>> > Welcome Nhat!
>> >
>> > --
>> > Adrien
>>
>>