[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-14 Thread Thomas Aglassinger (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768378#comment-16768378
 ] 

Thomas Aglassinger commented on SOLR-13126:
---

[~mkhludnev], I still should have my test environment around but I'm currently 
in a scrum sprint working on some other tasks. But I should be able to take a 
look at it early on next week and integrate a patched Solr in a SAP commerce 
environment for testing. BTW our hackish fix has been in production use for a 
couple of weeks and has been working fine so far. AFAIU your fix does the same 
in a cleaner way, so the general direction of the suggested solution seems 
promising.

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, 2019-02-14_1715.png, SOLR-13126.patch, 
> debugQuery.json, image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-14 Thread Aliaksandr Asiptsou (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768301#comment-16768301
 ] 

Aliaksandr Asiptsou commented on SOLR-13126:


Hi [~mkhludnev] ,

We've verified our case (see comment from [~Tom Burgmans]  
https://issues.apache.org/jira/browse/SOLR-13126?focusedCommentId=16767308=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16767308)
 and found it working fine with the latest patch.

query:

 
{code:java}
http://localhost:8983/solr/OTSTests_A/select?defType=edismax=*,score,[explain%20style=text]=title:tax=sum(3,query($BoostQuery))={!edismax%20boost=3%20bf=}min_term:(%22xxxdoesnexistxxx%22)
{code}
result:

 

!2019-02-14_1715.png!

The values in _score_ and _explain_ are now the same.

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, 2019-02-14_1715.png, SOLR-13126.patch, 
> debugQuery.json, image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-14 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768024#comment-16768024
 ] 

Mikhail Khludnev commented on SOLR-13126:
-

[~roskakori], got it. Do you have an ability to verify the fix 
[^SOLR-13126.patch]?

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json, 
> image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
>   {
> "id":"someProducts/Online/taxTestingProductThree",
> "name_text_de":"Steuertestprodukt Zwei",
> "code_string":"taxTestingProductThree",
> // CORRECT, neither "Netzteil" nor "Sony" are included in the name
> "score":1.0},
>   {
> "id":"someProducts/Online/79785630",
> 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-14 Thread Thomas Aglassinger (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768021#comment-16768021
 ] 

Thomas Aglassinger commented on SOLR-13126:
---

[~mkhludnev]: The values and queries for the test cases are simplified versions 
of the situation we encountered in production. The actual query is a lot more 
complex and generated by an [e-commerce framework from 
SAP|https://www.sap.com/products/crm/e-commerce-platforms/technical-information.html].

Does is this answer your question or do you need more information on a specific 
part?

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json, 
> image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
>   {
> 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-14 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768001#comment-16768001
 ] 

Mikhail Khludnev commented on SOLR-13126:
-

It seems to me that LUCENE-8099 replaced the plain multiplicative boost at JS 
with conditional logic {{DoubleValues.withDefault()}}, which drops boost score 
when one one of multiplied queries doesn't match to a doc. see 
[^SOLR-13126.patch]

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json, 
> image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
>   {
> "id":"someProducts/Online/taxTestingProductThree",
> "name_text_de":"Steuertestprodukt Zwei",
> "code_string":"taxTestingProductThree",
> // CORRECT, neither 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-13 Thread Tobias Ibounig (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767853#comment-16767853
 ] 

Tobias Ibounig commented on SOLR-13126:
---

[~mkhludnev]

Testcase was derived from SampleTest where these configs are used, we just 
didn't change them.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/SampleTest.java

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json, 
> image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
>   {
> "id":"someProducts/Online/taxTestingProductThree",
> "name_text_de":"Steuertestprodukt Zwei",
> "code_string":"taxTestingProductThree",
> // CORRECT, neither "Netzteil" nor "Sony" are included in 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-13 Thread Mikhail Khludnev (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767647#comment-16767647
 ] 

Mikhail Khludnev commented on SOLR-13126:
-

[~roskakori], thanks for the representative test. I'm just wondering about 
configs. Why these ones? 
[~romseygeek], wdyt abt [^SOLR-13126.patch]? 

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json, 
> image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among other the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>  {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
>   {
> "id":"someProducts/Online/taxTestingProductThree",
> "name_text_de":"Steuertestprodukt Zwei",
> "code_string":"taxTestingProductThree",
> // CORRECT, neither "Netzteil" nor "Sony" are included in the name
> "score":1.0},
>   {
> 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-02-13 Thread Tom Burgmans (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767308#comment-16767308
 ] 

Tom Burgmans commented on SOLR-13126:
-

I could reproduce this with 2 cases:

*Multiplicative boost with _field_ function query:*
{noformat}
http://localhost:8983/solr/test/select?defType=edismax=sku,score,[explain 
style=text]=:=sum(field(price),4)=dictionary_id:cheetah
{noformat}
!image-2019-02-13-16-17-56-272.png|width=1090,height=439!

*Multiplicative boost with _query_ function query:*
{noformat}
http://localhost:8983/solr/test/select?defType=edismax=sku,min_term,score,[explain%20style=text]=min_term:shelter=sum(3,query($BoostQuery))={!edismax%20boost=3%20bf=}min_term:(%22xxxdoesnexistxxx%22)=dictionary_id:cheetah
{noformat}
!screenshot-1.png|width=1258,height=640!  

Our analysis is that the EXPLAIN provides the correct value but the actual 
score is wrong.

Since we rely heavily on multiplicative boost by various function queries, this 
is a showstopper for us to upgrade from Solr 6.

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, debugQuery.json, 
> image-2019-02-13-16-17-56-272.png, screenshot-1.png, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-01-16 Thread Thomas Aglassinger (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744256#comment-16744256
 ] 

Thomas Aglassinger commented on SOLR-13126:
---

We digged in further and seem to have found the culprit. The test case in the 
attached patch {{0002-SOLR-13126-Added-test-case.patch}} reproduces the bug. 
The last working version is Solr 7.2.1.

Using {{git bisect}} we found out that the issue got introduced with 
LUCENE-8099 (a refactoring). There's two changes that break the scoring in 
different ways:
 * [LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, 
BoostingQuery|https://github.com/apache/lucene-solr/commit/b01e6023e1cd3c62260b38c05c8d145ba143a2ac]
 * [LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with 
FunctionScoreQuery.boostByValue()|https://github.com/apache/lucene-solr/commit/0744fea821366a853b8e239e766b9786ef96cb27]

The attached patch 
{{0001-use-deprecated-classes-to-fix-regression-introduced-.patch}} includes an 
experimental fix by reverting some parts of the code to its previous version 
based on a deprecated class the refactoring of LUCENE-8099 tried to replace 
(among other things).

This is a rough initial patch with the following known issues:
 # The patch goes towards solr 7.5.0. This is the version we currently 
experience the issues with and attempt to get back to working for production 
use. Ideally of course the patch would go towards the master and then merged 
back to earlier versions.
 # The fix uses a deprecated class. Ideally it would fix the refactored classes 
from LUCENE-8099.

Nevertheless the test case is generic enough to run on all branches, including 
the current master.

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: debugQuery.json, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple 
> multiplicative boosts using the Solr functions {{product()}} and {{query()}} 
> result in a score that is inconsistent with the one from the debugQuery 
> information. Also only the debug score is correct while the actual search 
> results show a wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming schema changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> in contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also the query function has the value 1 specified as default in case the name 
> does not contain the respective term resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed 

[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts

2019-01-14 Thread Thomas Aglassinger (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742343#comment-16742343
 ] 

Thomas Aglassinger commented on SOLR-13126:
---

We've been digging into this and managed to somewhat track the issue down 
although unfortunately our knowledge of the inner workings of Solr and Lucene 
in particular is not sufficient to fix it and provide a patch.

We did however add logging statements that showcase the difference in the 
scoring for some trivial queries. To make the logging easier to read we 
refactored several anonymous classes to inner classes with expressive names and 
added several {{toString()}} functions. The log messages are deliberately 
written with level warning so we can easily separate them from Solr's own info 
and debug messages.

If it helps we can make these changes available although it's not feasible to 
merge them because they are only debug hacks.

Here's what we found out so far:

As described in the initial issue description we can reproduce that the score 
of a query result is computed correctly in the explain segments but incorrectly 
in the actual result if only one of two multiplicative boost conditions match. 
We now further simplified our query by splitting it into 3 separate queries 
with a filter query on one specific document. The cases are:

 # name matches both boost (netzteil and sony): Original Sony Vaio Netzteil
 # name matches one boost (netzteil but not sony): GS-Netzteil 20W schwarz
 # name matches no boost (neither netzteil nor sony): Camcorderband DV 100min 
(2)

Attached you find the log files for these queries and the JSON of the queries 
themselves. This time we did not enable debugQuery in order to log only the 
incorrect score of the actual result.

Each request was executed on a freshly restarted server (local, no replication, 
no shards) to ensure caching does not pollute the findings.

We made the following observations:
 # Both matches: lucene detects both matches with {{QueryDocValues.exists()}} 
and then computes scores for them using QueryDocValues.floatValue(). This seems 
to be called eventually by the scorer utilized by the result of 
{{org.apache.lucene.search.DoubleValues#withDefault()}} based on a formerly 
anonymous class renamed to DoubleValues_DoubleValuesWithDefault()
 # Single match: {{QueryDocValues.exists()}} detects one match and considers 
the other false (which seems correct). After that however it only seems to work 
with various variants of a constant score of 1.0, which in the end results in 
1.0. Notice that this query uses the same {{withDefault()}} as above but 
performs a very different computation mostly based on constant values. There is 
no call to {{QueryDocValues.floatVal()}}
 # No match: {{QueryDocValues.exists()}} does not find anything and results in 
a score of 1.0 as expected.
 # All logs seems to compute the score for a document with the ID -1, which 
utilizes {{QueryDocValues.floatVal()}}. As far as we understand this seems to 
be some initialization step independent of the actual query that happens only 
for the first query sent to the server.

Interestingly when you compare the logs for single and no match the are almost 
identical apart from the {{QueryDocValues.exists()}}, an additional 
{{BooleanWeight()}} and various {{toString()}} hashes.

Our expectation would have been that queries for single and both matches would 
have produced a fairly similar log using similar scorers but different scores 
(2.0 vs 6.0).

As we can reproduce these results consistently in a small testing environment 
we currently see the following options to proceed further:
 # With some hints on where to further dig into the source code we might be 
able to find the real culprit causing the inconsistent score. Any pointers?
 # We could make the solrconfig.xml, schema.xml and the core files for Solr 7.5 
available for someone else to debug who has a better grasp of the inner 
workings. Again, this is small test environment with only a few documents, and 
we could probably reduce this further (e.g. by removing Solr fields unrelated 
to this issue).

Any help would be much appreciated,
 Thomas

> Inconsistent score in debug and result with multiple multiplicative boosts
> --
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>Reporter: Thomas Aglassinger
>Priority: Major
> Attachments: debugQuery.json, 
> solr_match_neither_nextteil_nor_sony.json, 
>