[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
    [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768378#comment-16768378 ]

Thomas Aglassinger commented on SOLR-13126:
-------------------------------------------

[~mkhludnev], I should still have my test environment around, but I'm currently in a scrum sprint working on other tasks. I should be able to take a look early next week and integrate a patched Solr into an SAP Commerce environment for testing.

BTW, our hackish fix has been in production use for a couple of weeks and has been working fine so far. AFAIU your fix does the same in a cleaner way, so the general direction of the suggested solution seems promising.

> Inconsistent score in debug and result with multiple multiplicative boosts
> --------------------------------------------------------------------------
>
>                 Key: SOLR-13126
>                 URL: https://issues.apache.org/jira/browse/SOLR-13126
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: 7.5.0
>         Environment: Reproduced with macOS 10.14.1, a quick test with Windows 10 showed the same result.
>            Reporter: Thomas Aglassinger
>            Priority: Major
>         Attachments: 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 0002-SOLR-13126-Added-test-case.patch, 2019-02-14_1715.png, SOLR-13126.patch, debugQuery.json, image-2019-02-13-16-17-56-272.png, screenshot-1.png, solr_match_neither_nextteil_nor_sony.json, solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, solr_match_netzteil_only.txt
>
>
> Under certain circumstances, search results from queries with multiple multiplicative boosts using the Solr functions {{product()}} and {{query()}} result in a score that is inconsistent with the one from the debugQuery information. Also, only the debug score is correct while the actual search results show a wrong score.
> This seems somewhat similar to the behaviour described in https://issues.apache.org/jira/browse/LUCENE-7132, though that issue was resolved a while ago.
> A little background: we are using Solr as a search platform for the e-commerce framework SAP Hybris. There the shop administrator can create multiplicative boost rules (see below for an example) where a value like 2.0 means that an item gets boosted to 200%. This works fine in the demo shop distributed by SAP but breaks in our shop. We encountered the issue when upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which would have been named Hybris 6.8, but the version naming scheme changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query, but here's a description of the parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case it contains certain terms:
> * "Netzteil" (power supply) is boosted to 200%
> * "Sony" is boosted to 300%
> Consequently, a product containing both terms should be boosted to 600%.
> Also, the query function has the value 1 specified as default in case the name does not contain the respective term, resulting in a pseudo boost that preserves the score.
> According to the debug information, the parser used is the LuceneQParser, which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among others the following products are included (see the JSON comments for an analysis of each result):
> {code:javascript}
> {
> "id":"someProducts/Online/test711",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test711",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
> {
> "id":"someProducts/Online/taxTestingProductThree",
> "name_text_de":"Steuertestprodukt Zwei",
> "code_string":"taxTestingProductThree",
> // CORRECT, neither "Netzteil" nor "Sony" are included in the name
> "score":1.0},
> {
> "id":"someProducts/Online/79785630",
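The boost arithmetic described in the issue can be sketched in a few lines of Python. This is only an illustration of the expected values, not Solr code: the function name `expected_boost` and the whitespace tokenization are assumptions made for the example, while the terms, the boosts of 2.0 and 3.0, and the default of 1 come from the {{ymb}} parameter in the report. Because the main query is {{*:*}}, the sketch also assumes a base score of 1.0, so the boost equals the expected final score.

```python
from math import prod

# Boost rules taken from the "ymb" parameter in the report: term -> boost.
BOOST_RULES = {"netzteil": 2.0, "sony": 3.0}

# The ",1" passed as second argument to each query() call.
DEFAULT_BOOST = 1.0


def expected_boost(name: str) -> float:
    """Multiply the boosts of all rules whose term occurs in the name.

    Assumption: matching uses lower-cased, whitespace-separated tokens,
    which is far simpler than Solr's real analysis chain.
    """
    tokens = set(name.lower().split())
    return prod(
        boost if term in tokens else DEFAULT_BOOST
        for term, boost in BOOST_RULES.items()
    )


if __name__ == "__main__":
    for name in ("Original Sony Vaio Netzteil", "Steuertestprodukt Zwei"):
        print(name, "->", expected_boost(name))
```

A name containing only one of the two terms should come out at 2.0 or 3.0 respectively, since the other {{query()}} falls back to its default of 1.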
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
    [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768301#comment-16768301 ]

Aliaksandr Asiptsou commented on SOLR-13126:
--------------------------------------------

Hi [~mkhludnev],

We've verified our case (see the comment from [~Tom Burgmans]: https://issues.apache.org/jira/browse/SOLR-13126?focusedCommentId=16767308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16767308) and found it working fine with the latest patch.

query:
{code:java}
http://localhost:8983/solr/OTSTests_A/select?defType=edismax=*,score,[explain%20style=text]=title:tax=sum(3,query($BoostQuery))={!edismax%20boost=3%20bf=}min_term:(%22xxxdoesnexistxxx%22)
{code}

result: !2019-02-14_1715.png!

The values in _score_ and _explain_ are now the same.
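A check like the one above (score equals explain value) can also be scripted instead of read off a screenshot. The following Python sketch assumes that the explain text begins with the numeric value followed by {{ = }}, which is how Lucene renders explanations as text. The helper names and the sample strings are hypothetical and are not taken from the attached screenshot.

```python
import re

# Leading number of an explanation such as "6.0 = product of: ...".
_LEADING_VALUE = re.compile(r"\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*=")


def explain_value(explain_text: str) -> float:
    """Extract the top-level value from a text-style explanation."""
    match = _LEADING_VALUE.match(explain_text)
    if match is None:
        raise ValueError("explanation does not start with '<number> ='")
    return float(match.group(1))


def scores_agree(score: float, explain_text: str, tolerance: float = 1e-6) -> bool:
    """Return True if the returned score matches the explained score."""
    return abs(score - explain_value(explain_text)) <= tolerance


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(scores_agree(6.0, "\n6.0 = product of:\n  1.0 = *:*\n"))
    print(scores_agree(1.0, "6.0 = product of:\n  1.0 = *:*\n"))
```

Running such a check over every document in a response would flag exactly the kind of mismatch this issue is about.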
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
    [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768024#comment-16768024 ]

Mikhail Khludnev commented on SOLR-13126:
-----------------------------------------

[~roskakori], got it. Are you able to verify the fix [^SOLR-13126.patch]?
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
    [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768021#comment-16768021 ]

Thomas Aglassinger commented on SOLR-13126:
-------------------------------------------

[~mkhludnev]: The values and queries for the test cases are simplified versions of the situation we encountered in production. The actual query is a lot more complex and generated by an [e-commerce framework from SAP|https://www.sap.com/products/crm/e-commerce-platforms/technical-information.html]. Does this answer your question, or do you need more information on a specific part?
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
    [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768001#comment-16768001 ]

Mikhail Khludnev commented on SOLR-13126:
-----------------------------------------

It seems to me that LUCENE-8099 replaced the plain multiplicative boost at JS with conditional logic {{DoubleValues.withDefault()}}, which drops the boost score when one of the multiplied queries doesn't match a doc. See [^SOLR-13126.patch].
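To make the difference between a plain product and a conditional one concrete, here is a toy model in Python. It is not Lucene's implementation and does not call {{DoubleValues.withDefault()}}. It is one possible reading of the analysis above, with `None` standing in for a query that does not match the document and both function names invented for the example.

```python
from typing import Optional, Sequence

DEFAULT = 1.0  # neutral boost used when a value is missing


def plain_product(values: Sequence[Optional[float]]) -> float:
    """Replace each missing operand by the default, then multiply."""
    result = 1.0
    for value in values:
        result *= DEFAULT if value is None else value
    return result


def all_or_nothing_product(values: Sequence[Optional[float]]) -> float:
    """Hypothetical conditional variant: if any operand is missing,
    the whole boost is discarded and the default is returned."""
    if any(value is None for value in values):
        return DEFAULT
    return plain_product(values)


if __name__ == "__main__":
    only_netzteil = [2.0, None]  # "Netzteil" matches, "Sony" does not
    print(plain_product(only_netzteil))           # boost preserved
    print(all_or_nothing_product(only_netzteil))  # boost dropped
```

In this model both variants agree when every query matches or none does, and differ only for documents that match some of the multiplied queries.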
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
    [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767853#comment-16767853 ]

Tobias Ibounig commented on SOLR-13126:
---------------------------------------

[~mkhludnev] The test case was derived from SampleTest, where these configs are used; we just didn't change them.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/SampleTest.java
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
[ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767647#comment-16767647 ] Mikhail Khludnev commented on SOLR-13126: - [~roskakori], thanks for the representative test. I'm just wondering about configs. Why these ones? [~romseygeek], wdyt abt [^SOLR-13126.patch]? > Inconsistent score in debug and result with multiple multiplicative boosts > -- > > Key: SOLR-13126 > URL: https://issues.apache.org/jira/browse/SOLR-13126 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Affects Versions: 7.5.0 > Environment: Reproduced with macOS 10.14.1, a quick test with Windows > 10 showed the same result. >Reporter: Thomas Aglassinger >Priority: Major > Attachments: > 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, > 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json, > image-2019-02-13-16-17-56-272.png, screenshot-1.png, > solr_match_neither_nextteil_nor_sony.json, > solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, > solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, > solr_match_netzteil_only.txt > > > Under certain circumstances search results from queries with multiple > multiplicative boosts using the Solr functions {{product()}} and {{query()}} > result in a score that is inconsistent with the one from the debugQuery > information. Also only the debug score is correct while the actual search > results show a wrong score. > This seems somewhat similar to the behaviour described in > https://issues.apache.org/jira/browse/LUCENE-7132, though this issue has been > resolved a while ago. > A little background: we are using Solr as a search platform for the > e-commerce framework SAP Hybris. There the shop administrator can create > multiplicative boost rules (see below for an example) where a value like 2.0 > means that an item gets boosted to 200%. 
This works fine in the demo shop > distributed by SAP but breaks in our shop. We encountered the issue when > Upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which > would have been named Hybris 6.8 but the version naming schema changed). > We reduced the Solr query generated by Hybris to the relevant parts and could > reproduce the issue in the Solr admin without any Hybris connection. > I attached the JSON result of a test query but here's a description of the > parts that seemed most relevant to me. > The {{responseHeader.params}} reads (slightly rearranged): > {code:java} > "q":"{!boost b=$ymb}(+{!lucene v=$yq})", > "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))", > "yq":"*:*", > "sort":"score desc", > "debugQuery":"true", > // Added to keep the output small but probably unrelated to the actual issue > "fl":"score,id,code_string,name_text_de", > "fq":"catalogId:\"someProducts\"", > "rows":"10", > {code} > This example boosts the German product name (field {{name_text_de}}) in case > in contains certain terms: > * "Netzteil" (power supply) is boosted to 200% > * "Sony" is boosted to 300% > Consequently a product containing both terms should be boosted to 600%. > Also the query function has the value 1 specified as default in case the name > does not contain the respective term resulting in a pseudo boost that > preserves the score. 
> According to the debug information the parser used is the LuceneQParser,
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among others the following products
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
> {
>   "id":"someProducts/Online/test711",
>   "name_text_de":"Original Sony Vaio Netzteil",
>   "code_string":"test711",
>   // CORRECT, both "Netzteil" and "Sony" are included in the name
>   "score":6.0},
> {
>   "id":"someProducts/Online/taxTestingProductThree",
>   "name_text_de":"Steuertestprodukt Zwei",
>   "code_string":"taxTestingProductThree",
>   // CORRECT, neither "Netzteil" nor "Sony" is included in the name
>   "score":1.0},
> {
>
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
[ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767308#comment-16767308 ]

Tom Burgmans commented on SOLR-13126:
-------------------------------------

I could reproduce this with 2 cases:

*Multiplicative boost with _field_ function query:*
{noformat}
http://localhost:8983/solr/test/select?defType=edismax=sku,score,[explain style=text]=:=sum(field(price),4)=dictionary_id:cheetah
{noformat}
!image-2019-02-13-16-17-56-272.png|width=1090,height=439!

*Multiplicative boost with _query_ function query:*
{noformat}
http://localhost:8983/solr/test/select?defType=edismax=sku,min_term,score,[explain%20style=text]=min_term:shelter=sum(3,query($BoostQuery))={!edismax%20boost=3%20bf=}min_term:(%22xxxdoesnexistxxx%22)=dictionary_id:cheetah
{noformat}
!screenshot-1.png|width=1258,height=640!

Our analysis is that the EXPLAIN provides the correct value but the actual score is wrong. Since we rely heavily on multiplicative boost by various function queries, this is a showstopper for us to upgrade from Solr 6.
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
[ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744256#comment-16744256 ]

Thomas Aglassinger commented on SOLR-13126:
-------------------------------------------

We dug in further and seem to have found the culprit.

The test case in the attached patch {{0002-SOLR-13126-Added-test-case.patch}} reproduces the bug. The last working version is Solr 7.2.1. Using {{git bisect}} we found out that the issue was introduced with LUCENE-8099 (a refactoring). There are two changes that break the scoring in different ways:
* [LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, BoostingQuery|https://github.com/apache/lucene-solr/commit/b01e6023e1cd3c62260b38c05c8d145ba143a2ac]
* [LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with FunctionScoreQuery.boostByValue()|https://github.com/apache/lucene-solr/commit/0744fea821366a853b8e239e766b9786ef96cb27]

The attached patch {{0001-use-deprecated-classes-to-fix-regression-introduced-.patch}} includes an experimental fix that reverts some parts of the code to their previous version, based on a deprecated class that the refactoring of LUCENE-8099 tried to replace (among other things).

This is a rough initial patch with the following known issues:
# The patch targets Solr 7.5.0. This is the version we currently experience the issues with and are trying to get working again for production use. Ideally, of course, the patch would target master and then be merged back to earlier versions.
# The fix uses a deprecated class. Ideally it would fix the refactored classes from LUCENE-8099.

Nevertheless the test case is generic enough to run on all branches, including the current master.
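The bisection mentioned in the comment above can be pictured as a binary search over a linear commit history. The following Python sketch only illustrates the idea behind {{git bisect}}, not how the search was actually run; the commit names and the {{is_good}} predicate are hypothetical.

```python
def first_bad_commit(commits, is_good):
    """Return the first commit for which is_good() is False.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical history: the scoring test passes up to "c3" and fails from "c4" on.
history = ["c1", "c2", "c3", "c4", "c5", "c6"]
culprit = first_bad_commit(history, lambda c: c < "c4")
```

In practice the predicate would be a run of the reproducing test case against each checked-out revision, which is why a test that runs unchanged on all branches is useful here.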
[jira] [Commented] (SOLR-13126) Inconsistent score in debug and result with multiple multiplicative boosts
[ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742343#comment-16742343 ]

Thomas Aglassinger commented on SOLR-13126:
-------------------------------------------

We've been digging into this and managed to track the issue down somewhat, although unfortunately our knowledge of the inner workings of Solr, and Lucene in particular, is not sufficient to fix it and provide a patch.

We did however add logging statements that showcase the difference in the scoring for some trivial queries. To make the logging easier to read we refactored several anonymous classes to inner classes with expressive names and added several {{toString()}} functions. The log messages are deliberately written with level warning so we can easily separate them from Solr's own info and debug messages. If it helps we can make these changes available, although it's not feasible to merge them because they are only debug hacks.

Here's what we found out so far:

As described in the initial issue description we can reproduce that the score of a query result is computed correctly in the explain segments but incorrectly in the actual result if only one of two multiplicative boost conditions matches.

We now further simplified our query by splitting it into 3 separate queries with a filter query on one specific document. The cases are:
# name matches both boosts (netzteil and sony): Original Sony Vaio Netzteil
# name matches one boost (netzteil but not sony): GS-Netzteil 20W schwarz
# name matches no boost (neither netzteil nor sony): Camcorderband DV 100min (2)

Attached you will find the log files for these queries and the JSON of the queries themselves. This time we did not enable debugQuery in order to log only the incorrect score of the actual result. Each request was executed on a freshly restarted server (local, no replication, no shards) to ensure caching does not pollute the findings.

We made the following observations:
# Both matches: Lucene detects both matches with {{QueryDocValues.exists()}} and then computes scores for them using {{QueryDocValues.floatVal()}}. This seems to be called eventually by the scorer utilized by the result of {{org.apache.lucene.search.DoubleValues#withDefault()}}, based on a formerly anonymous class renamed to {{DoubleValues_DoubleValuesWithDefault()}}.
# Single match: {{QueryDocValues.exists()}} detects one match and considers the other false (which seems correct). After that however it only seems to work with various variants of a constant score of 1.0, which in the end results in 1.0. Notice that this query uses the same {{withDefault()}} as above but performs a very different computation mostly based on constant values. There is no call to {{QueryDocValues.floatVal()}}.
# No match: {{QueryDocValues.exists()}} does not find anything and results in a score of 1.0 as expected.
# All logs seem to compute the score for a document with the ID -1, which utilizes {{QueryDocValues.floatVal()}}. As far as we understand this seems to be some initialization step independent of the actual query that happens only for the first query sent to the server.

Interestingly, when you compare the logs for single and no match they are almost identical apart from the {{QueryDocValues.exists()}}, an additional {{BooleanWeight()}} and various {{toString()}} hashes.

Our expectation would have been that the queries for single and both matches would have produced fairly similar logs using similar scorers but different scores (2.0 vs 6.0).

As we can reproduce these results consistently in a small testing environment we currently see the following options to proceed further:
# With some hints on where to dig further into the source code we might be able to find the real culprit causing the inconsistent score. Any pointers?
# We could make the solrconfig.xml, schema.xml and the core files for Solr 7.5 available for someone else to debug who has a better grasp of the inner workings. Again, this is a small test environment with only a few documents, and we could probably reduce this further (e.g. by removing Solr fields unrelated to this issue).

Any help would be much appreciated,
Thomas
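When comparing such logs, the {{toString()}} hashes differ between runs and can hide the relevant differences. A minimal Python sketch for masking them before diffing follows. It assumes the hashes appear in Java's default {{ClassName@hex}} form, and the sample log lines are invented for illustration, not taken from the attached logs.

```python
import difflib
import re

# Java's default Object.toString() renders as ClassName@<hex identity hash>.
HASH_PATTERN = re.compile(r"@[0-9a-f]+\b")


def normalize(lines):
    """Mask identity hashes so that equal objects compare equal across runs."""
    return [HASH_PATTERN.sub("@<hash>", line) for line in lines]


def diff_logs(log_a, log_b):
    """Return only the added/removed lines of a diff over normalized logs."""
    diff = difflib.unified_diff(normalize(log_a), normalize(log_b),
                                lineterm="", n=0)
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


# Invented sample lines, not taken from the attached logs.
single_match = ["WARN Scorer@1b2c3d created", "WARN exists() -> true"]
no_match = ["WARN Scorer@9f8e7d created", "WARN exists() -> false"]
changes = diff_logs(single_match, no_match)
```

With the hashes masked, only the lines that genuinely differ between the two runs remain, here the differing {{exists()}} result.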