[jira] [Commented] (SOLR-11711) Improve mincount & limit usage in pivot & field facets
[ https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284136#comment-16284136 ]

Houston Putman commented on SOLR-11711:
---------------------------------------

The deprecation fix is in now.

> Improve mincount & limit usage in pivot & field facets
> ------------------------------------------------------
>
>                 Key: SOLR-11711
>                 URL: https://issues.apache.org/jira/browse/SOLR-11711
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: faceting
>    Affects Versions: master (8.0)
>            Reporter: Houston Putman
>            Assignee: Hoss Man
>              Labels: pull-request-available
>             Fix For: 5.6, 6.7, 7.2
>
>
> Currently, while sending pivot facet requests to each shard, the
> {{facet.pivot.mincount}} is set to {{0}} if the facet is sorted by count
> with a specified limit > 0. However, with a mincount of 0, the pivot facet
> will use exponentially more wasted memory for every pivot field added. This
> is because there will be a total of {{limit^(# of pivots)}} pivot values
> created in memory, even though the vast majority of them will have counts
> of 0, and are therefore useless.
> Imagine the scenario of a pivot facet with 3 levels, and
> {{facet.limit=1000}}. There will be a billion pivot values created, and
> there will almost definitely be nowhere near a billion pivot values with
> counts > 0.
> This is likely due to the reasoning mentioned in [this comment in the
> original distributed pivot facet
> ticket|https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13979898].
> Basically it was thought that the refinement code would need to know that a
> count was 0 for a shard so that a refinement request wasn't sent to that
> shard. However, this is checked in the code, [in this part of the
> refinement candidate
> checking|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/core/src/java/org/apache/solr/handler/component/PivotFacetField.java#L275].
> Therefore, if the {{pivot.mincount}} was set to 1, the non-existent values
> would either:
> * Not be known, because the {{facet.limit}} was smaller than the number of
> facet values with positive counts. This isn't an issue, because they
> wouldn't have been returned with {{pivot.mincount}} set to 0.
> * Be known, because the {{facet.limit}} would be larger than the number of
> facet values returned. Therefore this conditional would return false
> (since we are only talking about pivot facets sorted by count).
> The solution is to use the same pivot mincount as would be used if no limit
> was specified.
> This also relates to a similar problem in field faceting that was "fixed" in
> [SOLR-8988|https://issues.apache.org/jira/browse/SOLR-8988#13324]. The
> solution was to add a flag, {{facet.distrib.mco}}, which would enable not
> choosing a mincount of 0 when unnecessary. Since this flag can only increase
> performance, and doesn't break any queries, I have removed it as an option
> and changed the code to always use the feature.
> There was one code change necessary to fix the MCO option, since the
> refinement candidate selection logic had a bug. The bug only occurred with a
> minCount > 0 and limit > 0 specified. When a shard replied with fewer values
> than the limit requested, the logic would assume the next maximum count on
> that shard was the {{mincount}}, where it would actually be the
> {{mincount-1}} (because a facet value with a count of mincount would have
> been returned). Therefore the MCO didn't cause any errors, but with a
> mincount of 1 the refinement logic always assumed that the shard had more
> values with a count of 1.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
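
The growth described in the issue text can be checked with a few lines of
arithmetic. The following is only an illustrative sketch, not Solr code: the
class and method names are hypothetical, and it assumes the worst case in
which every level of the pivot returns a full {{facet.limit}} worth of
values under each parent value.

```java
// Illustrative sketch only; PivotBucketEstimate is a hypothetical name,
// not a class in Solr.
final class PivotBucketEstimate {

    /**
     * Worst-case number of leaf pivot values when every level returns
     * `limit` values under each parent: limit^depth.
     */
    static long worstCaseLeafBuckets(long limit, int depth) {
        long total = 1;
        for (int i = 0; i < depth; i++) {
            // multiplyExact throws ArithmeticException instead of silently
            // overflowing a long for very large limit/depth combinations.
            total = Math.multiplyExact(total, limit);
        }
        return total;
    }

    public static void main(String[] args) {
        // The scenario from the issue: 3 pivot levels, facet.limit=1000.
        System.out.println(worstCaseLeafBuckets(1000, 3)); // 1000000000
        // Dropping one pivot level shrinks the bound by a factor of 1000.
        System.out.println(worstCaseLeafBuckets(1000, 2)); // 1000000
    }
}
```

With a mincount of 0 the shard can always fill each level up to the limit
with zero-count values, so this bound is actually reached; with a mincount
of 1 only values that really occur in the index are materialized.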
[jira] [Commented] (SOLR-11711) Improve mincount & limit usage in pivot & field facets
[ https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284126#comment-16284126 ]

Houston Putman commented on SOLR-11711:
---------------------------------------

Thanks for taking a look and running those tests! I'll add back in the
{{FACET_DISTRIB_MCO}} option and deprecate it for 7x.

What are your thoughts on backporting this fix to 6x and 5x?
[jira] [Commented] (SOLR-11711) Improve mincount & limit usage in pivot & field facets
[ https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284045#comment-16284045 ]

Hoss Man commented on SOLR-11711:
---------------------------------

I think your assessment makes sense (and thank you for all the due diligence
and back linking to the relevant comments/jiras!) ... I'm hammering on the
randomized tests now just to sanity check that we're not missing something
obvious, but overall I'm +1 to the patch.

My one objection is to the immediate removal of the {{FACET_DISTRIB_MCO}}
constant from FacetParams.java. The patch we commit & backport to 7x should
only deprecate that param and remove its _usage_ in existing code; that way
users who upgrade will get a deprecation warning when compiling their solrj
code, but not a compilation failure. Once the backport is done we can do a
separate commit to remove it from master.

If you feel inclined to revise your patch/pr to deal with the deprecation,
I'll aim for committing/backporting Monday barring test failures -- but if
you don't have time, no worries: it's a trivial change for me to make
locally before committing.
[jira] [Commented] (SOLR-11711) Improve mincount & limit usage in pivot & field facets
[ https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280669#comment-16280669 ]

Houston Putman commented on SOLR-11711:
---------------------------------------

Hey [~hossman], any thoughts on this patch? Particularly the field facet
part. You mentioned in [this
comment|https://issues.apache.org/jira/browse/SOLR-8988?focusedCommentId=15241993&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15241993]
why you would be wary of changing the behavior due to the comment in the
code. [~k317h] and I believe that fixing the
{{last = Math.max(0, initialMincount - 1);}} line will address why anyone
was seeing performance degradation with the {{facet.distrib.mco}} option
given. And we can't find another reason why additional refinement would be
needed.
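
The reasoning behind the {{last = Math.max(0, initialMincount - 1);}} line
can be sketched in isolation. This is a simplified illustration under
stated assumptions, not the actual Solr refinement code: the class, method
and parameter names are hypothetical, the shard is assumed to sort by count
descending, and the limit sent to the shard is assumed to be > 0.

```java
// Illustrative sketch only; ShardBoundSketch and its parameters are
// hypothetical names, not part of Solr.
final class ShardBoundSketch {

    /**
     * Upper bound on the count of any facet value the shard did NOT return.
     *
     * @param returnedCounts counts the shard returned, sorted descending
     * @param shardLimit     limit sent to the shard (assumed > 0)
     * @param shardMincount  mincount sent to the shard
     */
    static long maxUnreturnedCount(long[] returnedCounts, int shardLimit, int shardMincount) {
        if (shardLimit <= 0) {
            throw new IllegalArgumentException("this sketch assumes limit > 0");
        }
        if (returnedCounts.length < shardLimit) {
            // The shard ran out of values with count >= mincount, so every
            // unreturned value has a count of at most mincount - 1. Using
            // mincount here instead is the off-by-one the issue describes.
            return Math.max(0, shardMincount - 1);
        }
        // The shard filled its limit: an unreturned value may have a count
        // as high as the smallest count that was returned.
        return returnedCounts[returnedCounts.length - 1];
    }

    public static void main(String[] args) {
        // Shard returned 2 of 10 requested values with mincount=1:
        // nothing else on that shard can have a positive count.
        System.out.println(maxUnreturnedCount(new long[] {5, 3}, 10, 1)); // 0
        // Shard filled its limit of 2: unseen values may count up to 3.
        System.out.println(maxUnreturnedCount(new long[] {5, 3}, 2, 1));  // 3
    }
}
```

With a bound of {{mincount}} rather than {{mincount-1}}, the first call
would report 1, and a coordinator relying on that bound would keep treating
the shard as a possible source of additional counts, which matches the
extra refinement described in the issue.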