[jira] [Commented] (SOLR-11711) distributed pivot & field facets can processes excessive docs unneccessarily due to internal mincount=0

2017-12-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287988#comment-16287988
 ] 

ASF GitHub Bot commented on SOLR-11711:
---

Github user HoustonPutman closed the pull request at:

https://github.com/apache/lucene-solr/pull/279


> distributed pivot & field facets can processes excessive docs unneccessarily 
> due to internal mincount=0
> ---
>
> Key: SOLR-11711
> URL: https://issues.apache.org/jira/browse/SOLR-11711
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: faceting
>Affects Versions: master (8.0)
>Reporter: Houston Putman
>Assignee: Hoss Man
>  Labels: pull-request-available
> Fix For: master (8.0), 7.3
>
>
> Currently while sending pivot facet requests to each shard, the 
> {{facet.pivot.mincount}} is set to {{0}} if the facet is sorted by count with 
> a specified limit > 0. However with a mincount of 0, the pivot facet will use 
> exponentially more wasted memory for every pivot field added. This is because 
> there will be a total of {{limit^(# of pivots)}} pivot values created in 
> memory, even though the vast majority of them will have counts of 0, and are 
> therefore useless.
> Imagine the scenario of a pivot facet with 3 levels, and 
> {{facet.limit=1000}}. There will be a billion pivot values created, and there 
> will almost definitely be nowhere near a billion pivot values with counts > 0.
> This likely due to the reasoning mentioned in [this comment in the original 
> distributed pivot facet 
> ticket|https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13979898].
>  Basically it was thought that the refinement code would need to know that a 
> count was 0 for a shard so that a refinement request wasn't sent to that 
> shard. However this is checked in the code, [in this part of the refinement 
> candidate 
> checking|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/core/src/java/org/apache/solr/handler/component/PivotFacetField.java#L275].
>  Therefore if the {{pivot.mincount}} was set to 1, the non-existent values 
> would either:
> * Not be known, because the {{facet.limit}} was smaller than the number of 
> facet values with positive counts. This isn't an issue, because they wouldn't 
> have been returned with {{pivot.mincount}} set to 0.
> * Would be known, because the {{facet.limit}} would be larger than the number 
> of facet values returned. therefore this conditional would return false 
> (since we are only talking about pivot facets sorted by count).
> The solution, is to use the same pivot mincount as would be used if no limit 
> was specified. 
> This also relates to a similar problem in field faceting that was "fixed" in 
> [SOLR-8988|https://issues.apache.org/jira/browse/SOLR-8988#13324]. The 
> solution was to add a flag, {{facet.distrib.mco}}, which would enable not 
> choosing a mincount of 0 when unnessesary. Since this flag can only increase 
> performance, and doesn't break any queries I have removed it as an option and 
> replaced the code to use the feature always. 
> There was one code change necessary to fix the MCO option, since the 
> refinement candidate selection logic had a bug. The bug only occured with a 
> minCount > 0 and limit > 0 specified. When a shard replied with less than the 
> limit requested, it would assume the next maximum count on that shard was the 
> {{mincount}}, where it would actually be the {{mincount-1}} (because a facet 
> value with a count of mincount would have been returned). Therefore the MCO 
> didn't cause any errors, but with a mincount of 1 the refinement logic always 
> assumed that the shard had more values with a count of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11711) distributed pivot & field facets can processes excessive docs unneccessarily due to internal mincount=0

2017-12-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286315#comment-16286315
 ] 

ASF subversion and git services commented on SOLR-11711:


Commit 41113ecbb62694e7e07bd236ecd0b5169bd62547 in lucene-solr's branch 
refs/heads/branch_7x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=41113ec ]

SOLR-11711: Fixed distributed processing of facet.field/facet.pivot sub 
requests to prevent requesting unneccessary and excessive '0' count terms from 
each shard

(cherry picked from commit efc2f32ea05029edff9144f163d0619d091d1ba3)


> distributed pivot & field facets can processes excessive docs unneccessarily 
> due to internal mincount=0
> ---
>
> Key: SOLR-11711
> URL: https://issues.apache.org/jira/browse/SOLR-11711
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: faceting
>Affects Versions: master (8.0)
>Reporter: Houston Putman
>Assignee: Hoss Man
>  Labels: pull-request-available
>
> Currently while sending pivot facet requests to each shard, the 
> {{facet.pivot.mincount}} is set to {{0}} if the facet is sorted by count with 
> a specified limit > 0. However with a mincount of 0, the pivot facet will use 
> exponentially more wasted memory for every pivot field added. This is because 
> there will be a total of {{limit^(# of pivots)}} pivot values created in 
> memory, even though the vast majority of them will have counts of 0, and are 
> therefore useless.
> Imagine the scenario of a pivot facet with 3 levels, and 
> {{facet.limit=1000}}. There will be a billion pivot values created, and there 
> will almost definitely be nowhere near a billion pivot values with counts > 0.
> This likely due to the reasoning mentioned in [this comment in the original 
> distributed pivot facet 
> ticket|https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13979898].
>  Basically it was thought that the refinement code would need to know that a 
> count was 0 for a shard so that a refinement request wasn't sent to that 
> shard. However this is checked in the code, [in this part of the refinement 
> candidate 
> checking|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/core/src/java/org/apache/solr/handler/component/PivotFacetField.java#L275].
>  Therefore if the {{pivot.mincount}} was set to 1, the non-existent values 
> would either:
> * Not be known, because the {{facet.limit}} was smaller than the number of 
> facet values with positive counts. This isn't an issue, because they wouldn't 
> have been returned with {{pivot.mincount}} set to 0.
> * Would be known, because the {{facet.limit}} would be larger than the number 
> of facet values returned. therefore this conditional would return false 
> (since we are only talking about pivot facets sorted by count).
> The solution, is to use the same pivot mincount as would be used if no limit 
> was specified. 
> This also relates to a similar problem in field faceting that was "fixed" in 
> [SOLR-8988|https://issues.apache.org/jira/browse/SOLR-8988#13324]. The 
> solution was to add a flag, {{facet.distrib.mco}}, which would enable not 
> choosing a mincount of 0 when unnessesary. Since this flag can only increase 
> performance, and doesn't break any queries I have removed it as an option and 
> replaced the code to use the feature always. 
> There was one code change necessary to fix the MCO option, since the 
> refinement candidate selection logic had a bug. The bug only occured with a 
> minCount > 0 and limit > 0 specified. When a shard replied with less than the 
> limit requested, it would assume the next maximum count on that shard was the 
> {{mincount}}, where it would actually be the {{mincount-1}} (because a facet 
> value with a count of mincount would have been returned). Therefore the MCO 
> didn't cause any errors, but with a mincount of 1 the refinement logic always 
> assumed that the shard had more values with a count of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11711) distributed pivot & field facets can processes excessive docs unneccessarily due to internal mincount=0

2017-12-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286316#comment-16286316
 ] 

ASF subversion and git services commented on SOLR-11711:


Commit efc2f32ea05029edff9144f163d0619d091d1ba3 in lucene-solr's branch 
refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=efc2f32 ]

SOLR-11711: Fixed distributed processing of facet.field/facet.pivot sub 
requests to prevent requesting unneccessary and excessive '0' count terms from 
each shard


> distributed pivot & field facets can processes excessive docs unneccessarily 
> due to internal mincount=0
> ---
>
> Key: SOLR-11711
> URL: https://issues.apache.org/jira/browse/SOLR-11711
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: faceting
>Affects Versions: master (8.0)
>Reporter: Houston Putman
>Assignee: Hoss Man
>  Labels: pull-request-available
>
> Currently while sending pivot facet requests to each shard, the 
> {{facet.pivot.mincount}} is set to {{0}} if the facet is sorted by count with 
> a specified limit > 0. However with a mincount of 0, the pivot facet will use 
> exponentially more wasted memory for every pivot field added. This is because 
> there will be a total of {{limit^(# of pivots)}} pivot values created in 
> memory, even though the vast majority of them will have counts of 0, and are 
> therefore useless.
> Imagine the scenario of a pivot facet with 3 levels, and 
> {{facet.limit=1000}}. There will be a billion pivot values created, and there 
> will almost definitely be nowhere near a billion pivot values with counts > 0.
> This likely due to the reasoning mentioned in [this comment in the original 
> distributed pivot facet 
> ticket|https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13979898].
>  Basically it was thought that the refinement code would need to know that a 
> count was 0 for a shard so that a refinement request wasn't sent to that 
> shard. However this is checked in the code, [in this part of the refinement 
> candidate 
> checking|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/core/src/java/org/apache/solr/handler/component/PivotFacetField.java#L275].
>  Therefore if the {{pivot.mincount}} was set to 1, the non-existent values 
> would either:
> * Not be known, because the {{facet.limit}} was smaller than the number of 
> facet values with positive counts. This isn't an issue, because they wouldn't 
> have been returned with {{pivot.mincount}} set to 0.
> * Would be known, because the {{facet.limit}} would be larger than the number 
> of facet values returned. therefore this conditional would return false 
> (since we are only talking about pivot facets sorted by count).
> The solution, is to use the same pivot mincount as would be used if no limit 
> was specified. 
> This also relates to a similar problem in field faceting that was "fixed" in 
> [SOLR-8988|https://issues.apache.org/jira/browse/SOLR-8988#13324]. The 
> solution was to add a flag, {{facet.distrib.mco}}, which would enable not 
> choosing a mincount of 0 when unnessesary. Since this flag can only increase 
> performance, and doesn't break any queries I have removed it as an option and 
> replaced the code to use the feature always. 
> There was one code change necessary to fix the MCO option, since the 
> refinement candidate selection logic had a bug. The bug only occured with a 
> minCount > 0 and limit > 0 specified. When a shard replied with less than the 
> limit requested, it would assume the next maximum count on that shard was the 
> {{mincount}}, where it would actually be the {{mincount-1}} (because a facet 
> value with a count of mincount would have been returned). Therefore the MCO 
> didn't cause any errors, but with a mincount of 1 the refinement logic always 
> assumed that the shard had more values with a count of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org