[ https://issues.apache.org/jira/browse/SOLR-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-11729:
----------------------------
    Description: 
When FacetComponent first got support for distributed search, the default 
"effective shard limit" used on shards followed the formula...

{code}
limit = (int)(dff.initialLimit * 1.5) + 10;
{code}

...over time, this became configurable with the introduction of some 
expert-level tuning options: {{facet.overrequest.ratio}} & {{facet.overrequest.count}} 
-- but the defaults (and basic formula) remain the same to this day...

{code}
      this.overrequestRatio
        = params.getFieldDouble(field, FacetParams.FACET_OVERREQUEST_RATIO, 1.5);
      this.overrequestCount 
        = params.getFieldInt(field, FacetParams.FACET_OVERREQUEST_COUNT, 10);
// ...
  private int doOverRequestMath(int limit, double ratio, int count) {
    // NOTE: normally, "1.0F < ratio"
    //
    // if the user chooses a ratio < 1, we allow it and don't "bottom out" at
    // the original limit until *after* we've also added the count.
    int adjustedLimit = (int) (limit * ratio) + count;
    return Math.max(limit, adjustedLimit);
  }
{code}
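For reference, a minimal standalone sketch of the FacetComponent math quoted above, with the default ratio and count plugged in. The class name, constants, and {{main}} are illustrative only; {{doOverRequestMath}} mirrors the quoted method.

{code:java}
class FacetComponentOverRequest {

  // Defaults for facet.overrequest.ratio and facet.overrequest.count, as quoted above
  static final double DEFAULT_RATIO = 1.5;
  static final int DEFAULT_COUNT = 10;

  static int doOverRequestMath(int limit, double ratio, int count) {
    // scale by the ratio (truncating toward zero), then add the fixed count
    int adjustedLimit = (int) (limit * ratio) + count;
    // never ask a shard for fewer buckets than the original limit
    return Math.max(limit, adjustedLimit);
  }

  public static void main(String[] args) {
    for (int limit : new int[] {5, 10, 100}) {
      System.out.println("facet.limit=" + limit + " -> shard limit "
          + doOverRequestMath(limit, DEFAULT_RATIO, DEFAULT_COUNT));
    }
    // a ratio < 1 "bottoms out" at the original limit: max(10, 5 + 2) == 10
    System.out.println(doOverRequestMath(10, 0.5, 2));
  }
}
{code}

With the defaults, {{facet.limit=10}} becomes a per-shard limit of 25, and {{facet.limit=100}} becomes 160.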

However...


When {{json.facet}} multi-shard refinement was added, the code was written 
slightly differently:

* there is an explicit {{overrequest:N}} (count) option
* if {{-1 == overrequest}} (which is the default) then an "effective shard 
limit" is computed using the same basic formula as in FacetComponent -- _*but 
the constants are different*_...
** {{effectiveLimit = (long) (effectiveLimit * 1.1 + 4);}}
* For any (non "-1") user-specified {{overrequest}} value, it's added verbatim 
to the {{limit}} (which may have been user-specified, or may just be the 
default)
** {{effectiveLimit += freq.overrequest;}}
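A rough sketch of the two {{json.facet}} code paths described in the list above. This is *not* the actual Solr code: the class and method names are hypothetical, and the {{offset < 10}} gate on the default path is an assumption taken from the proposed snippet later in this issue.

{code:java}
class JsonFacetShardLimit {

  // overrequest == -1 means "not specified by the user" (the default)
  static long shardLimit(long limit, long offset, long overrequest, boolean isShard) {
    long effectiveLimit = limit;
    if (!isShard) {
      return effectiveLimit; // over-request only applies to per-shard requests
    }
    if (overrequest == -1) {
      // implicit default: small ratio plus small constant (offset < 10 gate is assumed)
      if (offset < 10) {
        effectiveLimit = (long) (effectiveLimit * 1.1 + 4);
      }
    } else {
      // explicit overrequest:N is added verbatim, no ratio involved
      effectiveLimit += overrequest;
    }
    return effectiveLimit;
  }

  public static void main(String[] args) {
    System.out.println(shardLimit(10, 0, -1, true));  // default path: 15
    System.out.println(shardLimit(100, 0, -1, true)); // default path: 114
    System.out.println(shardLimit(10, 0, 7, true));   // overrequest:7 -> 17
  }
}
{code}

Per the formulas above, a {{limit:10}} facet asks each shard for 15 buckets by default, versus 25 for {{facet.limit=10}} in FacetComponent.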


Given the design of the {{json.facet}} syntax, I can understand why the code 
path for an "advanced" user-specified {{overrequest:N}} option avoids using any 
(implicit) ratio calculation and just does the straight addition of 
{{limit += overrequest}}.

What I'm not clear on is the choice of the constants {{1.1}} and {{4}} in the 
common (default) case, and why those differ from the historically used {{1.5}} 
and {{10}}.

----

It may seem like a small thing to worry about, but it can/will cause odd 
inconsistencies when people try to migrate simple {{facet.field=foo}} (or 
{{facet.pivot=foo,bar}}) queries to {{json.facet}} -- I have also seen it give 
people attempting these types of migrations the (mistaken) impression that the 
discrepancies they are seeing are because {{refine:true}} is not working.

For this reason, I propose we change the (default) {{overrequest:-1}} behavior 
to use the same constants as the equivalent FacetComponent code...

{code}
if (fcontext.isShard()) {
  if (freq.overrequest == -1) {
    // add over-request if this is a shard request and if we have a small offset
    // (large offsets will already be gathering many more buckets than needed)
    if (freq.offset < 10) {
      effectiveLimit = (long) (effectiveLimit * 1.5 + 10);
    }
    // ...
  }
  // ...
}
{code}
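To make the size of the gap concrete, a hypothetical side-by-side of the three default formulas discussed here (FacetComponent, current {{json.facet}}, proposed {{json.facet}}), assuming a shard request with {{offset < 10}}. The class and method names are illustrative only.

{code:java}
class OverRequestComparison {

  // FacetComponent defaults: facet.overrequest.ratio=1.5, facet.overrequest.count=10
  static long facetComponent(int limit) {
    return Math.max(limit, (int) (limit * 1.5) + 10);
  }

  // current json.facet default (overrequest:-1)
  static long jsonFacetCurrent(long limit) {
    return (long) (limit * 1.1 + 4);
  }

  // proposed json.facet default (overrequest:-1)
  static long jsonFacetProposed(long limit) {
    return (long) (limit * 1.5 + 10);
  }

  public static void main(String[] args) {
    System.out.println("limit  FacetComponent  json.facet(now)  json.facet(proposed)");
    for (int limit : new int[] {5, 10, 100, 1000}) {
      System.out.printf("%5d  %14d  %15d  %20d%n", limit,
          facetComponent(limit), jsonFacetCurrent(limit), jsonFacetProposed(limit));
    }
  }
}
{code}

For any non-negative limit the proposed formula gives the same value as FacetComponent's defaults: {{limit * 1.5}} is exact in a double, and truncating before or after adding the integer 10 gives the same result.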


(fixed stupid -6- => 10 typo in description ... had 6 on the brain because it's 
the limit value used in a test I was looking at)

> Increase default overrequest ratio/count in json.facet to match existing 
> defaults for facet.overrequest.ratio & facet.overrequest.count ?
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11729
>                 URL: https://issues.apache.org/jira/browse/SOLR-11729
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
