[ https://issues.apache.org/jira/browse/SOLR-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-11729:
----------------------------
    Description:

When FacetComponent first got support for distributed search, the default "effective shard limit" used on shards followed the formula...

{code}
limit = (int)(dff.initialLimit * 1.5) + 10;
{code}

...over time, this became configurable with the introduction of some expert level tuning options: {{facet.overrequest.ratio}} & {{facet.overrequest.count}} -- but the defaults (and basic formula) remain the same to this day...

{code}
    this.overrequestRatio
      = params.getFieldDouble(field, FacetParams.FACET_OVERREQUEST_RATIO, 1.5);
    this.overrequestCount
      = params.getFieldInt(field, FacetParams.FACET_OVERREQUEST_COUNT, 10);
...
  private int doOverRequestMath(int limit, double ratio, int count) {
    // NOTE: normally, "1.0F < ratio"
    //
    // if the user chooses a ratio < 1, we allow it and don't "bottom out" at
    // the original limit until *after* we've also added the count.
    int adjustedLimit = (int) (limit * ratio) + count;
    return Math.max(limit, adjustedLimit);
  }
{code}

However...

When {{json.facet}} multi-shard refinement was added, the code was written slightly differently:

* there is an explicit {{overrequest:N}} (count) option
* if {{-1 == overrequest}} (which is the default) then an "effective shard limit" is computed using the same basic formula as in FacetComponent -- _*but the constants are different*_...
** {{effectiveLimit = (long) (effectiveLimit * 1.1 + 4);}}
* For any (non "-1") user specified {{overrequest}} value, it's added verbatim to the {{limit}} (which may have been user specified, or may just be the default)
** {{effectiveLimit += freq.overrequest;}}

Given the design of the {{json.facet}} syntax, I can understand why the code path for an "advanced" user specified {{overrequest:N}} option avoids using any (implicit) ratio calculation and just does the straight addition of {{limit += overrequest}}.
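To make the size of the gap concrete, here is a minimal standalone sketch (not Solr code) that evaluates the formulas quoted above for a few limits. The class and method names are hypothetical and exist only for illustration; only the arithmetic is taken from the snippets above, and any surrounding logic in either code path is omitted.

```java
// Illustrative only: hypothetical helper class, not part of Solr.
final class OverrequestComparison {

  // Same arithmetic as the doOverRequestMath snippet quoted above.
  static int facetComponentLimit(int limit, double ratio, int count) {
    int adjustedLimit = (int) (limit * ratio) + count;
    return Math.max(limit, adjustedLimit);
  }

  // Default json.facet case (overrequest == -1), using the quoted constants 1.1 and 4.
  static long jsonFacetDefaultLimit(long limit) {
    return (long) (limit * 1.1 + 4);
  }

  // Explicit json.facet case: the user supplied overrequest is added verbatim.
  static long jsonFacetExplicitLimit(long limit, int overrequest) {
    return limit + overrequest;
  }

  public static void main(String[] args) {
    for (int limit : new int[] {6, 10, 100}) {
      System.out.println("limit=" + limit
          + " facet.field default=" + facetComponentLimit(limit, 1.5, 10)
          + " json.facet default=" + jsonFacetDefaultLimit(limit));
    }
    // prints:
    // limit=6 facet.field default=19 json.facet default=10
    // limit=10 facet.field default=25 json.facet default=15
    // limit=100 facet.field default=160 json.facet default=114
  }
}
```

With the default constants, a shard asked for the top 10 terms returns 25 candidates under FacetComponent but only 15 under {{json.facet}}, so the merged counts for the same field can differ between the two APIs.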
What I'm not clear on is the choice of the constants {{1.1}} and {{4}} in the common (default) case, and why those differ from the historically used {{1.5}} and {{10}}.

----

It may seem like a small thing to worry about, but it can/will cause odd inconsistencies when people try to migrate simple {{facet.field=foo}} (or {{facet.pivot=foo,bar}}) queries to {{json.facet}} -- I have also seen it give people attempting these types of migrations the (mistaken) impression that discrepancies they are seeing are because {{refine:true}} is not working.

For this reason, I propose we change the (default) {{overrequest:-1}} behavior to use the same constants as the equivalent FacetComponent code...

{code}
    if (fcontext.isShard()) {
      if (freq.overrequest == -1) {
        // add over-request if this is a shard request and if we have a small offset (large offsets will already be gathering many more buckets than needed)
        if (freq.offset < 10) {
          effectiveLimit = (long) (effectiveLimit * 1.5 + 10);
        }
...
{code}

  was:

When FacetComponent first got support for distributed search, the default "effective shard limit" used on shards followed the formula...

{code}
limit = (int)(dff.initialLimit * 1.5) + 10;
{code}

...over time, this became configurable with the introduction of some expert level tuning options: {{facet.overrequest.ratio}} & {{facet.overrequest.count}} -- but the defaults (and basic formula) remain the same to this day...

{code}
    this.overrequestRatio
      = params.getFieldDouble(field, FacetParams.FACET_OVERREQUEST_RATIO, 1.5);
    this.overrequestCount
      = params.getFieldInt(field, FacetParams.FACET_OVERREQUEST_COUNT, 10);
...
  private int doOverRequestMath(int limit, double ratio, int count) {
    // NOTE: normally, "1.0F < ratio"
    //
    // if the user chooses a ratio < 1, we allow it and don't "bottom out" at
    // the original limit until *after* we've also added the count.
    int adjustedLimit = (int) (limit * ratio) + count;
    return Math.max(limit, adjustedLimit);
  }
{code}

However...

When {{json.facet}} multi-shard refinement was added, the code was written slightly differently:

* there is an explicit {{overrequest:N}} (count) option
* if {{-1 == overrequest}} (which is the default) then an "effective shard limit" is computed using the same basic formula as in FacetComponent -- _*but the constants are different*_...
** {{effectiveLimit = (long) (effectiveLimit * 1.1 + 4);}}
* For any (non "-1") user specified {{overrequest}} value, it's added verbatim to the {{limit}} (which may have been user specified, or may just be the default)
** {{effectiveLimit += freq.overrequest;}}

Given the design of the {{json.facet}} syntax, I can understand why the code path for an "advanced" user specified {{overrequest:N}} option avoids using any (implicit) ratio calculation and just does the straight addition of {{limit += overrequest}}.

What I'm not clear on is the choice of the constants {{1.1}} and {{4}} in the common (default) case, and why those differ from the historically used {{1.5}} and {{6}}.

----

It may seem like a small thing to worry about, but it can/will cause odd inconsistencies when people try to migrate simple {{facet.field=foo}} (or {{facet.pivot=foo,bar}}) queries to {{json.facet}} -- I have also seen it give people attempting these types of migrations the (mistaken) impression that discrepancies they are seeing are because {{refine:true}} is not working.

For this reason, I propose we change the (default) {{overrequest:-1}} behavior to use the same constants as the equivalent FacetComponent code...

{code}
    if (fcontext.isShard()) {
      if (freq.overrequest == -1) {
        // add over-request if this is a shard request and if we have a small offset (large offsets will already be gathering many more buckets than needed)
        if (freq.offset < 10) {
          effectiveLimit = (long) (effectiveLimit * 1.5 + 6);
        }
...
{code}

(fixed stupid -6- => 10 typo in description ...
had 6 on the brain because it's the limit value used in a test I was looking at)

> Increase default overrequest ratio/count in json.facet to match existing defaults for facet.overrequest.ratio & facet.overrequest.count ?
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11729
>                 URL: https://issues.apache.org/jira/browse/SOLR-11729
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Hoss Man

--
This message was sent by Atlassian JIRA (v6.4.14#64029)