[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355303#comment-14355303 ] ASF subversion and git services commented on SOLR-6349: --- Commit 1665635 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1665635 ] SOLR-6349 + SOLR-6682: test workaround since (deprecated) stats.facet doesn't garuntee order of list LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, make-data-and-queries.pl, make-data-and-queries.pl Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355316#comment-14355316 ] ASF subversion and git services commented on SOLR-6349: --- Commit 1665639 from hoss...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1665639 ] SOLR-6349 + SOLR-6682: Added support for stats.field localparams to enable/disable individual stats; Fix response when using EnumField with StatsComponent (merge r1665579, r1665635) LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, make-data-and-queries.pl, make-data-and-queries.pl Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355077#comment-14355077 ] ASF subversion and git services commented on SOLR-6349: --- Commit 1665579 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1665579 ] SOLR-6349 + SOLR-6682: Added support for stats.field localparams to enable/disable individual stats; Fix response when using EnumField with StatsComponent LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, make-data-and-queries.pl, make-data-and-queries.pl Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355845#comment-14355845 ] ASF subversion and git services commented on SOLR-6349: --- Commit 1665730 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1665730 ] SOLR-6349: real fix for out of order stats facet's LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.1 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, make-data-and-queries.pl, make-data-and-queries.pl Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355856#comment-14355856 ] ASF subversion and git services commented on SOLR-6349: --- Commit 1665733 from hoss...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1665733 ] SOLR-6349: real fix for out of order stats facet's (merge r1665730) LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Assignee: Hoss Man Fix For: Trunk, 5.1 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, make-data-and-queries.pl, make-data-and-queries.pl Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273607#comment-14273607 ] Oliver Mannion commented on SOLR-6349: -- Can this patch allow output of countDistinct but not distinctValues? We have this requirement, as the distinctValues field is not needed and its inclusion increases the response size dramatically. LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273832#comment-14273832 ] Hoss Man commented on SOLR-6349: bq. Can this patch allow output of countDistinct but not distinctValues? i don't think we should tackle that as part of this issue - it's already fairly complicated w/o introducing new permutations of options. i think the best approach would be to leave calcDistinct alone as it is now but deprecate/discourage it andmove towards adding an entirely new stats option for computing an aproximated count using hyperloglog (i opened a new issue for this: SOLR-6968) LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180712#comment-14180712 ] Hoss Man commented on SOLR-6349: Xu: I havne't had a chance to review your patch, but it sounds like awesome progress -- in respose to your specific question... bq. do we consider calcDistinct is a local parameter like min/max, or it should be the top-level parameter. (Which keeps the existing behavior for things like stats.field=foostats.calcDistinct=true) ...the key is that we need to support both. as i mentioned previously... bq. deprecate stats.calcdistinct but use it as a default for the new corresponding localparam(s) and note the psuedo code i postd in my last comment... {code} if (statsInResponse.isEmpty()) { statsInResponse.addAll(LEGACY_DEFAULT_STATS); // static final EnumSet statsToCalculate.addAll(LEGACY_DEFAULT_STATS_DEPENDS); // static {} built EnumSet looping over LEGACY_DEFAULT_STATS } if (params.getFieldBool(f, STATS_CALCDISTINCT, false)) { // top level req params statsInResponse.add(Stat.calcDistinct); statsToCalculate.add(Stat.calcDistinct); } {code} to give some concrete examples, if we assume that foo is a numeric field, (where {{stats.field=foo}} currently returns min/max/missing/sum/count/mean/sumOfSquares/stddev) then these are the results you should get in various situations... {noformat} 1) stats.field=foo or stats.field=foostats.calcDistinct=false or stats.field=foof.foo.stats.calcDistinct=false or stats.field={!min=true max=true missing=true sum=true count=true mean=true sumOfSquares=true stddev=true}foo or stats.field={!min=true max=true missing=true sum=true count=true mean=true sumOfSquares=true stddev=true}foof.foo.stats.calcDistinct=false or stats.field={!min=true max=true missing=true sum=true count=true mean=true sumOfSquares=true stddev=true calcDistinct=false}foof.foo.stats.calcDistinct=true = min + max + missing + sum + count + mean + sumOfSquares + stddev 2) stats.field=foostats.calcDistinct=true or stats.field=foof.foo.stats.calcDistinct=true or stats.field={!min=true max=true missing=true sum=true count=true mean=true sumOfSquares=true stddev=true calcDistinct=true}foo or stats.field={!min=true max=true missing=true sum=true count=true mean=true sumOfSquares=true stddev=true calcDistinct=true}foostats.calcDistinct=false or stats.field={!min=true max=true missing=true sum=true count=true mean=true sumOfSquares=true stddev=true calcDistinct=true}foof.foo.stats.calcDistinct=false = min + max + missing + sum + count + mean + sumOfSquares + stddev + calcDistinct 3) stats.field={!calcDistinct=true}foo or stats.field={!calcDistinct=true}foostats.calcDistinct=false or stats.field={!calcDistinct=true}foof.foo.stats.calcDistinct=false = calcDistinct 3) stats.field={!min=true}foostats.calcDistinct=true or stats.field={!min=true calcDistinct=true}foostats.calcDistinct=false or stats.field={!min=true calcDistinct=true}foof.foo.stats.calcDistinct=false = min + calcDistinct {noformat} does that make sense? LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180718#comment-14180718 ] Xu Zhang commented on SOLR-6349: Thanks so much, totally agree. I will try to update Tomas's change and tests this evening. LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177373#comment-14177373 ] Tomás Fernández Löbbe commented on SOLR-6349: - I'll iterate this patch to improve what you are mentioning. I'll probably won't have time this week to work on this, so I'll take it next week, or if someone else wants to continue, please feel free. LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173931#comment-14173931 ] Tomás Fernández Löbbe commented on SOLR-6349: - Thanks Xu bq. I tried to improve it a little bit by using bit wise operation, but no big difference. Yes, my understanding is that there should not be much of a difference in this case. I think EnumSet is a bit more clear [~hossman_luc...@fucit.org] Any thoughts on this path? I'll change the other stat types soon LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174476#comment-14174476 ] Hoss Man commented on SOLR-6349: Tomas: thanks a lot for working on this. I have mixed feelings about your approach... * -1 to the way StatsField nows has to directly know about every possible stat for every possible data type, the stat dependency logic is spread between the StatsValue classes and the new Enum in StatsField. * -0 to making it harder to add new types of stats in the future (one of the nice side effects of my hypothetical idea was that if we did go an SPI route or something like that it would have made it trivial for people to add new types of stats as plugins, even depending on other stats for distributed logic) * -1 for the need to have that ...{code} if (statsField.calculateStat(X)) { X = calculateX() } {code}...pattern you mentioned in so much code -- that's one of the reasons i abandomed my last patch (and before i abandoned it, i was focusingon trying to ensure that it was at least always a comarison with a final boolean in the hops that the JVM could optimize the if away) * +99 to the fact that you have something that actually works as opposed to my half-thought out idea that i never tried to implement. On balance, i think we should move forward with your idea -- if nothing else, then as a straw man to help us flesh out the exact expected behavior write more tests. If down the road we come up with a better internal implementation, then so be it. A few specific comments on what you've got so far... * kind of weird the way your patch's API uses a variety of Stat[], EnumSetStat and SetStat ... probably best just to use EnumSet everywhere? * be careful about your {{statsInResponse.isEmpty() || statsInResponse.contains(stat)}} logic ... we need to make sure we don't break existing behavior for things like {{stats.field=foostats.calcDistinct=true}} ** speaking of calcDistinct, it needs to be accounted for in your new enum so we can start supporting it as a localparam and deprecate the top level {{stats.calcDistinct}}, maybe along the lines of...{code} if (statsInResponse.isEmpty()) { statsInResponse.addAll(LEGACY_DEFAULT_STATS); // static final EnumSet statsToCalculate.addAll(LEGACY_DEFAULT_STATS_DEPENDS); // static {} built EnumSet looping over LEGACY_DEFAULT_STATS } if (params.getFieldBool(f, STATS_CALCDISTINCT, false)) { // top level req params statsInResponse.add(Stat.calcDistinct); statsToCalculate.add(Stat.calcDistinct); } {code} * i'm not positive - but i don't think your patch currently accounts for the idea that in a distributed request, we may need to calculate return a stats dependencies, but not compute return the stat itself (ie: for mean, we need each shard to compute a sum count, but we don't wnat each shard to compute or return the per-shard mean) ** we should be abl to easily test this behavior by faking an isShard request to a single node and asserting which keys we do/don't get back * rather then a static dependsOn(Stat) method with a case statement, why not make it a property of of the enum objects themselves? ** maybe something like this, which also shows one idea for dealing with the stat doesn't depend on itself in distributed calculations...{code} static enum Stat { min(true), max(true), missing(true), sum(true), count(true), mean(false, sum, count), sumOfSquares(true), stddev(false,sum,count,sumOfSquares); private final ListStat distribDeps; Stat(boolean selfDep, Stat... deps) { distribDeps = new ArrayListStat(deps.length+1); distribDeps.addAll(Arrays.asList(deps)); if (selfDep) { distribDeps.add(this); } } public EnumSetStat getDistribDeps() { return EnumSet.copyOf(this.distribDeps); } } {code} * please help me fight against the trend of distributed tests that only do comparisons against single node w/o asserting specific results, ie...{noformat} +// only ask for min and mean +query(q,*:*, sort,i1+ desc, stats, true, + stats.field, {!min=true mean=true} + i1); + +// only ask for min, mean and stddev +query(q,*:*, sort,i1+ desc, stats, true, + stats.field, {!min=true mean=true stddev=true} + i1); + +String[] stats = new String[]{min, max, sum, sumOfSquares, stddev, mean, missing, count}; + +for (String stat:stats) { + for (String innerStat:stats) { +query(q,*:*, sort,i1+ desc, stats, true, +stats.field, {! + stat + =true + innerStat + =true} + i1); + } +} + {noformat}...all of those can should include xpaths asserting that the count() of keys in the stats is only N, and that the expected keys exist (and in the case of the first 2: we should be able to assert the expected value) bq. The failing tests were
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171405#comment-14171405 ] Tomás Fernández Löbbe commented on SOLR-6349: - What about doing something like this: StatsFields knows about the stats to calculate and the stats include in response. For the basic “standalone” statistics, those sets are equal, but when a statistic requires others, the “calculate” set includes the depended stats. All stats in the “calculate” set are calculated. All stats in the “include in response” set are responded. In the case of a shard request, all stats in the “calculate” set are also included in the response. StatsField gets the local params from the request, the StatsValues ask the statsField instance before calculating a stat / including in the response. The bad part of it is that all the stats need the {code:java} if (statsField.calculateStat(X)) { X = calculateX() } {code} The good part is that we don’t need to create classes for every single stat, that may have to be generic in some way depending on the field type. I’m uploading a patch with the idea, it’s not even close to complete, just to show the idea (see only the NumericStatsValues implementation). There are tests failing in StatsComponent that I haven’t looked at yet. LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349-tflobbe.patch, SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165856#comment-14165856 ] Hoss Man commented on SOLR-6349: Side note... one good bit of discovery that came out of my failed patch was a realization of why the StatsComponent currently doesn't write out any stats values in some cases -- as discussed in this comment in SOLR-6351... https://issues.apache.org/jira/browse/SOLR-6351?focusedCommentId=14160658page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14160658 {quote} ... my thinking, was that: * the behavior when hanging stats off pivots should mirror that of regular stats * if you ask for a stats block , you should always get that block, so the client doesn't have to conditionally check if it's there before looking at the values. * the included stat values matter even if no doc has the stats.field, becuase one of the stats is in fact missing and that if you ask for stats, you should be able to look at that missing count. (and it should match up with your doc set size if the field is completely missing, etc...) but looking at an example of this now, i see that for simple field stats (w/o pivots), that's not even how it currently works – consider this URL using hte example data... http://localhost:8983/solr/select?rows=0q=name:utfstats.field=foo_dstats.field=popularitystats=true ... {quote} I found that the source of this behavior comes from this code in StatsComponent... {code} if (isShard == true || (Long) stv.get(count) 0) { stats_fields.add(statsField.getOutputKey(), stv); } else { stats_fields.add(statsField.getOutputKey(), null); } {code} ...obviously if we're going to start making stats optional, we can't make the response data conditional on wether count is greater then 0, because there might not be a count. So we'll need to revamp this, and i'm more and more convinced my comment in SOLR-6351 is the right way to go... {quote} I thought (and still think) that the correct behavior for this query would be to get a stats block back for those fields where things like min/max/mean are null, count==0, and missing=1 ... but that's not how it currently works. {quote} LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Attachments: SOLR-6349___bad_idea_broken.patch Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats
[ https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091534#comment-14091534 ] Hoss Man commented on SOLR-6349: Proposed implementation... * Change StatsValuesFactory.createStatsValues (and the constructors for the various StatsValues impls) to take in the local params from the {{stats.field}} * each StatsValue impl should validate the stat params it's asked to compute * all stats should default to disabled, but we need a special backcompat case that if *no* stats are specified in local param, all current default stats are computed ** we can't be lazy and just check 0==localparams.size() - need to check the actuals stats params because of local params like ex and key * for the distributed logic where things get a bit more complex (ie: distrib mean needs sum+count from all shards; distrib stddev needs sum+count+sumOfSquares from each shard) we can go two possible routes: ** A) StatsValue needs a new method to be *asked* by StatsComponent what local params it needs when sending shard requests *** in this case the localparams of the shard requests have diff localparams and the processing of those shard stats requests can be ignorant of the fact that they are distributed ** B) StatsValue (via the factory method) needs to be informed when it's computing stats for an isShard request, so it can internally decide what per-shard values to return based on the input *** in this case, the localparams are not modified per shard, but since isShard=true the StatsValue may return diff metrics then the ones requested so that the coordinator gets what it needs to aggregate. ** I think i'm leaning towards option B - particularly because it simplifies the idea of how to deal with situations like percentiles where the per-shard info isn't really a stat that should have it's own localparma folks migth ask for. * deprecate {{stats.calcdistinct}} but use it as a default for the new corresponding localparam(s) LocalParams for enabling/disabling individual stats --- Key: SOLR-6349 URL: https://issues.apache.org/jira/browse/SOLR-6349 Project: Solr Issue Type: Sub-task Reporter: Hoss Man Stats component currently computes all stats (except for one) every time because they are relatively cheap, and in some cases dependent on eachother for distrib computation -- but if we start layering stats on other things it becomes unnecessarily expensive to compute all the stats when they just want the sum (and it will definitely become excessively verbose in the responses). The plan here is to use local params to make this configurable. All of the existing stat options could be modeled as a simple boolean param, but future params (like percentiles) might take in a more complex param value... Example: {noformat} stats.field={!min=true max=true percentiles='99,99.999'}price stats.field={!mean=true}weight {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org