[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355303#comment-14355303
 ] 

ASF subversion and git services commented on SOLR-6349:
---

Commit 1665635 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1665635 ]

SOLR-6349 + SOLR-6682: test workaround since (deprecated) stats.facet doesn't 
garuntee order of list

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, 
 make-data-and-queries.pl, make-data-and-queries.pl


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355316#comment-14355316
 ] 

ASF subversion and git services commented on SOLR-6349:
---

Commit 1665639 from hoss...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1665639 ]

SOLR-6349 + SOLR-6682: Added support for stats.field localparams to 
enable/disable individual stats; Fix response when using EnumField with 
StatsComponent (merge r1665579, r1665635)

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, 
 make-data-and-queries.pl, make-data-and-queries.pl


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355077#comment-14355077
 ] 

ASF subversion and git services commented on SOLR-6349:
---

Commit 1665579 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1665579 ]

SOLR-6349 + SOLR-6682: Added support for stats.field localparams to 
enable/disable individual stats; Fix response when using EnumField with 
StatsComponent

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, 
 make-data-and-queries.pl, make-data-and-queries.pl


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355845#comment-14355845
 ] 

ASF subversion and git services commented on SOLR-6349:
---

Commit 1665730 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1665730 ]

SOLR-6349: real fix for out of order stats facet's

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: Trunk, 5.1

 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, 
 make-data-and-queries.pl, make-data-and-queries.pl


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355856#comment-14355856
 ] 

ASF subversion and git services commented on SOLR-6349:
---

Commit 1665733 from hoss...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1665733 ]

SOLR-6349: real fix for out of order stats facet's (merge r1665730)

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: Trunk, 5.1

 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, 
 SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, 
 make-data-and-queries.pl, make-data-and-queries.pl


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-01-12 Thread Oliver Mannion (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273607#comment-14273607
 ] 

Oliver Mannion commented on SOLR-6349:
--

Can this patch allow output of countDistinct but not distinctValues? We have 
this requirement, as the distinctValues field is not needed and its inclusion 
increases the response size dramatically. 

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2015-01-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273832#comment-14273832
 ] 

Hoss Man commented on SOLR-6349:


bq. Can this patch allow output of countDistinct but not distinctValues?

i don't think we should tackle that as part of this issue - it's already fairly 
complicated w/o introducing new permutations of options.

i think the best approach would be to leave calcDistinct alone as it is now 
but deprecate/discourage it andmove towards adding an entirely new stats option 
for computing an aproximated count using hyperloglog (i opened a new issue for 
this: SOLR-6968)

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180712#comment-14180712
 ] 

Hoss Man commented on SOLR-6349:


Xu:

I havne't had a chance to review your patch, but it sounds like awesome 
progress -- in respose to your specific question...

bq. do we consider calcDistinct is a local parameter like min/max, or it should 
be the top-level parameter. (Which keeps the existing behavior for things like 
stats.field=foostats.calcDistinct=true)

...the key is that we need to support both.  as i mentioned previously...

bq. deprecate stats.calcdistinct but use it as a default for the new 
corresponding localparam(s)

and note the psuedo code i postd in my last comment...

{code}
if (statsInResponse.isEmpty()) {
  statsInResponse.addAll(LEGACY_DEFAULT_STATS); // static final EnumSet
  statsToCalculate.addAll(LEGACY_DEFAULT_STATS_DEPENDS); // static {} built 
EnumSet looping over LEGACY_DEFAULT_STATS 
}
if (params.getFieldBool(f, STATS_CALCDISTINCT, false)) { // top level req params
  statsInResponse.add(Stat.calcDistinct);
  statsToCalculate.add(Stat.calcDistinct);
}
{code}

to give some concrete examples, if we assume that foo is a numeric field, 
(where {{stats.field=foo}} currently returns 
min/max/missing/sum/count/mean/sumOfSquares/stddev) then these are the results 
you should get in various situations...


{noformat}
1) stats.field=foo
or stats.field=foostats.calcDistinct=false
or stats.field=foof.foo.stats.calcDistinct=false
or stats.field={!min=true max=true missing=true sum=true count=true mean=true 
sumOfSquares=true stddev=true}foo
or stats.field={!min=true max=true missing=true sum=true count=true mean=true 
sumOfSquares=true stddev=true}foof.foo.stats.calcDistinct=false
or stats.field={!min=true max=true missing=true sum=true count=true mean=true 
sumOfSquares=true stddev=true 
calcDistinct=false}foof.foo.stats.calcDistinct=true

= min + max + missing + sum + count + mean + sumOfSquares + stddev



2) stats.field=foostats.calcDistinct=true
or stats.field=foof.foo.stats.calcDistinct=true
or stats.field={!min=true max=true missing=true sum=true count=true mean=true 
sumOfSquares=true stddev=true calcDistinct=true}foo
or stats.field={!min=true max=true missing=true sum=true count=true mean=true 
sumOfSquares=true stddev=true calcDistinct=true}foostats.calcDistinct=false
or stats.field={!min=true max=true missing=true sum=true count=true mean=true 
sumOfSquares=true stddev=true 
calcDistinct=true}foof.foo.stats.calcDistinct=false

= min + max + missing + sum + count + mean + sumOfSquares + stddev + 
calcDistinct



3) stats.field={!calcDistinct=true}foo
or stats.field={!calcDistinct=true}foostats.calcDistinct=false
or stats.field={!calcDistinct=true}foof.foo.stats.calcDistinct=false

= calcDistinct



3) stats.field={!min=true}foostats.calcDistinct=true
or stats.field={!min=true calcDistinct=true}foostats.calcDistinct=false
or stats.field={!min=true calcDistinct=true}foof.foo.stats.calcDistinct=false

= min + calcDistinct

{noformat}

does that make sense?


 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-22 Thread Xu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180718#comment-14180718
 ] 

Xu Zhang commented on SOLR-6349:


Thanks so much, totally agree.

 I will try to update Tomas's change and tests this evening. 

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
 SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177373#comment-14177373
 ] 

Tomás Fernández Löbbe commented on SOLR-6349:
-

I'll iterate this patch to improve what you are mentioning. I'll probably won't 
have time this week to work on this, so I'll take it next week, or if someone 
else wants to continue, please feel free.

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173931#comment-14173931
 ] 

Tomás Fernández Löbbe commented on SOLR-6349:
-

Thanks Xu
bq. I tried to improve it a little bit by using bit wise operation, but no big 
difference.
Yes, my understanding is that there should not be much of a difference in this 
case. I think EnumSet is a bit more clear

[~hossman_luc...@fucit.org] Any thoughts on this path? I'll change the other 
stat types soon

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
 SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174476#comment-14174476
 ] 

Hoss Man commented on SOLR-6349:



Tomas: thanks a lot for working on this.

I have mixed feelings about your approach...

* -1 to the way StatsField nows has to directly know about every possible stat 
for every possible data type,  the stat dependency logic is spread between the 
StatsValue classes and the new Enum in StatsField.
* -0 to making it harder to add new types of stats in the future (one of the 
nice side effects of my hypothetical idea was that if we did go an SPI route or 
something like that it would have made it trivial for people to add new types 
of stats as plugins, even depending on other stats for distributed logic)
* -1 for the need to have that ...{code}
if (statsField.calculateStat(X)) { 
  X = calculateX() 
}
{code}...pattern you mentioned in so much code -- that's one of the reasons i 
abandomed my last patch (and before i abandoned it, i was focusingon trying to 
ensure that it was at least always a comarison with a final boolean in the hops 
that the JVM could optimize the if away)
* +99 to the fact that you have something that actually works as opposed to my 
half-thought out idea that i never tried to implement.


On balance, i think we should move forward with your idea -- if nothing else, 
then as a straw man to help us flesh out the exact expected behavior  write 
more tests. If down the road we come up with a better internal implementation, 
then so be it.

A few specific comments on what you've got so far...

* kind of weird the way your patch's API uses a variety of Stat[], 
EnumSetStat and SetStat ... probably best just to use EnumSet everywhere?
* be careful about your {{statsInResponse.isEmpty() || 
statsInResponse.contains(stat)}} logic ... we need to make sure we don't break 
existing behavior for things like {{stats.field=foostats.calcDistinct=true}}
** speaking of calcDistinct, it needs to be accounted for in your new enum so 
we can start supporting it as a localparam and deprecate the top level 
{{stats.calcDistinct}}, maybe along the lines of...{code}
if (statsInResponse.isEmpty()) {
  statsInResponse.addAll(LEGACY_DEFAULT_STATS); // static final EnumSet
  statsToCalculate.addAll(LEGACY_DEFAULT_STATS_DEPENDS); // static {} built 
EnumSet looping over LEGACY_DEFAULT_STATS 
}
if (params.getFieldBool(f, STATS_CALCDISTINCT, false)) { // top level req params
  statsInResponse.add(Stat.calcDistinct);
  statsToCalculate.add(Stat.calcDistinct);
}
{code}
* i'm not positive - but i don't think your patch currently accounts for the 
idea that in a distributed request, we may need to calculate  return a stats 
dependencies, but not compute  return the stat itself (ie: for mean, we need 
each shard to compute a sum  count, but we don't wnat each shard to compute or 
return the per-shard mean)
** we should be abl to easily test this behavior by faking an isShard request 
to a single node and asserting which keys we do/don't get back
* rather then a static dependsOn(Stat) method with a case statement, why not 
make it a property of of the enum objects themselves?
** maybe something like this, which also shows one idea for dealing with the 
stat doesn't depend on itself in distributed calculations...{code}
  static enum Stat {
min(true),
max(true),
missing(true),
sum(true),
count(true),
mean(false, sum, count),
sumOfSquares(true),
stddev(false,sum,count,sumOfSquares);

private final ListStat distribDeps;

Stat(boolean selfDep, Stat... deps) {
  distribDeps = new ArrayListStat(deps.length+1);
  distribDeps.addAll(Arrays.asList(deps));
  if (selfDep) { 
distribDeps.add(this);
  }
}
public EnumSetStat getDistribDeps() {
  return EnumSet.copyOf(this.distribDeps);
}
  }
{code}
* please help me fight against the trend of distributed tests that only do 
comparisons against single node w/o asserting specific results, ie...{noformat}
+// only ask for min and mean
+query(q,*:*, sort,i1+ desc, stats, true,
+  stats.field, {!min=true mean=true} + i1);
+
+// only ask for min, mean and stddev
+query(q,*:*, sort,i1+ desc, stats, true,
+  stats.field, {!min=true mean=true stddev=true} + i1);
+
+String[] stats = new String[]{min, max, sum, sumOfSquares, 
stddev, mean, missing, count};
+
+for (String stat:stats) {
+  for (String innerStat:stats) {
+query(q,*:*, sort,i1+ desc, stats, true,
+stats.field, {! + stat + =true  + innerStat + =true} + i1);
+  }
+}
+
{noformat}...all of those can  should include xpaths asserting that the 
count() of keys in the stats is only N, and that the expected keys exist (and 
in the case of the first 2: we should be able to assert the expected value)



bq. The failing tests were 

[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171405#comment-14171405
 ] 

Tomás Fernández Löbbe commented on SOLR-6349:
-

What about doing something like this: 
StatsFields knows about the stats to calculate and the stats include in 
response.
For the basic “standalone” statistics, those sets are equal, but when a 
statistic requires others, the “calculate” set includes the depended stats.
All stats in the “calculate” set are calculated. 
All stats in the “include in response” set are responded. 
In the case of a shard request, all stats in the “calculate” set are also 
included in the response. 
StatsField gets the local params from the request, the StatsValues ask the 
statsField instance before calculating a stat / including in the response.
The bad part of it is that all the stats need the 
{code:java}
if (statsField.calculateStat(X)) {
  X = calculateX()
}
{code}
The good part is that we don’t need to create classes for every single stat, 
that may have to be generic in some way depending on the field type.
I’m uploading a patch with the idea, it’s not even close to complete, just to 
show the idea (see only the NumericStatsValues implementation).
There are tests failing in StatsComponent that I haven’t looked at yet.

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349-tflobbe.patch, 
 SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-10-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165856#comment-14165856
 ] 

Hoss Man commented on SOLR-6349:



Side note...

one good bit of discovery that came out of my failed patch was a realization of 
why the StatsComponent currently doesn't write out any stats values in some 
cases -- as discussed in this comment in SOLR-6351...

https://issues.apache.org/jira/browse/SOLR-6351?focusedCommentId=14160658page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14160658

{quote}
...
my thinking, was that:

* the behavior when hanging stats off pivots should mirror that of regular stats
* if you ask for a stats block , you should always get that block, so the 
client doesn't have to conditionally check if it's there before looking at the 
values.
* the included stat values matter even if no doc has the stats.field, becuase 
one of the stats is in fact missing and that if you ask for stats, you should 
be able to look at that missing count. (and it should match up with your doc 
set size if the field is completely missing, etc...)

but looking at an example of this now, i see that for simple field stats (w/o 
pivots), that's not even how it currently works – consider this URL using hte 
example data...

http://localhost:8983/solr/select?rows=0q=name:utfstats.field=foo_dstats.field=popularitystats=true
...
{quote}

I found that the source of this behavior comes from this code in 
StatsComponent...


{code}
if (isShard == true || (Long) stv.get(count)  0) {
  stats_fields.add(statsField.getOutputKey(), stv);
} else {
  stats_fields.add(statsField.getOutputKey(), null);
}
{code}

...obviously if we're going to start making stats optional, we can't make the 
response data conditional on wether count is greater then 0, because there 
might not be a count.

So we'll need to revamp this, and i'm more and more convinced my comment in 
SOLR-6351 is the right way to go...

{quote}
I thought (and still think) that the correct behavior for this query would be 
to get a stats block back for those fields where things like min/max/mean are 
null, count==0, and missing=1 ... but that's not how it currently works.
{quote}





 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man
 Attachments: SOLR-6349___bad_idea_broken.patch


 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6349) LocalParams for enabling/disabling individual stats

2014-08-08 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091534#comment-14091534
 ] 

Hoss Man commented on SOLR-6349:



Proposed implementation...

* Change StatsValuesFactory.createStatsValues (and the constructors for the 
various StatsValues impls) to take in the local params from the {{stats.field}}
* each StatsValue impl should validate the stat params it's asked to compute 
* all stats should default to disabled, but we need a special backcompat case 
that if *no* stats are specified in local param, all current default stats are 
computed
** we can't be lazy and just check 0==localparams.size() - need to check the 
actuals stats params because of local params like ex and key
* for the distributed logic where things get a bit more complex (ie: distrib 
mean needs sum+count from all shards; distrib stddev needs 
sum+count+sumOfSquares from each shard) we can go two possible routes:
** A) StatsValue needs a new method to be *asked* by StatsComponent what local 
params it needs when sending shard requests
*** in this case the localparams of the shard requests have diff localparams 
and the processing of those shard stats requests can be ignorant of the fact 
that they are distributed
** B) StatsValue (via the factory method) needs to be informed when it's 
computing stats for an isShard request, so it can internally decide what 
per-shard values to return based on the input
*** in this case, the localparams are not modified per shard, but since 
isShard=true the StatsValue may return diff metrics then the ones requested 
so that the coordinator gets what it needs to aggregate.
** I think i'm leaning towards option B - particularly because it simplifies 
the idea of how to deal with situations like percentiles where the per-shard 
info isn't really a stat that should have it's own localparma folks migth ask 
for.
* deprecate {{stats.calcdistinct}} but use it as a default for the new 
corresponding localparam(s)

 LocalParams for enabling/disabling individual stats
 ---

 Key: SOLR-6349
 URL: https://issues.apache.org/jira/browse/SOLR-6349
 Project: Solr
  Issue Type: Sub-task
Reporter: Hoss Man

 Stats component currently computes all stats (except for one) every time 
 because they are relatively cheap, and in some cases dependent on eachother 
 for distrib computation -- but if we start layering stats on other things it 
 becomes unnecessarily expensive to compute all the stats when they just want 
 the sum (and it will definitely become excessively verbose in the 
 responses).  
 The plan here is to use local params to make this configurable.  All of the 
 existing stat options could be modeled as a simple boolean param, but future 
 params (like percentiles) might take in a more complex param value...
 Example:
 {noformat}
 stats.field={!min=true max=true percentiles='99,99.999'}price
 stats.field={!mean=true}weight
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org