Re: statistics in hitlist

2018-03-16 Thread Joel Bernstein
With regression you're looking at how the change in one variable affects the change in another variable. So you need to have values that are changing. What you described is an average of field X which is not changing, regressed against the value of X. I think one approach to this is to regress
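To illustrate why a non-changing value cannot be regressed against, here is a minimal Python sketch (not Solr code). The sample values and the helper name `slope_or_none` are made up for illustration. When every x value equals the same average, the variance of x is zero and the least-squares slope has no defined value.

```python
from statistics import fmean


def slope_or_none(xs, ys):
    """Least-squares slope of ys on xs, or None when xs never changes."""
    # A constant regressor has zero variance, so the slope would be 0/0.
    if max(xs) == min(xs):
        return None
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical field values, for illustration only.
values = [10.0, 12.0, 15.0, 19.0]
average = fmean(values)

# Every document paired with the same average: nothing is changing.
constant = [average] * len(values)

print(slope_or_none(constant, values))  # no slope can be computed
print(slope_or_none(values, values))    # a changing regressor works
```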

Re: statistics in hitlist

2018-03-16 Thread John Smith
Thanks for the link to the documentation, that will probably come in useful. I didn't see a way though, to get my avg function working? So instead of doing a linear regression on two fields, X and Y, in a hitlist, we need to do a linear regression on field X, and the average value of X. Is that

Re: statistics in hitlist

2018-03-15 Thread Joel Bernstein
I've been working on the user guide for the math expressions. Here is the page on regression: https://github.com/joel-bernstein/lucene-solr/blob/math_expressions_documentation/solr/solr-ref-guide/src/regression.adoc This page is part of the larger math expression documentation. The TOC is here:

Re: statistics in hitlist

2018-03-15 Thread Joel Bernstein
If you want to get everything in query you can do this: let(echo="d,e", a=search(tx_prod_production, q="oil_first_90_days_production:[1 TO *]", fq="isParent:true", rows="150", fl="id,oil_first_90_days_production,oil_last_30_days_production", sort="id asc"), b=col(a,

Re: statistics in hitlist

2018-03-15 Thread Erick Erickson
What does the fq clause look like? On Thu, Mar 15, 2018 at 11:51 AM, John Smith wrote: > Hi Joel, I did some more work on this statistics stuff today. Yes, we do > have nulls in our data; the document contains many fields, we don't always > have values for each field, but

Re: statistics in hitlist

2018-03-15 Thread John Smith
Hi Joel, I did some more work on this statistics stuff today. Yes, we do have nulls in our data; the document contains many fields, we don't always have values for each field, but we can't set the nulls to 0 either (or any other value, really) as that will mess up other calculations (such as when
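As a rough client-side illustration of the concern about nulls (this is not how Solr itself handles them), the Python sketch below uses made-up rows and the hypothetical field names `x` and `y`. It keeps only the rows where both fields have a value, and shows how filling nulls with 0 would shift an average.

```python
from statistics import fmean


def paired_non_null(rows, x_field, y_field):
    """Return two equal-length lists from rows where both fields are present."""
    xs, ys = [], []
    for row in rows:
        x, y = row.get(x_field), row.get(y_field)
        if x is None or y is None:
            continue  # skip the row instead of substituting a value
        xs.append(x)
        ys.append(y)
    return xs, ys


# Hypothetical documents; "x" and "y" are placeholder field names.
rows = [
    {"id": "1", "x": 100.0, "y": 40.0},
    {"id": "2", "x": None, "y": 55.0},
    {"id": "3", "x": 300.0},
    {"id": "4", "x": 200.0, "y": 90.0},
]

xs, ys = paired_non_null(rows, "x", "y")

# Average of x over the documents that actually have a value.
present_x = [row["x"] for row in rows if row.get("x") is not None]
# Average of x if nulls were loaded as 0 instead.
zero_filled_x = [row.get("x") or 0.0 for row in rows]

print(xs, ys)
print(fmean(present_x), fmean(zero_filled_x))
```

Dropping incomplete pairs keeps the two arrays aligned for a regression without inventing values, at the cost of using fewer documents.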

Re: statistics in hitlist

2018-03-05 Thread Joel Bernstein
I suspect you've got nulls in your data. I just tested with null values and got the same error. For testing purposes try loading the data with default values of zero. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein wrote: >

Re: statistics in hitlist

2018-03-05 Thread Joel Bernstein
Let's break the expression down and build it up slowly. Let's start with: let(echo="true", a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15", fl="oil_first_90_days_production,oil_last_30_days_production"), b=col(a, oil_first_90_days_production)) This should return

Re: statistics in hitlist

2018-03-05 Thread John Smith
Thanks Joel for your help on this. What I've done so far:
- unzip downloaded solr-7.2
- modify the _default "managed-schema" to add the random field type and the dynamic random field
- start solr7 using "solr start -c"
- indexed my data using pint/pdouble/boolean field types etc

I can now run

Re: statistics in hitlist

2018-03-01 Thread Joel Bernstein
The field type will also need to be in the schema: Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein wrote: > You'll need to have this field in your schema: > > > > I'll check to see if the default schema used with solr start

Re: statistics in hitlist

2018-03-01 Thread Joel Bernstein
You'll need to have this field in your schema: I'll check to see if the default schema used with solr start -c has this field, if not I'll add it. Thanks for pointing this out. I checked and right now the random expression is only accepting one fq, but I consider this a bug. It should accept

Re: statistics in hitlist

2018-03-01 Thread John Smith
Joel, thanks for the pointers to the streaming feature. I had no idea solr had that (and also just discovered the very interesting sql feature! I will be sure to investigate that in more detail in the future). However I'm having some trouble getting basic streaming functions working. I've already

Re: statistics in hitlist

2018-02-23 Thread Joel Bernstein
This is going to be a complex answer because Solr actually now has multiple ways of doing regression analysis as part of the Streaming Expression statistical programming library. The basic documentation is here: https://lucene.apache.org/solr/guide/7_2/statistical-programming.html Here is a

Re: statistics in hitlist

2018-02-23 Thread John Smith
Hi Joel, thanks for the answer. I'm not really a stats guy, but the end result of all this is supposed to be obtaining R^2. Is there no way of obtaining this value, then (short of iterating over all the results in the hitlist and calculating it myself)? On Fri, Feb 23, 2018 at 12:26 PM, Joel

Re: statistics in hitlist

2018-02-23 Thread Joel Bernstein
Typically SSE is the sum of the squared errors of the prediction in a regression analysis. The stats component doesn't perform regression, although it might be a nice feature. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Feb 23, 2018 at 12:17 PM, John Smith
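For reference, here is a minimal Python sketch of how SSE relates to R^2 in a simple linear regression, computed outside Solr. The function name `r_squared` and the sample numbers are assumptions for illustration. It fits a least-squares line, sums the squared prediction errors (SSE), and compares that with the total sum of squares around the mean (SST).

```python
from statistics import fmean


def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for a least-squares line fit of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # SST: squared deviations of y from its own mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or sst == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # SSE: squared errors of the regression's predictions.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - sse / sst


# Made-up sample data, for illustration only.
print(r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

The mean and sum of squares of a single field are not enough on their own: the SSE term needs the paired x and y values, or at least their cross-product sum.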

statistics in hitlist

2018-02-23 Thread John Smith
I'm using solr, and enabling stats as per this page: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html I want to get more stat values though. Specifically I'm looking for r-squared (coefficient of determination). This value is not present in solr, however some of the pieces used