I suspect you've got nulls in your data. I just tested with null values and got the same error. For testing purposes try loading the data with default values of zero.
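One way to run that test, sketched under the assumption that the documents are prepared in Python before being indexed. The helper name `fill_missing_numeric` and the sample records are hypothetical and not part of Solr; only the two field names come from this thread. Filling with zero is meant for confirming the diagnosis only, since zeros standing in for missing values would distort a real regression.

```python
def fill_missing_numeric(docs, numeric_fields, default=0):
    """Return copies of docs with missing or None numeric fields set to default."""
    cleaned = []
    for doc in docs:
        fixed = dict(doc)  # copy so the caller's records are left untouched
        for field in numeric_fields:
            if fixed.get(field) is None:
                fixed[field] = default
        cleaned.append(fixed)
    return cleaned


# Field names taken from the thread; the record values are made up.
FIELDS = ["oil_first_90_days_production", "oil_last_30_days_production"]
sample = [
    {"id": "1", "oil_first_90_days_production": 120, "oil_last_30_days_production": 30},
    {"id": "2", "oil_first_90_days_production": None},
]
cleaned = fill_missing_numeric(sample, FIELDS)
```

If the regress call succeeds after reindexing the cleaned documents, the nulls were the likely cause of the "Numeric value expected" error.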
Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Let's break the expression down and build it up slowly. Let's start with:
>
> let(echo="true",
>     a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15",
>              fl="oil_first_90_days_production,oil_last_30_days_production"),
>     b=col(a, oil_first_90_days_production))
>
> This should return variables a and b. Let's see what the data looks like.
> I changed the rows from 15000 to 15. If it all looks good we can expand the
> rows and continue adding functions.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Mar 5, 2018 at 4:11 PM, John Smith <localde...@gmail.com> wrote:
>
>> Thanks Joel for your help on this.
>>
>> What I've done so far:
>> - unzip downloaded solr-7.2
>> - modify the _default "managed-schema" to add the random field type and
>>   the dynamic random field
>> - start solr7 using "solr start -c"
>> - indexed my data using pint/pdouble/boolean field types etc
>>
>> I can now run the random function all by itself, it returns random
>> results as expected. So far so good!
>>
>> However... now trying to get the regression stuff working:
>>
>> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
>>              rows="15000",
>>              fl="oil_first_90_days_production,oil_last_30_days_production"),
>>     b=col(a, oil_first_90_days_production),
>>     c=col(a, oil_last_30_days_production),
>>     d=regress(b, c))
>>
>> Posted directly into solr admin UI. Run the streaming expression and I
>> get this error message:
>> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
>> expected but found type java.lang.String for value
>> oil_first_90_days_production"
>>
>> It thinks my numeric field is defined as a string?
>> But when I view the schema, those 2 fields are defined as ints:
>>
>>
>> When I run a normal query and choose xml as output format, then it also
>> puts "int" elements into the hitlist, so the schema appears to be correct;
>> it's just when using this regress function that something goes wrong and
>> solr thinks the field is string.
>>
>> Any suggestions?
>> Thanks!
>>
>> On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> The field type will also need to be in the schema:
>>>
>>> <!-- The "RandomSortField" is not used to store or search any
>>>      data. You can declare fields of this type in your schema
>>>      to generate pseudo-random orderings of your docs for sorting
>>>      or function purposes. The ordering is generated based on the field
>>>      name and the version of the index. As long as the index version
>>>      remains unchanged, and the same field name is reused,
>>>      the ordering of the docs will be consistent.
>>>      If you want different pseudo-random orderings of documents,
>>>      for the same version of the index, use a dynamicField and
>>>      change the field name in the request.
>>> -->
>>>
>>> <fieldType name="random" class="solr.RandomSortField" indexed="true" />
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>>
>>> > You'll need to have this field in your schema:
>>> >
>>> > <dynamicField name="random_*" type="random" />
>>> >
>>> > I'll check to see if the default schema used with solr start -c has this
>>> > field, if not I'll add it. Thanks for pointing this out.
>>> >
>>> > I checked and right now the random expression is only accepting one fq,
>>> > but I consider this a bug. It should accept multiple. I'll create a ticket
>>> > for getting this fixed.
>>> >
>>> > Joel Bernstein
>>> > http://joelsolr.blogspot.com/
>>> >
>>> > On Thu, Mar 1, 2018 at 4:55 PM, John Smith <localde...@gmail.com> wrote:
>>> >
>>> >> Joel, thanks for the pointers to the streaming feature. I had no idea solr
>>> >> had that (and also just discovered the very interesting sql feature! I will
>>> >> be sure to investigate that in more detail in the future).
>>> >>
>>> >> However I'm having some trouble getting basic streaming functions working.
>>> >> I've already figured out that I had to move to "solr cloud" instead of
>>> >> "solr standalone" because I was getting errors about "cannot find zk
>>> >> instance" or whatever which went away when using "solr start -c" instead.
>>> >>
>>> >> But now I'm trying to use the random function since that was one of the
>>> >> functions used in your example.
>>> >>
>>> >> random(tx_header, q="*:*", rows="100", fl="countyname")
>>> >>
>>> >> I posted that directly in the "stream" section of the solr admin UI. This
>>> >> is all on linux, with solr 7.1.0 and 7.2.1 (tried several versions in case
>>> >> it was a bug in one)
>>> >>
>>> >> I get back an error message:
>>> >> *sort param could not be parsed as a query, and is not a field that exists
>>> >> in the index: random_-255009774*
>>> >>
>>> >> I'm not passing in any sort field anywhere.
>>> >> But the solr logs show these three log entries:
>>> >>
>>> >> 2018-03-01 21:41:18.954 INFO (qtp257513673-21) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
>>> >> [tx_header_shard1_replica_n1] webapp=/solr path=/select
>>> >> params={q=*:*&_stateVer_=tx_header:6&fl=countyname
>>> >> *&sort=random_-255009774+asc*&rows=100&wt=javabin&version=2} status=400
>>> >> QTime=19
>>> >>
>>> >> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.CloudSolrClient
>>> >> Request to collection [tx_header] failed due to (400)
>>> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
>>> >> from server at http://192.168.13.31:8983/solr/tx_header: sort param could
>>> >> not be parsed as a query, and is not a field that exists in the index:
>>> >> random_-255009774, retry? 0
>>> >>
>>> >> 2018-03-01 21:41:18.968 ERROR (qtp257513673-17) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.s.ExceptionStream
>>> >> java.io.IOException:
>>> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
>>> >> from server at http://192.168.13.31:8983/solr/tx_header: sort param could
>>> >> not be parsed as a query, and is not a field that exists in the index:
>>> >> random_-255009774
>>> >>
>>> >> So basically it looks like solr is injecting the "sort=random_" stuff into
>>> >> my query and of course that is failing on the search since that
>>> >> field/column doesn't exist in my schema. Every time I run the random
>>> >> function, I get a slightly different field name that it injects, but they
>>> >> all start with "random_" etc.
>>> >>
>>> >> I have tried adding my own sort field instead, hoping solr wouldn't inject
>>> >> one for me, but it still injected a random sort fieldname:
>>> >> random(tx_header, q="*:*", rows="100", fl="countyname", sort="countyname asc")
>>> >>
>>> >> Assuming I can fix that whole problem, my second question is: can I add
>>> >> multiple "fq=" parameters to the random function? I build a pretty
>>> >> complicated query using many fq= fields, and then want to run some stats on
>>> >> that hitlist; so somehow I have to pass in the query that made up the exact
>>> >> hitlist to these various functions, but when I used multiple "fq=" values
>>> >> it only seemed to use the last one I specified and just ignored all the
>>> >> previous fq's?
>>> >>
>>> >> Thanks in advance for any comments/suggestions...!
>>> >>
>>> >> On Fri, Feb 23, 2018 at 5:59 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>> >>
>>> >> > This is going to be a complex answer because Solr actually now has multiple
>>> >> > ways of doing regression analysis as part of the Streaming Expression
>>> >> > statistical programming library. The basic documentation is here:
>>> >> >
>>> >> > https://lucene.apache.org/solr/guide/7_2/statistical-programming.html
>>> >> >
>>> >> > Here is a sample expression that performs a simple linear regression in
>>> >> > Solr 7.2:
>>> >> >
>>> >> > let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
>>> >> >     b=col(a, fieldA),
>>> >> >     c=col(a, fieldB),
>>> >> >     d=regress(b, c))
>>> >> >
>>> >> > The expression above takes a random sample of 15000 results from
>>> >> > collection1. The result set will include fieldA and fieldB in each record.
>>> >> > The result set is stored in variable "a".
>>> >> >
>>> >> > Then the "col" function creates arrays of numbers from the results stored
>>> >> > in variable a.
>>> >> > The values in fieldA are stored in the variable "b". The
>>> >> > values in fieldB are stored in variable "c".
>>> >> >
>>> >> > Then the regress function performs a simple linear regression on arrays
>>> >> > stored in variables "b" and "c".
>>> >> >
>>> >> > The output of the regress function is a map containing the regression
>>> >> > result. This result includes RSquared and other attributes of the
>>> >> > regression model such as R (correlation), slope, y intercept etc...
>>> >> >
>>> >> > Joel Bernstein
>>> >> > http://joelsolr.blogspot.com/
>>> >> >
>>> >> > On Fri, Feb 23, 2018 at 3:10 PM, John Smith <localde...@gmail.com> wrote:
>>> >> >
>>> >> > > Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
>>> >> > > result of all this is supposed to be obtaining R^2. Is there no way of
>>> >> > > obtaining this value, then (short of iterating over all the results in the
>>> >> > > hitlist and calculating it myself)?
>>> >> > >
>>> >> > > On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>> >> > >
>>> >> > > > Typically SSE is the sum of the squared errors of the prediction in a
>>> >> > > > regression analysis. The stats component doesn't perform regression,
>>> >> > > > although it might be a nice feature.
>>> >> > > >
>>> >> > > > Joel Bernstein
>>> >> > > > http://joelsolr.blogspot.com/
>>> >> > > >
>>> >> > > > On Fri, Feb 23, 2018 at 12:17 PM, John Smith <localde...@gmail.com> wrote:
>>> >> > > >
>>> >> > > > > I'm using solr, and enabling stats as per this page:
>>> >> > > > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
>>> >> > > > >
>>> >> > > > > I want to get more stat values though. Specifically I'm looking for
>>> >> > > > > r-squared (coefficient of determination).
>>> >> > > > > This value is not present in
>>> >> > > > > solr, however some of the pieces used to calculate r^2 are in the stats
>>> >> > > > > element, for example:
>>> >> > > > >
>>> >> > > > > <double name="min">0.0</double>
>>> >> > > > > <double name="max">10.0</double>
>>> >> > > > > <long name="count">15</long>
>>> >> > > > > <long name="missing">17</long>
>>> >> > > > > <double name="sum">85.0</double>
>>> >> > > > > <double name="sumOfSquares">603.0</double>
>>> >> > > > > <double name="mean">5.666666666666667</double>
>>> >> > > > > <double name="stddev">2.943920288775949</double>
>>> >> > > > >
>>> >> > > > > So I have the sumOfSquares available (SST), and using this calculation, I
>>> >> > > > > can get R^2:
>>> >> > > > >
>>> >> > > > > R^2 = 1 - SSE/SST
>>> >> > > > >
>>> >> > > > > All I need then is SSE. Is there any way I can get SSE from those other
>>> >> > > > > stats in solr?
>>> >> > > > >
>>> >> > > > > Thanks in advance!
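
For reference, a minimal sketch of the R^2 = 1 - SSE/SST arithmetic from the original question, in plain Python. The function names are hypothetical and nothing here is a Solr API. One assumption worth flagging: in the sample stats above, sumOfSquares looks like the raw sum of x^2 rather than SST, because 603 - 85^2/15 = 121.33 and sqrt(121.33/14) reproduces the reported stddev. SSE, on the other hand, needs a fitted line and both variables, which is why single-field stats cannot supply it.

```python
import math


def sst_from_stats(sum_of_squares, total, count):
    """Total sum of squares about the mean, from raw aggregates."""
    return sum_of_squares - total * total / count


def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: squared residuals of the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared deviations of y from its mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Check against the stats element quoted in the thread.
sst = sst_from_stats(603.0, 85.0, 15)
stddev = math.sqrt(sst / (15 - 1))  # sample standard deviation

# Small made-up data set, for illustration only.
example = r_squared([1, 2, 3], [1, 2, 2])
```

On Solr 7.2 the regress function discussed earlier in the thread already reports RSquared, so a hand calculation like this is mainly useful for checking results.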