I suspect you've got nulls in your data. I just tested with null values and
got the same error. For testing purposes try loading the data with default
values of zero.
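
A minimal sketch of that test in Python, assuming the documents are built as plain dicts before they are sent to Solr. The helper name fill_missing_with_zero and the sample documents are made up for illustration; only the two field names come from this thread. It fills in the defaults and counts how many values were missing, which also shows whether nulls are really present:

```python
# Hypothetical pre-indexing step: give missing numeric fields a default of 0
# so that every document carries a number for each field passed to regress().

NUMERIC_FIELDS = ("oil_first_90_days_production", "oil_last_30_days_production")


def fill_missing_with_zero(docs, fields=NUMERIC_FIELDS):
    """Return (filled_docs, missing_counts).

    A field counts as missing when it is absent from the document or None.
    """
    missing_counts = {field: 0 for field in fields}
    filled_docs = []
    for doc in docs:
        filled = dict(doc)  # shallow copy so the caller's documents are untouched
        for field in fields:
            if filled.get(field) is None:
                filled[field] = 0
                missing_counts[field] += 1
        filled_docs.append(filled)
    return filled_docs, missing_counts


# Made-up sample documents, not real data from the collection.
sample_docs = [
    {"id": "1", "oil_first_90_days_production": 1200, "oil_last_30_days_production": 300},
    {"id": "2", "oil_first_90_days_production": None, "oil_last_30_days_production": 150},
    {"id": "3", "oil_last_30_days_production": 90},
]

filled_docs, missing_counts = fill_missing_with_zero(sample_docs)
print(missing_counts)
```

If the counts come back non-zero, the regression is worth re-running against the re-indexed data. Note that zeros will skew the regression itself, so this is only a way to confirm the cause.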


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 5, 2018 at 10:12 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Let's break the expression down and build it up slowly. Let's start with:
>
> let(echo="true",
>      a=random(tx_prod_production, q="*:*", fq="isParent:true", rows="15",
>               fl="oil_first_90_days_production,oil_last_30_days_production"),
>      b=col(a, oil_first_90_days_production))
>
>
> This should return variables a and b. Let's see what the data looks like.
> I changed the rows from 15000 to 15. If it all looks good we can expand the
> rows and continue adding functions.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Mar 5, 2018 at 4:11 PM, John Smith <localde...@gmail.com> wrote:
>
>> Thanks Joel for your help on this.
>>
>> What I've done so far:
>> - unzip downloaded solr-7.2
>> - modify the _default "managed-schema" to add the random field type and
>> the dynamic random field
>> - start solr7 using "solr start -c"
>> - indexed my data using pint/pdouble/boolean field types etc
>>
>> I can now run the random function all by itself, it returns random
>> results as expected. So far so good!
>>
>> However... now trying to get the regression stuff working:
>>
>> let(a=random(tx_prod_production, q="*:*", fq="isParent:true",
>>              rows="15000",
>>              fl="oil_first_90_days_production,oil_last_30_days_production"),
>>     b=col(a, oil_first_90_days_production),
>>     c=col(a, oil_last_30_days_production),
>>     d=regress(b, c))
>>
>> Posted directly into solr admin UI. Run the streaming expression and I
>> get this error message:
>> "EXCEPTION": "Failed to evaluate expression regress(b,c) - Numeric value
>> expected but found type java.lang.String for value
>> oil_first_90_days_production"
>>
>> It thinks my numeric field is defined as a string? But when I view the
>> schema, those 2 fields are defined as ints:
>>
>>
>> When I run a normal query and choose xml as output format, then it also
>> puts "int" elements into the hitlist, so the schema appears to be correct.
>> It's just when using this regress function that something goes wrong and
>> solr thinks the field is a string.
>>
>> Any suggestions?
>> Thanks!
>>
>>
>>
>> On Thu, Mar 1, 2018 at 9:12 PM, Joel Bernstein <joels...@gmail.com>
>> wrote:
>>
>>> The field type will also need to be in the schema:
>>>
>>> <!-- The "RandomSortField" is not used to store or search any
>>>      data.  You can declare fields of this type in your schema
>>>      to generate pseudo-random orderings of your docs for sorting
>>>      or function purposes.  The ordering is generated based on the
>>>      field name and the version of the index. As long as the index
>>>      version remains unchanged, and the same field name is reused,
>>>      the ordering of the docs will be consistent.
>>>      If you want different pseudo-random orderings of documents,
>>>      for the same version of the index, use a dynamicField and
>>>      change the field name in the request.
>>> -->
>>>
>>> <fieldType name="random" class="solr.RandomSortField" indexed="true" />
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Thu, Mar 1, 2018 at 8:00 PM, Joel Bernstein <joels...@gmail.com>
>>> wrote:
>>>
>>> > You'll need to have this field in your schema:
>>> >
>>> > <dynamicField name="random_*" type="random" />
>>> >
>>> > I'll check to see if the default schema used with solr start -c has this
>>> > field, if not I'll add it. Thanks for pointing this out.
>>> >
>>> > I checked and right now the random expression is only accepting one fq,
>>> > but I consider this a bug. It should accept multiple. I'll create a
>>> > ticket for getting this fixed.
>>> >
>>> >
>>> >
>>> > Joel Bernstein
>>> > http://joelsolr.blogspot.com/
>>> >
>>> > On Thu, Mar 1, 2018 at 4:55 PM, John Smith <localde...@gmail.com>
>>> wrote:
>>> >
>>> >> Joel, thanks for the pointers to the streaming feature. I had no idea
>>> >> solr had that (and also just discovered the very interesting sql
>>> >> feature! I will be sure to investigate that in more detail in the
>>> >> future).
>>> >>
>>> >> However I'm having some trouble getting basic streaming functions
>>> >> working. I've already figured out that I had to move to "solr cloud"
>>> >> instead of "solr standalone" because I was getting errors about
>>> >> "cannot find zk instance" or whatever which went away when using
>>> >> "solr start -c" instead.
>>> >>
>>> >> But now I'm trying to use the random function since that was one of
>>> >> the functions used in your example.
>>> >>
>>> >> random(tx_header, q="*:*", rows="100", fl="countyname")
>>> >>
>>> >> I posted that directly in the "stream" section of the solr admin UI.
>>> >> This is all on linux, with solr 7.1.0 and 7.2.1 (tried several versions
>>> >> in case it was a bug in one)
>>> >>
>>> >> I get back an error message:
>>> >> *sort param could not be parsed as a query, and is not a field that
>>> >> exists in the index: random_-255009774*
>>> >>
>>> >> I'm not passing in any sort field anywhere. But the solr logs show
>>> >> these three log entries:
>>> >>
>>> >> 2018-03-01 21:41:18.954 INFO  (qtp257513673-21) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.S.Request
>>> >> [tx_header_shard1_replica_n1]  webapp=/solr path=/select
>>> >> params={q=*:*&_stateVer_=tx_header:6&fl=countyname
>>> >> *&sort=random_-255009774+asc*&rows=100&wt=javabin&version=2}
>>> >> status=400 QTime=19
>>> >>
>>> >> 2018-03-01 21:41:18.966 ERROR (qtp257513673-17) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.CloudSolrClient
>>> >> Request to collection [tx_header] failed due to (400)
>>> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>>> >> Error from server at http://192.168.13.31:8983/solr/tx_header: sort
>>> >> param could not be parsed as a query, and is not a field that exists in
>>> >> the index: random_-255009774, retry? 0
>>> >>
>>> >> 2018-03-01 21:41:18.968 ERROR (qtp257513673-17) [c:tx_header s:shard1
>>> >> r:core_node2 x:tx_header_shard1_replica_n1] o.a.s.c.s.i.s.ExceptionStream
>>> >> java.io.IOException:
>>> >> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>>> >> Error from server at http://192.168.13.31:8983/solr/tx_header: sort
>>> >> param could not be parsed as a query, and is not a field that exists in
>>> >> the index: random_-255009774
>>> >>
>>> >>
>>> >> So basically it looks like solr is injecting the "sort=random_" stuff
>>> >> into my query and of course that is failing on the search since that
>>> >> field/column doesn't exist in my schema. Every time I run the random
>>> >> function, I get a slightly different field name that it injects, but
>>> >> they all start with "random_" etc.
>>> >>
>>> >> I have tried adding my own sort field instead, hoping solr wouldn't
>>> >> inject one for me, but it still injected a random sort fieldname:
>>> >> random(tx_header, q="*:*", rows="100", fl="countyname",
>>> >>        sort="countyname asc")
>>> >>
>>> >>
>>> >> Assuming I can fix that whole problem, my second question is: can I
>>> >> add multiple "fq=" parameters to the random function? I build a pretty
>>> >> complicated query using many fq= fields, and then want to run some
>>> >> stats on that hitlist; so somehow I have to pass in the query that made
>>> >> up the exact hitlist to these various functions, but when I used
>>> >> multiple "fq=" values it only seemed to use the last one I specified
>>> >> and just ignored all the previous fq's?
>>> >>
>>> >> Thanks in advance for any comments/suggestions...!
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Feb 23, 2018 at 5:59 PM, Joel Bernstein <joels...@gmail.com>
>>> >> wrote:
>>> >>
>>> >> > This is going to be a complex answer because Solr actually now has
>>> >> > multiple ways of doing regression analysis as part of the Streaming
>>> >> > Expression statistical programming library. The basic documentation
>>> >> > is here:
>>> >> >
>>> >> > https://lucene.apache.org/solr/guide/7_2/statistical-programming.html
>>> >> >
>>> >> > Here is a sample expression that performs a simple linear regression
>>> >> > in Solr 7.2:
>>> >> >
>>> >> > let(a=random(collection1, q="any query", rows="15000",
>>> >> >              fl="fieldA, fieldB"),
>>> >> >     b=col(a, fieldA),
>>> >> >     c=col(a, fieldB),
>>> >> >     d=regress(b, c))
>>> >> >
>>> >> >
>>> >> > The expression above takes a random sample of 15000 results from
>>> >> > collection1. The result set will include fieldA and fieldB in each
>>> >> > record. The result set is stored in variable "a".
>>> >> >
>>> >> > Then the "col" function creates arrays of numbers from the results
>>> >> > stored in variable "a". The values in fieldA are stored in the
>>> >> > variable "b". The values in fieldB are stored in variable "c".
>>> >> >
>>> >> > Then the regress function performs a simple linear regression on the
>>> >> > arrays stored in variables "b" and "c".
>>> >> >
>>> >> > The output of the regress function is a map containing the
>>> >> > regression result. This result includes RSquared and other attributes
>>> >> > of the regression model such as R (correlation), slope, y intercept
>>> >> > etc...
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > Joel Bernstein
>>> >> > http://joelsolr.blogspot.com/
>>> >> >
>>> >> > On Fri, Feb 23, 2018 at 3:10 PM, John Smith <localde...@gmail.com>
>>> >> wrote:
>>> >> >
>>> >> > > Hi Joel, thanks for the answer. I'm not really a stats guy, but the
>>> >> > > end result of all this is supposed to be obtaining R^2. Is there no
>>> >> > > way of obtaining this value, then (short of iterating over all the
>>> >> > > results in the hitlist and calculating it myself)?
>>> >> > >
>>> >> > > On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein <
>>> joels...@gmail.com>
>>> >> > > wrote:
>>> >> > >
>>> >> > > > Typically SSE is the sum of the squared errors of the prediction
>>> >> > > > in a regression analysis. The stats component doesn't perform
>>> >> > > > regression, although it might be a nice feature.
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > Joel Bernstein
>>> >> > > > http://joelsolr.blogspot.com/
>>> >> > > >
>>> >> > > > On Fri, Feb 23, 2018 at 12:17 PM, John Smith <
>>> localde...@gmail.com>
>>> >> > > wrote:
>>> >> > > >
>>> >> > > > > I'm using solr, and enabling stats as per this page:
>>> >> > > > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
>>> >> > > > >
>>> >> > > > > I want to get more stat values though. Specifically I'm looking
>>> >> > > > > for r-squared (coefficient of determination). This value is not
>>> >> > > > > present in solr, however some of the pieces used to calculate
>>> >> > > > > r^2 are in the stats element, for example:
>>> >> > > > >
>>> >> > > > > <double name="min">0.0</double>
>>> >> > > > > <double name="max">10.0</double>
>>> >> > > > > <long name="count">15</long>
>>> >> > > > > <long name="missing">17</long>
>>> >> > > > > <double name="sum">85.0</double>
>>> >> > > > > <double name="sumOfSquares">603.0</double>
>>> >> > > > > <double name="mean">5.666666666666667</double>
>>> >> > > > > <double name="stddev">2.943920288775949</double>
>>> >> > > > >
>>> >> > > > >
>>> >> > > > > So I have the sumOfSquares available (SST), and using this
>>> >> > > > > calculation, I can get R^2:
>>> >> > > > >
>>> >> > > > > R^2 = 1 - SSE/SST
>>> >> > > > >
>>> >> > > > > All I need then is SSE. Is there any way I can get SSE from those
>>> >> > > > > other stats in solr?
>>> >> > > > >
>>> >> > > > > Thanks in advance!
>>> >> > > > >
>>> >> > > >
>>> >> > >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>
