My math was off again ... If you have 20 results from each of 50 shards, that would produce the 1,000 results.
Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Aug 14, 2017 at 10:17 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Actually my math was off. You would need 200 shards to get to 1000 results.
> How many shards do you have?
>
> The expression you provided also didn't include the ClusterText field in
> the field list of the search. So perhaps it's missing other parameters.
>
> If you include all the parameters I may be able to spot the issue.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Aug 14, 2017 at 10:10 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
>> It looks like you just need to set the rows parameter in the search
>> expression. If you don't set rows, the default will be 20 I believe, which
>> will pull the top 20 docs from each shard. If you have 5 shards, then the
>> 1000 results would make sense.
>>
>> You can parallelize the whole expression by wrapping it in a parallel
>> expression. You'll need to set the partitionKeys in the search expression
>> to do this.
>>
>> If you have a large number of records to process, I would recommend batch
>> processing. This blog explains the parallel batch framework:
>>
>> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, Aug 14, 2017 at 7:53 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>>
>>> Hi All - I'm using the classify stream expression and the results
>>> returned are always limited to 1,000. Where do I specify the number to
>>> return?
>>> The stream expression that I'm using looks like:
>>>
>>> classify(model(models,id="MODEL1014",cacheMillis=5000),
>>>          search(COL,df="FULL_DOCUMENT",
>>>                 q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
>>>                 fl="id,score",sort="id asc"),
>>>          field="ClusterText")
>>>
>>> When I read this (code snippet):
>>>
>>> stream.open();
>>> while (true) {
>>>     Tuple tuple = stream.read();
>>>     if (tuple.EOF) {
>>>         break;
>>>     }
>>>     Double probability = (Double) tuple.fields.get("probability_d");
>>>     String docID = (String) tuple.fields.get("id");
>>> }
>>>
>>> I get back 1,000 results. Another question: is there a way to
>>> parallelize the classify call to other worker nodes? Thank you!
>>>
>>> -Joe
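
[Editor's note: pulling the thread's advice together, a sketch of what the corrected expression might look like. The collection, model, query, and field names are taken from Joe's expression; the rows value, partitionKeys choice, and workers count are illustrative assumptions, not values confirmed in the thread. Per Joel's suggestions, rows is raised above the 20-per-shard default, ClusterText is added to fl, partitionKeys is set so the search can be partitioned, and the whole classify is wrapped in parallel():]

```
parallel(COL,
         classify(model(models,id="MODEL1014",cacheMillis=5000),
                  search(COL,df="FULL_DOCUMENT",
                         q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
                         fl="id,score,ClusterText",
                         sort="id asc",
                         rows="100000",
                         partitionKeys="id"),
                  field="ClusterText"),
         workers="4",
         sort="id asc")
```

[Alternatively, pointing the inner search at the export handler with qt="/export" streams the full sorted result set without a rows cap, provided the fields in fl have docValues enabled.]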