My math was off again ... If you have 20 results from each of 50 shards, that would produce the 1,000 results.
Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Aug 14, 2017 at 10:17 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Actually my math was off. You would need 200 shards to get to 1000 results.
> How many shards do you have?
>
> The expression you provided also didn't include the ClusterText field in
> the field list of the search. So perhaps it's missing other parameters.
>
> If you include all the parameters I may be able to spot the issue.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Aug 14, 2017 at 10:10 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
>> It looks like you just need to set the rows parameter in the search
>> expression. If you don't set rows, the default will be 20 I believe, which
>> will pull the top 20 docs from each shard. If you have 5 shards, then the
>> 1000 results would make sense.
>>
>> You can parallelize the whole expression by wrapping it in a parallel
>> expression. You'll need to set the partitionKeys in the search expression
>> to do this.
>>
>> If you have a large number of records to process, I would recommend batch
>> processing. This blog explains the parallel batch framework:
>>
>> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, Aug 14, 2017 at 7:53 PM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>>
>>> Hi All - I'm using the classify stream expression and the results
>>> returned are always limited to 1,000. Where do I specify the number to
>>> return?
>>> The stream expression that I'm using looks like:
>>>
>>> classify(model(models,id="MODEL1014",cacheMillis=5000),
>>>          search(COL,df="FULL_DOCUMENT",
>>>                 q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
>>>                 fl="id,score",sort="id asc"),
>>>          field="ClusterText")
>>>
>>> When I read this (code snippet):
>>>
>>> stream.open();
>>> while (true) {
>>>     Tuple tuple = stream.read();
>>>     if (tuple.EOF) {
>>>         break;
>>>     }
>>>     Double probability = (Double) tuple.fields.get("probability_d");
>>>     String docID = (String) tuple.fields.get("id");
>>> }
>>>
>>> I get back 1,000 results. Another question: is there a way to
>>> parallelize the classify call to other worker nodes? Thank you!
>>>
>>> -Joe
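
[Editor's note: pulling the thread's advice together, a sketch of what the corrected expression might look like. The collection, model, query, and field names are taken from Joe's expression; the rows value, partitionKeys choice, and workers count are illustrative assumptions, not values confirmed in the thread. Per Joel's suggestions, rows is raised above the 20-per-shard default, ClusterText is added to fl, partitionKeys is set so the search can be partitioned, and the whole classify is wrapped in parallel():]

```
parallel(COL,
         classify(model(models,id="MODEL1014",cacheMillis=5000),
                  search(COL,df="FULL_DOCUMENT",
                         q="Collection:(COLLECT2000) AND DocTimestamp:[2017-08-14T04:00:00Z TO 2017-08-15T03:59:00Z]",
                         fl="id,score,ClusterText",
                         sort="id asc",
                         rows="100000",
                         partitionKeys="id"),
                  field="ClusterText"),
         workers="4",
         sort="id asc")
```

[Alternatively, pointing the inner search at the export handler with qt="/export" streams the full sorted result set without a rows cap, provided the fields in fl have docValues enabled.]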