"...it is a distributed real-time query scheme..."
SolrCloud does this already. It treats all the shards as one big index, and you can query it normally to get the "subset" results from each shard. Why do you have to rewrite the query for each shard? That seems unnecessary.

------- Original Message -------
On 4/9/2012 08:45 AM Benson Margulies wrote:
Jan Høydahl,

My problem is intimately connected to Solr. It is not a batch job for
Hadoop; it is a distributed real-time query scheme. I hate to add yet
another complex framework if a Solr request processor (RP) can do the job simply.

For this problem, I can transform a Solr query into a subset query on
each shard, and then let the SolrCloud mechanism.

I am well aware of the 'zoo' of alternatives, and I will be evaluating
them if I can't get what I want from Solr.

On Mon, Apr 9, 2012 at 9:34 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> Hi,
>
> Instead of using Solr, you may want to have a look at Hadoop or another framework for distributed computation; see e.g. http://java.dzone.com/articles/comparison-gridcloud-computing
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 9 Apr 2012, at 13:41, Benson Margulies wrote:
>
>> I'm working on a prototype of a scheme that uses SolrCloud to, in
>> effect, distribute a computation by running it inside a request
>> processor.
>>
>> If there are N shards and M operations, I want each node to perform
>> M/N operations. That, of course, implies that I know N.
>>
>> Is that fact available anywhere inside Solr, or do I just need to configure it?
>
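The split described in the original question (N shards sharing M operations, each doing about M/N) can be sketched as below. This is a minimal illustration, not anything Solr provides: `operations_for_shard` is a hypothetical helper, and it assumes each node already knows N and its own 0-based shard index (by configuration, or possibly from SolrCloud's cluster state). When M is not a multiple of N, the first M mod N shards each take one extra operation, so no shard does more than one operation more than any other.

```python
def operations_for_shard(m: int, n: int, shard_index: int) -> range:
    """Return the operation indices (out of 0..m-1) that one shard should run.

    Hypothetical helper: assumes the shard count n is known in advance and
    that each shard knows its own 0-based index. Each shard gets a contiguous
    block of floor(m/n) or ceil(m/n) operations; the first m % n shards take
    the larger share.
    """
    if n <= 0:
        raise ValueError("shard count must be positive")
    if not 0 <= shard_index < n:
        raise ValueError("shard_index out of range")
    base, extra = divmod(m, n)
    # Shards before this one each took `base` ops, plus one more for each
    # of them that falls among the first `extra` shards.
    start = shard_index * base + min(shard_index, extra)
    size = base + (1 if shard_index < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    m, n = 10, 3
    for i in range(n):
        print(i, list(operations_for_shard(m, n, i)))
```

With M = 10 and N = 3, shard 0 gets operations 0 to 3 and shards 1 and 2 get three each. Contiguous blocks are one choice; striding (shard i takes every operation j with j % N == i) balances the load equally well and may be simpler if operations arrive as a stream.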