Hi folks,

We have a solution where we would like to connect to Solr via an API, submit a
query, and then pre-process the results before returning them to our users.
However, in a large distributed cluster deployment, the result set returned by
Solr can be very large. In those cases, we would like to set up parallel
streams, so that each parallel Solr worker feeds directly into one of our
processes distributed across the cluster. That way, we can pre-process the
results in parallel before we consolidate (and potentially reduce / aggregate)
them further for the user, who has a single client connection to our solution.
It is sort of a MapReduce-type scenario where our processes are the reducers.
We could consume the results as returned by these Solr worker processes, or
perhaps have them shuffled on a shard key before our processes receive them.
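
To make that concrete, one shape we were imagining is something like Solr's
Streaming Expressions, where an inner search() over the /export handler is
wrapped in parallel() and partitionKeys hash-partitions the results across the
workers. This is only a sketch, not something we have tried; the collection
names ("logs", "workerCollection"), field names, and worker count below are
placeholders:

    parallel(workerCollection,
             search(logs,
                    q="*:*",
                    fl="id,shard_key,body",
                    sort="shard_key asc",
                    qt="/export",
                    partitionKeys="shard_key"),
             workers="4",
             sort="shard_key asc")

If I understand it correctly, partitionKeys would give us the
shuffle-on-a-shard-key behavior, with each of the 4 workers receiving one hash
partition of the full result set. What we have not figured out is how to feed
each worker's output directly into one of our own distributed processes,
rather than funneling everything back through a single client.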

Any ideas on how this could be done?

Rohit Jain
