Hi Hector,

On Jan 9, 2012, at 4:15pm, Hector Castro wrote:

> Hi,
> 
> Has anyone had success with multicore single node Solr configurations that 
> have one core acting solely as a dispatcher for the other cores?  For 
> example, say you had 4 populated Solr cores – configure a 5th to be the 
> definitive endpoint with `shards` containing cores 1-4.  
> 
> Is there any advantage to this setup over simply having requests distributed 
> randomly across the 4 populated cores (all with `shards` equal to cores 1-4)? 
>  Is it even worth distributing requests across the cores over always hitting 
> the same one?

If your query rate is low, then a shards approach can improve performance on 
a machine with multiple CPU cores (CPU cores here, not Solr cores).

By distributing each request across the shards, you effectively use all CPU 
cores in parallel on a single request.

And if you spread your shards across spindles, then you're also maximizing I/O 
throughput.
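
For concreteness, a dispatcher request is an ordinary select request with a 
`shards` parameter listing every populated core. Here is a rough sketch in 
Python of how such a URL could be built. The host, port, and the core names 
(core0 as the empty dispatcher, core1-core4 as the populated cores) are 
placeholders I made up for illustration, not your actual setup, and the helper 
function name is hypothetical.

```python
from urllib.parse import urlencode


def build_distributed_query_url(dispatcher_url, shard_addresses, query,
                                start=0, rows=10):
    """Build a select URL that asks one core to fan the query out to shards.

    shard_addresses are given as host:port/path with no "http://" prefix,
    which is the form the Solr `shards` parameter expects.
    """
    params = {
        "q": query,
        "start": start,
        "rows": rows,
        # Comma-separated list of every populated core.
        "shards": ",".join(shard_addresses),
    }
    return dispatcher_url + "/select?" + urlencode(params)


# Assumed layout: core0 is the dispatcher, core1..core4 hold the data.
shards = ["localhost:8983/solr/core%d" % i for i in range(1, 5)]
url = build_distributed_query_url(
    "http://localhost:8983/solr/core0", shards, "title:solr")
print(url)
```

The same function covers the "no dedicated dispatcher" variant: pass the URL of 
any of the populated cores as dispatcher_url and keep the shard list unchanged.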

But there are a few issues with this approach:

- binary fields don't work. The results come back as "[B@<hex address>" (the 
default Java toString() of a byte array), versus the actual data.
- short fields get "java.lang.Short" text prefixed on every value.
- deep queries result in lots of extra load. E.g. if you want the 5000th hit 
then (5000 * # of shards) hits get collected and returned to the dispatcher. 
Only the unique id & score are returned for each of those hits, though, 
followed by a second request to get the actual top N hits from the shards.
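
To put numbers on the deep query issue, here is a small simulation in Python of 
the first phase of a distributed request. It is only a sketch of the 
arithmetic, not Solr's actual code: the shard contents are random (score, id) 
pairs, and the function names are made up for illustration.

```python
import heapq
import random


def shard_top_hits(shard_docs, limit):
    """Phase 1 on one shard: return its best `limit` (score, id) pairs."""
    return heapq.nlargest(limit, shard_docs)


def distributed_page(shards, start, rows):
    """Merge per-shard results on the dispatcher and cut out one page.

    Every shard has to return start + rows candidates, because the
    dispatcher cannot know in advance which shard holds the wanted hits.
    """
    limit = start + rows
    collected = []
    for shard_docs in shards:
        collected.extend(shard_top_hits(shard_docs, limit))
    merged = heapq.nlargest(limit, collected)
    page = merged[start:start + rows]
    return page, len(collected)


random.seed(0)
# Four simulated shards with 10,000 docs each, as (score, id) pairs.
shards = [
    [(random.random(), "s%d-doc%d" % (s, i)) for i in range(10000)]
    for s in range(4)
]

# Ask for the 5000th hit only (start is zero-based).
page, collected = distributed_page(shards, start=4999, rows=1)
print(len(page), collected)  # prints "1 20000"
```

So a request for a single hit at position 5000 makes the dispatcher merge 
20,000 id/score pairs when there are 4 shards.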

And there's something wonky with the way that distributed HTTP requests are 
queued up & processed - under load, I see IOExceptions where it's always N-1 
shards that succeed, and one shard request fails. But I don't have a good 
reproducible case yet to debug.

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr



