Hi Hector,

On Jan 9, 2012, at 4:15pm, Hector Castro wrote:
> Hi,
>
> Has anyone had success with multicore single node Solr configurations that
> have one core acting solely as a dispatcher for the other cores? For
> example, say you had 4 populated Solr cores – configure a 5th to be the
> definitive endpoint with `shards` containing cores 1-4.
>
> Is there any advantage to this setup over simply having requests distributed
> randomly across the 4 populated cores (all with `shards` equal to cores 1-4)?
> Is it even worth distributing requests across the cores over always hitting
> the same one?

If you have low query rates, then using a shards approach can improve performance on a multi-core (CPUs here, not Solr cores) setup.

By distributing the requests, you effectively use all CPU cores in parallel on one request. And if you spread your shards across spindles, then you're also maximizing I/O throughput.

But there are a few issues with this approach:

- binary fields don't work. The results come back as "@B[<hex address>]", versus the actual data.

- short fields get "java.lang.Short" text prefixed on every value.

- deep queries result in lots of extra load. E.g. if you want the 5000th hit then you'll get (5000 * # of shards) hits being collected/returned to the dispatcher. Though only the unique id & score is returned in this case, followed by the second request to get the actual top N hits from the shards.

And there's something wonky with the way that distributed HTTP requests are queued up & processed - under load, I see IOExceptions where it's always N-1 shards that succeed, and one shard request fails. But I don't have a good reproducible case yet to debug.

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
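
For reference, the dispatcher setup and the deep-paging cost discussed in this thread can be sketched roughly in Python. This is only an illustration under stated assumptions: the host and port (localhost:8983), the core names (core1..core4, dispatcher) and the helper function names are all hypothetical, and the cost model simply restates the (5000 * # of shards) estimate from the message. The only Solr-specific pieces are the `shards` request parameter and its comma-separated host:port/path form.

```python
from urllib.parse import urlencode


def shards_param(host, cores):
    """Build a comma-separated shard list in host:port/path form (no scheme)."""
    return ",".join(f"{host}/solr/{core}" for core in cores)


def dispatcher_url(host, dispatcher_core, cores, query, start=0, rows=10):
    """URL for a query sent to one core that fans out to the listed shards.

    The core layout under /solr/<core>/select is an assumption about the setup.
    """
    params = {
        "q": query,
        "start": start,
        "rows": rows,
        "shards": shards_param(host, cores),
    }
    return f"http://{host}/solr/{dispatcher_core}/select?{urlencode(params)}"


def entries_merged(start, rows, num_shards):
    """Rough count of id/score entries the dispatcher collects in the first phase.

    Each shard has to return its own top (start + rows) entries, because any of
    them could end up in the merged page.
    """
    return (start + rows) * num_shards


if __name__ == "__main__":
    cores = ["core1", "core2", "core3", "core4"]  # hypothetical core names
    print(dispatcher_url("localhost:8983", "dispatcher", cores, "*:*"))
    # Asking for the 5000th hit: start=4999, rows=1, across 4 shards.
    print(entries_merged(4999, 1, len(cores)))  # 20000
```

The same `shards` value could just as well be sent to any of the four populated cores, which is the alternative raised in the original question; the sketch only changes which core receives the request.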