Shawn, thank you for the tips. I know the significant drawbacks of virtualization, but I don't want to turn this thread into a discussion of virtualization pros and cons for Solr(Cloud).
I was only asking what minimal code change would be needed to test whether this is a viable solution or not. :)

On Sun, Jul 28, 2013 at 1:06 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 7/27/2013 3:33 PM, Isaac Hebsh wrote:
> > I have about 40 shards. repFactor=2.
> > The cause of slower shards is very interesting, and this is the main
> > approach we took.
> > Note that in every query, a different shard is the slowest. In 20%
> > of the queries, the slowest shard takes about 4 times longer than the
> > average shard QTime.
> > While continuing the investigation, remember it might be the virtualization /
> > storage access / network / GC /..., so I thought that reducing the effect
> > of the slow shards might be a good (temporary or permanent) solution.
>
> Virtualization is not the best approach for Solr. Assuming you're
> dealing with your own hardware and not something based in the cloud like
> Amazon, you can get better results by running on bare metal and having
> multiple shards per host.
>
> Garbage collection is a very likely source of this problem.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
>
> > I thought it should be an almost trivial code change (for proving the
> > concept). Isn't it?
>
> I have no idea what you're saying/asking here. Can you clarify?
>
> It seems to me that sending requests to all replicas would just increase
> the overall load on the cluster, with no real benefit.
>
> Thanks,
> Shawn
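To make the idea concrete: the proof of concept is to send the same shard request to every replica and use whichever response arrives first, so a single slow (e.g. GC-paused) replica no longer dictates the query time. The sketch below is a generic, client-side illustration of that pattern in Python, not Solr's actual shard-request code. The replicas are hypothetical callables standing in for the real per-replica requests, and `query_fastest_replica` is a hypothetical name. As Shawn notes, this trades extra cluster load for lower tail latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def query_fastest_replica(replicas, request, timeout=None):
    """Send `request` to every replica and return the first response.

    `replicas` is a list of hypothetical callables standing in for the
    real per-replica requests. If the first replica to finish raised an
    error, that error propagates to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        # Fan out: the same request goes to all replicas at once.
        futures = [pool.submit(replica, request) for replica in replicas]
        done, not_done = wait(futures, timeout=timeout,
                              return_when=FIRST_COMPLETED)
        if not done:
            raise TimeoutError("no replica answered in time")
        for f in not_done:
            f.cancel()  # only cancels requests that have not started yet
        return next(iter(done)).result()
    finally:
        # Don't block on the slower replicas; their results are discarded.
        pool.shutdown(wait=False)


def make_replica(name, delay):
    """Hypothetical replica that answers after `delay` seconds."""
    def replica(request):
        time.sleep(delay)  # simulate a slow (e.g. GC-paused) replica
        return f"{name}: {request}"
    return replica


if __name__ == "__main__":
    replicas = [make_replica("replica1", 0.4), make_replica("replica2", 0.05)]
    print(query_fastest_replica(replicas, "q=*:*"))  # replica2: q=*:*
```

Note that the slower requests still run to completion on the server side; cancelling a future here only prevents requests that have not yet started, so the extra load Shawn mentions is real.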