Hello,

I am experimenting with Solr distributed search with random sharding (we
currently use geo sharding), hoping to gain some performance now and
scalability in the future. (The index keeps growing, and geo shards are
hard to scale.)

However, I'm seeing worse performance with distributed search. The test
server has 6 shards, a 15-core CPU, and 24G of memory; the index is about
8G on each shard. With geo sharding it easily handles a 150 QPS load with
good response times. With distributed search there are timeouts, and the
average response time also increases. This is probably no big surprise,
since I'm using the same number of shards and adding the overhead of
distributed search (merging, HTTP, network, etc.).

When I looked into the details (slow queries), I found some real issues
that I need help with. For example, a query that takes 200ms with geo
sharding now times out (>2000ms) with distributed search, and each shard
query (isShard=true) takes about 1200ms. But if I run the same query
directly against a single shard (without distributed search), it takes
<200ms. So I compared the two query URLs; the only difference is that the
shard query issued by distributed search has "fsv=true". I understand that
field sort values are needed during the merge process, but I didn't expect
that to make this much difference in performance, although we do have a
lot of sort orders (about 20 different sort orders).
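
For reference, the URL comparison described above can be sketched in a few
lines of Python. This is only an illustration: the host, core name, query,
and sort field below are made-up placeholders, not my real setup, and
param_diff is a helper name invented for this sketch. Only isShard and fsv
come from the actual requests.

```python
from urllib.parse import urlsplit, parse_qs


def param_diff(url_a, url_b):
    """Return {param: (values_in_a, values_in_b)} for params that differ."""
    a = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    b = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical URLs: host, core, query and sort field are placeholders.
direct_url = (
    "http://localhost:8983/solr/core1/select"
    "?q=name:foo&sort=price+asc&rows=10&isShard=true"
)
# The same request as issued by the distributed search coordinator.
shard_url = direct_url + "&fsv=true"

print(param_diff(direct_url, shard_url))
```

Timing each of the two URLs against the same shard would then isolate the
cost attributable to fsv=true from the merge and network overhead.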

Any suggestions or comments on the performance problem I'm having with
distributed search? Is distributed search the right choice for me? What
other setups or ideas can I try?

Thanks,
XJ
