On 5/20/2020 11:43 AM, Modassar Ather wrote:
Can you please help me with the following questions?
- What is the ideal index size per shard?
We have no way of knowing that. A size that works well for one index
use case may not work well for another, even if the index size in both
cases is identical. Determining the ideal shard size requires
experimentation.
https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
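Although the ideal shard size is empirical, the basic arithmetic behind a shard-count choice can be sketched. This is an illustration only, using the 3.5 TB and 30 GB figures from this thread; the exact count depends on assumptions such as binary vs. decimal units and whether replicas are included in the total.

```python
import math

# Back-of-the-envelope shard-count arithmetic (illustration only;
# real sizing requires the experimentation described above).
TB = 1024  # GB per TB, assuming binary units

def shards_needed(total_gb: float, target_shard_gb: float) -> int:
    """Smallest shard count that keeps each shard at or under the target size."""
    return math.ceil(total_gb / target_shard_gb)

total_gb = 3.5 * TB  # 3584 GB
print(shards_needed(total_gb, 30))  # shard count for a 30 GB per-shard target
```

Running the arithmetic both ways (shard count from a size target, or per-shard size from a shard count) is a useful sanity check before any reindexing experiment.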
- The optimisation takes a lot of time and IOPS to complete. Will
increasing the number of shards help in reducing the optimisation time and
IOPS?
No, changing the number of shards will not help with the time required
to optimize, and might make it slower. Increasing the speed of the
disks won't help either. Optimizing involves a lot more than just
copying data -- it will never use all the available disk bandwidth of
modern disks. SolrCloud optimizes the shard replicas that make up a full
collection sequentially, not simultaneously.
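To make the sequential-optimize point concrete: an optimize rewrites each shard's index roughly once, one shard at a time, so the wall-clock time scales with the total collection size rather than the per-shard size. The effective merge rate below is an assumed number, deliberately well under raw disk bandwidth for the reason given above.

```python
# Rough wall-clock estimate for a sequential full-collection optimize.
# Illustration only: the effective rate is an assumption, since merging
# is CPU- and merge-bound and never saturates modern disks.

def optimize_hours(num_shards: int, shard_gb: float,
                   effective_mb_per_s: float) -> float:
    """Hours to rewrite every shard once, one shard at a time."""
    total_mb = num_shards * shard_gb * 1024
    return total_mb / effective_mb_per_s / 3600

# e.g. 70 shards of 50 GB each at an assumed effective 50 MB/s:
print(round(optimize_hours(70, 50, 50), 1))
```

Note that splitting the same data across more shards leaves the total bytes rewritten unchanged, which is why adding shards does not shorten the optimize.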
- We are planning to reduce each shard index size to 30GB and distribute
the entire 3.5 TB index across more shards, in this case 70+ shards. Will
this help?
Maybe. Maybe not. You'll have to try it. If you increase the number
of shards without adding additional servers, I would expect things to
get worse, not better.
Kindly share your thoughts on how best we can use Solr with such a large
index size.
Something to keep in mind -- memory is the resource that makes the most
difference in performance. Buying enough memory to get decent
performance out of an index that big would probably be very expensive.
You should probably explore ways to make your index smaller. Another
idea is to split things up so the most frequently accessed search data
is in a relatively small index and lives on beefy servers, and data used
for less frequent or data-mining queries (where performance doesn't
matter as much) can live on less expensive servers.
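The memory point can also be made concrete with some arithmetic. Performance depends heavily on how much of the index the OS page cache can hold; the fractions, server count, and heap size below are all assumptions chosen for illustration, not recommendations.

```python
# Illustrative page-cache arithmetic for the memory discussion above.
# All inputs are assumed values; the underlying rule of thumb is that
# performance improves as more of the index fits in OS disk cache.

def ram_per_server_gb(index_gb: float, cached_fraction: float,
                      num_servers: int, heap_gb: float) -> float:
    """RAM each server needs to cache its slice of the index plus the Solr heap."""
    return index_gb * cached_fraction / num_servers + heap_gb

# Caching half of a 3584 GB index across 12 servers, with a 16 GB heap each:
print(round(ram_per_server_gb(3584, 0.5, 12, 16), 2))
```

Plugging in real numbers this way makes it easy to see why shrinking the index, or moving the hot subset onto a small dedicated tier, reduces the hardware bill so quickly.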
Thanks,
Shawn