On 5/20/2020 11:43 AM, Modassar Ather wrote:
Can you please help me with the following few questions?

    - What is the ideal index size per shard?

We have no way of knowing that. A size that works well for one index use case may not work well for another, even if the index size in both cases is identical. Determining the ideal shard size requires experimentation.

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

    - The optimisation takes a lot of time and IOPS to complete. Will
    increasing the number of shards help reduce the optimisation time and
    IOPS?

No, changing the number of shards will not reduce the time required to optimize, and might make it slower. Increasing the speed of the disks won't help either. Optimizing involves a lot more than just copying data -- it will never use all the available bandwidth of modern disks. SolrCloud optimizes the shard replicas that make up a collection sequentially, not simultaneously.
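The sequential-optimize point can be illustrated with a toy calculation. This is not Solr code; the 2 GB/min merge rate and shard sizes are made-up numbers chosen only to show that when shards are optimized one after another, the total wall-clock time depends on the total data, not on how many shards it is split into:

```python
# Toy illustration (assumed numbers, not measured Solr figures):
# if SolrCloud optimizes the replicas of each shard one after another,
# total wall-clock time is the sum of the per-shard times, so splitting
# the same data into more shards does not shrink the total.

def total_optimize_time(shard_sizes_gb, gb_per_minute=2.0):
    """Sequential optimize: total minutes is the sum over all shards."""
    return sum(size / gb_per_minute for size in shard_sizes_gb)

# The same 350 GB of index data, split two different ways:
few_shards = [70.0] * 5    # 5 shards of 70 GB each
many_shards = [5.0] * 70   # 70 shards of 5 GB each

# Optimized sequentially, both layouts take the same total time.
print(total_optimize_time(few_shards))   # 175.0 minutes
print(total_optimize_time(many_shards))  # 175.0 minutes
```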

    - We are planning to reduce each shard's index size to 30GB, so the
    entire 3.5 TB index will be distributed across more shards -- 70 or
    more. Will this help?

Maybe. Maybe not. You'll have to try it. If you increase the number of shards without adding additional servers, I would expect things to get worse, not better.

Kindly share your thoughts on how best we can use Solr with such a large
index size.

Something to keep in mind -- memory is the resource that makes the most difference in performance. Buying enough memory to get decent performance out of an index that big would probably be very expensive. You should probably explore ways to make your index smaller.

Another idea is to split things up so the most frequently accessed search data is in a relatively small index and lives on beefy servers, and data used for less frequent or data-mining queries (where performance doesn't matter as much) can live on less expensive servers.
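A rough sketch of why memory matters so much here, under the assumption that query performance is largely driven by how much of the index the OS page cache can hold. The RAM, heap, and tier sizes below are illustrative assumptions, not Solr recommendations:

```python
# Rough sketch (assumption: performance tracks the fraction of the index
# that fits in the OS page cache; all sizes below are illustrative).

def cache_hit_fraction(index_gb, server_ram_gb, heap_gb):
    """Fraction of the index that fits in RAM left over after the JVM heap."""
    page_cache_gb = max(server_ram_gb - heap_gb, 0)
    return min(page_cache_gb / index_gb, 1.0)

# One big 3500 GB index on a server with 256 GB RAM and a 31 GB heap:
print(cache_hit_fraction(3500, 256, 31))   # ~0.064 -- most reads hit disk

# A 100 GB "hot" tier on the same hardware:
print(cache_hit_fraction(100, 256, 31))    # 1.0 -- fully cached
```

This is the intuition behind the hot/cold split above: a small, frequently queried index can be fully cached on affordable hardware, while the rarely queried bulk lives on cheaper machines where cache misses matter less.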

Thanks,
Shawn
