Re: Need help on handling large size of index.

2020-05-22 Thread Phill Campbell
Maybe your problems are in AWS land.
> On May 22, 2020, at 3:45 AM, Modassar Ather wrote:
>
> Thanks Erick and Phill.
>
> We index data weekly once and that is why we do the optimisation and it has
> helped in faster query result. I will experiment with a fewer segments with
> the current

Re: Need help on handling large size of index.

2020-05-22 Thread Modassar Ather
Thanks Erick and Phill. We index data once a week, which is why we optimise, and it has helped give faster query results. I will experiment with fewer segments on the current hardware. What I am not clear about is that, although there is no constant high usage of extra IOPs other

Re: Need help on handling large size of index.

2020-05-21 Thread Phill Campbell
The optimal size for a shard of the index is by definition what works best on the hardware with the JVM heap that is in use. More shards mean a smaller index per shard, as you already know. I spent months changing the sharding, the JVM heap, the GC values before taking the system

Re: Need help on handling large size of index.

2020-05-21 Thread Erick Erickson
Please consider _not_ optimizing. It’s kind of a misleading name anyway, and in the version of Solr you’re using it may have unintended consequences, see: https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/ and

Re: Need help on handling large size of index.

2020-05-21 Thread Modassar Ather
Thanks Shawn for your response. We have seen optimisation run faster with a higher number of IOPs. Without the extra IOPs the optimisation took around 15-20 hours, whereas the same index took 5-6 hours to optimise with higher IOPs. Yes, the entire extra IOPs were never used to full

Re: Need help on handling large size of index.

2020-05-21 Thread Modassar Ather
Thanks Phill for your response.
> Optimal Index size: Depends on what you are optimizing for. Query Speed? Hardware utilization?
We are optimising it for query speed. As I understand it, even if we set the merge policy to any number, the amount of hard disk will still be required for the bigger

Re: Need help on handling large size of index.

2020-05-20 Thread Shawn Heisey
On 5/20/2020 11:43 AM, Modassar Ather wrote:
> Can you please help me with following few questions?
> - What is the ideal index size per shard?
We have no way of knowing that. A size that works well for one index use case may not work well for another, even if the index size in both cases

Re: Need help on handling large size of index.

2020-05-20 Thread Phill Campbell
In my world your index size is common. Optimal Index size: Depends on what you are optimizing for. Query Speed? Hardware utilization? Optimizing the index is something I never do. We live with about 28% deletes. You should check your configuration for your merge policy. I run 120 shards, and I

Need help on handling large size of index.

2020-05-20 Thread Modassar Ather
Hi,
Currently we have an index of size 3.5 TB. The index is distributed across 12 shards under two cores. The size of the index on each shard is almost equal. We do delta indexing every week and then optimise the index. The server configuration is as follows.
- Solr Version : 6.5.1
- AWS
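
For readers following the thread, the per-shard size implied by the figures in this post can be worked out with a few lines of Python. This is only a rough sketch: it assumes the 3.5 TB is the total across all 12 shards, that the split is exactly even (the post says "almost equal"), and that 1 TB = 1024 GB. The function name and constants are illustrative and do not come from the thread.

```python
# Figures quoted in the original post: 3.5 TB of index across 12 shards.
TOTAL_INDEX_TB = 3.5
SHARD_COUNT = 12


def per_shard_size_gb(total_tb: float, shards: int, gb_per_tb: int = 1024) -> float:
    """Average shard size in GB, assuming the index is split evenly (an assumption)."""
    if shards <= 0:
        raise ValueError("shards must be positive")
    return total_tb * gb_per_tb / shards


shard_gb = per_shard_size_gb(TOTAL_INDEX_TB, SHARD_COUNT)
print(f"about {shard_gb:.0f} GB per shard")
```

Under those assumptions each shard holds roughly 300 GB. As the replies point out, whether that is a good size depends on the hardware, the JVM heap, and the query load, so the number is a starting point for experiments and not a target.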