Thanks Phill for your response.

Optimal Index size: Depends on what you are optimizing for. Query Speed?
Hardware utilization?
We are optimising for query speed. My understanding is that even if we set
the merge policy to any number, the disk space will still be required for
the bigger segment merges. Please correct me if I am wrong.

Optimizing the index is something I never do. We live with about 28%
deletes. You should check your configuration for your merge policy.
Our updates delete about 10-20% of the documents. We have no merge policy
set in the configuration as we do a full optimisation after indexing.
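For reference, a merge policy can be set explicitly in solrconfig.xml instead of relying on a full optimise. A minimal sketch (the values are illustrative, not recommendations, and deletesPctAllowed needs a Solr/Lucene version recent enough to support it):

```xml
<!-- Sketch of an explicit merge policy in solrconfig.xml; values are illustrative -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- lets background merges reclaim deletes, reducing the need to optimise -->
    <double name="deletesPctAllowed">25.0</double>
  </mergePolicyFactory>
</indexConfig>
```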

Increased sharding has helped reduce query response time, but surely there
is a point where the collation of results starts to be the bottleneck.
The query response time is my concern. I understand that aggregating the
results across shards may increase the search response time.

*What does your schema look like? I index around 120 fields per document.*
The schema has a combination of text and string fields. No field except
the id field is stored. We also have around 120 fields, a few of which
have docValues enabled.
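For illustration, field definitions along those lines might look like this in the schema (field names and types here are made up, not our actual schema):

```xml
<!-- Illustrative schema.xml field definitions; names and types are examples only -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="false"/>
<field name="category" type="string" indexed="true" stored="false" docValues="true"/>
```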

*What does your queries look like? Mine are so varied that caching never
helps, the same query rarely comes through.*
Our search queries are mostly combinations of proximity, nested proximity
and wildcard clauses. A query can be very complex, with hundreds of
wildcard and proximity terms in it. Different grouping options are also
enabled on the search results. And the search queries vary a lot.
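To give a flavour of the shape (a toy example, assuming the complexphrase query parser for wildcards inside proximity phrases; the field and terms are made up, and real queries contain hundreds of such clauses):

```
q={!complexphrase inOrder=true}content:"distribut* search*"~5 AND content:(shard* OR replica*)
```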

Oh, another thing, are you concerned about availability? Do you have a
replication factor > 1? Do you run those replicas in a different region for
safety?
How many zookeepers are you running and where are they?
As of now we do not have any replication (replication factor of 1). We are
not using a ZooKeeper ensemble but would like to move to one soon.
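For when we do move, Solr is typically pointed at an external ensemble via ZK_HOST in solr.in.sh; a sketch with placeholder hostnames:

```shell
# Placeholder hostnames for a 3-node external ZooKeeper ensemble (solr.in.sh)
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
```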

Best,
Modassar

On Thu, May 21, 2020 at 9:19 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/20/2020 11:43 AM, Modassar Ather wrote:
> > Can you please help me with following few questions?
> >
> >     - What is the ideal index size per shard?
>
> We have no way of knowing that.  A size that works well for one index
> use case may not work well for another, even if the index size in both
> cases is identical.  Determining the ideal shard size requires
> experimentation.
>
>
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> >     - The optimisation takes lot of time and IOPs to complete. Will
> >     increasing the number of shards help in reducing the optimisation
> time and
> >     IOPs?
>
> No, changing the number of shards will not help with the time required
> to optimize, and might make it slower.  Increasing the speed of the
> disks won't help either.  Optimizing involves a lot more than just
> copying data -- it will never use all the available disk bandwidth of
> modern disks.  SolrCloud optimizes the shard replicas making up a full
> collection sequentially, not simultaneously.
>
> >     - We are planning to reduce each shard index size to 30GB and the
> entire
> >     3.5 TB index will be distributed across more shards. In this case to
> almost
> >     70+ shards. Will this help?
>
> Maybe.  Maybe not.  You'll have to try it.  If you increase the number
> of shards without adding additional servers, I would expect things to
> get worse, not better.
>
> > Kindly share your thoughts on how best we can use Solr with such a large
> > index size.
>
> Something to keep in mind -- memory is the resource that makes the most
> difference in performance.  Buying enough memory to get decent
> performance out of an index that big would probably be very expensive.
> You should probably explore ways to make your index smaller.  Another
> idea is to split things up so the most frequently accessed search data
> is in a relatively small index and lives on beefy servers, and data used
> for less frequent or data-mining queries (where performance doesn't
> matter as much) can live on less expensive servers.
>
> Thanks,
> Shawn
>
