On 10/3/2018 8:01 AM, yasoobhaider wrote:
Master and slave config:
ram: 120GB
cores: 16
At any point there are between 10 and 20 slaves in the cluster, each serving ~2k
requests per minute. Each slave houses two collections of approx 10G
(~2.5mil docs) and 2G (10mil docs) when optimized.
I am working with Solr 6.2.1
Solr configuration:
<snip>
-Xmn10G
-Xms80G
-Xmx80G
I cannot imagine that an 80GB heap is needed when there are only 12.5
million documents and 12GB of index data. I've handled MUCH larger
indexes with only 8GB of heap. Even with your very high query rate, if
you really do need 80GB of heap, there's something unusual going on.
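
As a quick check of the totals above, the figures quoted in this thread can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: the dictionary keys and the `summarize` helper are made up for illustration, and the resulting ratio is a point of comparison, not a sizing recommendation.

```python
# Figures quoted in this thread (sizes in GB, document counts in millions).
collections = [
    {"size_gb": 10, "docs_millions": 2.5},
    {"size_gb": 2, "docs_millions": 10},
]
heap_gb = 80  # from -Xms80G / -Xmx80G


def summarize(collections, heap_gb):
    """Total the index size and document count, and relate heap to index size."""
    index_gb = sum(c["size_gb"] for c in collections)
    docs_millions = sum(c["docs_millions"] for c in collections)
    return {
        "index_gb": index_gb,
        "docs_millions": docs_millions,
        "heap_to_index_ratio": heap_gb / index_gb,
    }


summary = summarize(collections, heap_gb)
print(summary)
```

With these inputs the configured heap comes out at well over six times the on-disk index size, compared with the 2x-3x rule of thumb mentioned in question 3.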
I would really be grateful for any advice on the following:
1. What could be the reason behind CMS not being able to free up the memory?
What are some experiments I can run to solve this problem?
Maybe there's no garbage in the heap to free up? If the GC never
finishes, that sounds like a possible problem with either Java or the
operating system, maybe even some kind of hardware issue.
2. Can stopping/starting indexing be a reason for such drastic changes to GC
pattern?
Indexing generally requires more heap than just handling queries.
3. I have read in multiple places on this mailing list that the heap size
should be much lower (2x-3x the size of collection), but the last time I
tried, CMS was not able to run smoothly and GC STW would occur, which was only
solved by a restart. My reasoning for this is that the type of queries and
the throughput are also a factor in deciding the heap size, so it may be
that our queries are creating too many objects. Is my reasoning
correct or should I try with a lower heap size (if it helps achieve a stable
gc pattern)?
Do you have a GC log covering a long runtime that includes the period
when the problems happened? Can you share it? Attachments rarely make
it to the list, so you'll need to use a file sharing site.
The small excerpt from the GC log that you included in your message
isn't enough to make any kind of determination. Full disclosure: I'm
going to send your log to http://gceasy.io for analysis. You can do
this yourself; their analysis is really good.
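
For a rough first look before uploading the log anywhere, the stop-the-world pauses can be totalled locally. The following is a minimal sketch, not a replacement for a real analyzer. It assumes the log was written by a Java 8 HotSpot JVM with -XX:+PrintGCApplicationStoppedTime enabled and that decimals are printed with a dot. The function name `pause_summary` and the one-second threshold are illustrative choices.

```python
import re

# Lines of this shape are written when the HotSpot flag
# -XX:+PrintGCApplicationStoppedTime is enabled (assumed here).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def pause_summary(lines, threshold_s=1.0):
    """Summarize stop-the-world pauses found in an iterable of GC log lines."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return {
        "count": len(pauses),
        "total_s": sum(pauses),
        "max_s": max(pauses, default=0.0),
        # Number of pauses longer than the (arbitrary) threshold.
        "over_threshold": sum(1 for p in pauses if p > threshold_s),
    }
```

It could be run against a log file with something like `with open("solr_gc.log") as f: print(pause_summary(f))`, where the file name is a placeholder for wherever the GC log actually lives.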
There is no generic advice possible regarding how large a heap you
need. It will depend on many factors.
(4. Silly question, but what is the right way to ask a question on the mailing
list? Via mail or via the Nabble website? I sent this question earlier as a
mail, but it was not showing up on the Nabble website, so I am posting it
from the website now)
Nabble mirrors the mailing list in forum format. It's generally better
to use the mailing list directly. The project has absolutely no
influence over the Nabble website, and things do not always work
correctly when Nabble is involved. The IRC channel is another good way
to get support. If somebody is paying attention when you ask your
question, you can have a far more interactive conversation there.
Thanks,
Shawn