We are using a JRE, not the full JDK, hence we are not able to take a heap dump.

On Sun, 5 Apr 2020 at 19:21, Jeff Jirsa <jji...@gmail.com> wrote:
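[Editor's note: the heap-dump-on-OOM flags suggested below are implemented by the HotSpot JVM itself, so a plain JRE without the JDK tools (jmap/jcmd) can still produce the dump. A minimal sketch, assuming the flags are added via the stock cassandra-env.sh and that the dump path is an assumption with enough free disk:]

```shell
# Sketch: enable automatic heap dumps on OOM (works on a JRE; no JDK tools needed).
# File location and JVM_OPTS variable assume the stock cassandra-env.sh;
# adjust to your install.
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# Assumed path -- pick a filesystem with at least heap-size free space (31GB+ here).
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdump"
```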
> Set the JVM flags to heap dump on OOM.
>
> Open up the result in a heap inspector of your preference (like YourKit
> or similar). Find a view that counts objects by total retained size.
> Take a screenshot. Send that.
>
> On Apr 5, 2020, at 6:51 PM, Surbhi Gupta <surbhi.gupt...@gmail.com> wrote:
>
>> I just checked: we have set the heap size to 31GB, not 32GB, in DC2.
>>
>> I checked the CPU and RAM; both are the same on all the nodes in DC1
>> and DC2. What specific parameter should I check on the OS? We are
>> using CentOS release 6.10.
>>
>> Currently disk_access_mode is not set, hence it is "auto" in our
>> environment. Would setting disk_access_mode to mmap_index_only help?
>>
>> Thanks
>> Surbhi
>>
>> On Sun, 5 Apr 2020 at 01:31, Alex Ott <alex...@gmail.com> wrote:
>>
>>> Have you set -Xmx32g? In that case you may get significantly less
>>> available memory because of the switch to 64-bit references. See
>>> http://java-performance.info/over-32g-heap-java/ for details, and set
>>> slightly less than 32GB.
>>>
>>> Reid Pinchback at "Sun, 5 Apr 2020 00:50:43 +0000" wrote:
>>>
>>> RP> Surbhi:
>>>
>>> RP> If you aren't seeing connection activity in DC2, I'd check whether
>>> RP> the operations hitting DC1 are QUORUM ops instead of LOCAL_QUORUM.
>>> RP> That still wouldn't explain DC2 nodes going down, but it would at
>>> RP> least explain them doing more work than might be on your radar
>>> RP> right now.
>>>
>>> RP> The hint replay being slow sounds to me like you could be fighting
>>> RP> GC.
>>>
>>> RP> You mentioned bumping the DC2 nodes to 32GB. You might have
>>> RP> already been doing this, but if not, be sure to stay under 32GB,
>>> RP> e.g. 31GB. Otherwise you're using larger object pointers and could
>>> RP> actually have less effective ability to allocate memory.
>>>
>>> RP> As the problem is only happening in DC2, there has to be something
>>> RP> that is true in DC2 that isn't true in DC1.
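[Editor's note on the 31GB-vs-32GB point above: HotSpot can only use 32-bit compressed object pointers ("compressed oops") while the maximum heap stays under roughly 32GB, so -Xmx32g can leave you less usable memory than -Xmx31g. A quick check, sketched under the assumption that a HotSpot `java` is on the PATH:]

```shell
# Print whether compressed oops are in effect at the given max heap size.
# At -Xmx31g HotSpot typically reports "true"; at -Xmx32g it reports "false".
java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null \
  | awk '/UseCompressedOops/ {print $4}' | head -n1
```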
>>> RP> A difference in hardware, a difference in OS version, a
>>> RP> difference in networking config or physical infrastructure, a
>>> RP> difference in client-triggered activity, or a difference in how
>>> RP> repairs are handled. Somewhere, there is a difference. I'd start
>>> RP> by focusing on that.
>>>
>>> RP> From: Erick Ramirez <erick.rami...@datastax.com>
>>> RP> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> RP> Date: Saturday, April 4, 2020 at 8:28 PM
>>> RP> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> RP> Subject: Re: OOM only on one datacenter nodes
>>>
>>> RP> With no heap dump for you to analyse, my hypothesis is that your
>>> RP> DC2 nodes are taking on traffic (from some client somewhere) but
>>> RP> you're just not aware of it. The hints replay is just a
>>> RP> side-effect of the nodes getting overloaded.
>>>
>>> RP> To rule out my hypothesis in the first instance, my recommendation
>>> RP> is to monitor the incoming connections to the nodes in DC2. If you
>>> RP> don't have monitoring in place, you could simply run netstat at
>>> RP> regular intervals and go from there. Cheers!
>>>
>>> --
>>> With best wishes,        Alex Ott
>>> Principal Architect, DataStax
>>> http://datastax.com/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
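[Editor's note: the netstat suggestion above can be sketched as a small sampling helper. This is hypothetical glue code, assuming the default native transport port 9042 and `netstat -tn`-style output; adjust the port to your cluster's setting:]

```shell
# Count ESTABLISHED client connections to the native transport port
# (9042 assumed) from netstat -tn output supplied on stdin.
count_cql_connections() {
  awk '$6 == "ESTABLISHED" && $4 ~ /:9042$/ { n++ } END { print n + 0 }'
}

# Sample at regular intervals, e.g.:
#   while true; do date; netstat -tn | count_cql_connections; sleep 60; done
```

Comparing the counts across DC1 and DC2 nodes over time would confirm or rule out the hidden-client hypothesis.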