Have you set -Xmx32g ? In this case you may get significantly less
available memory because of switch to 64-bit references.  See
http://java-performance.info/over-32g-heap-java/ for details, and set
slightly less than 32Gb

Reid Pinchback  at "Sun, 5 Apr 2020 00:50:43 +0000" wrote:
 RP> Surbi:

 RP> If you aren’t seeing connection activity in DC2, I’d check to see if the 
operations hitting DC1 are quorum ops instead of local quorum.  That
 RP> still wouldn’t explain DC2 nodes going down, but would at least explain 
them doing more work than might be on your radar right now.

 RP> The hint replay being slow to me sounds like you could be fighting GC.

 RP> You mentioned bumping the DC2 nodes to 32gb.  You might have already been 
doing this, but if not, be sure to be under 32gb, like 31gb. 
 RP> Otherwise you’re using larger object pointers and could actually have less 
effective ability to allocate memory.

 RP> As the problem is only happening in DC2, then there has to be a thing that 
is true in DC2 that isn’t true in DC1.  A difference in hardware, a
 RP> difference in O/S version, a difference in networking config or physical 
infrastructure, a difference in client-triggered activity, or a
 RP> difference in how repairs are handled. Somewhere, there is a difference.  
I’d start with focusing on that.

 RP> From: Erick Ramirez <erick.rami...@datastax.com>
 RP> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
 RP> Date: Saturday, April 4, 2020 at 8:28 PM
 RP> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
 RP> Subject: Re: OOM only on one datacenter nodes

 RP> Message from External Sender

 RP> With a lack of heapdump for you to analyse, my hypothesis is that your DC2 
nodes are taking on traffic (from some client somewhere) but you're
 RP> just not aware of it. The hints replay is just a side-effect of the nodes 
getting overloaded.

 RP> To rule out my hypothesis in the first instance, my recommendation is to 
monitor the incoming connections to the nodes in DC2. If you don't
 RP> have monitoring in place, you could simply run netstat at regular 
intervals and go from there. Cheers!

 RP> GOT QUESTIONS? Apache Cassandra experts from the community and DataStax 
have answers! Share your expertise on https://community.datastax.com/.



-- 
With best wishes,                    Alex Ott
Principal Architect, DataStax
http://datastax.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Reply via email to