Hi Sumit,

> 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
> version 3.0.3 and then newer 5 nodes have 3.6.0 version.

I strongly recommend that you:

- Stick with one version of Apache Cassandra per cluster.
- Always stay as close as possible to the latest minor release of the
  Cassandra version in use.

So you *really should* not be using 3.0.3 *AND* 3.6.0 but rather 3.0.10 *OR*
3.7 (currently). Note that Cassandra 3.X (with X > 0) uses a tick-tock release
cycle where odd-numbered releases are bug fixes only and even-numbered releases
introduce new features as well.

Running multiple versions for a long period can induce errors. Cassandra is
built to handle multiple versions only to give operators the time to run a
rolling restart. No streaming (adding / removing / repairing nodes) should
happen during this period. Also, I have seen cases in the past where changing
the schema while running multiple versions led to schema disagreements.

> Due to this scenario, a couple boxes are running very high on memory (95%
> usage) whereas some of the older version nodes have just 60-70% memory
> usage.

Hard to say if this is related to the multiple versions of Cassandra, but it
could be. Are you sure the nodes are using the same JVM / GC options
(cassandra-env.sh) and Java version?

Also, what exactly is "high on memory 95%"? Are we talking about heap or
native memory? Isn't the memory used as page cache (which would still be
available to the system)?

> 2. To counter #1, I am planning to upgrade system configuration of the
> nodes where there is higher memory usage. But the question is, will it be a
> problem if we have a Cassandra cluster, where in a couple of nodes have
> double the system configuration than other nodes in the cluster.

It is not a problem per se to have distinct configurations on distinct nodes.
Cassandra handles it very well, and it is frequently used to test a
configuration change on a canary node, to prevent it from impacting the whole
service.

Yet, all the nodes should be doing the same work (unless you have some
heterogeneous hardware and are using a distinct number of vnodes on each node).
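
As a rough illustration of checking that nodes are set up alike, here is a
minimal Python sketch. It assumes you have already collected a few facts per
node by hand (for example the Cassandra version, the Java version and
num_tokens). The host names, the values and the find_outliers helper are all
hypothetical, made up for this example; nothing here talks to a real cluster.

```python
from collections import Counter

# Hypothetical per-node facts, gathered manually (for instance from
# `nodetool version`, `java -version` and num_tokens in cassandra.yaml).
# Host names and values are invented for illustration.
nodes = {
    "node1": {"cassandra": "3.0.3", "java": "1.8.0_101", "num_tokens": 256},
    "node2": {"cassandra": "3.0.3", "java": "1.8.0_101", "num_tokens": 256},
    "node3": {"cassandra": "3.6.0", "java": "1.8.0_101", "num_tokens": 256},
}


def find_outliers(nodes):
    """Return {setting: {host: value}} for hosts whose value differs
    from the most common value of that setting across the cluster."""
    outliers = {}
    settings = sorted({key for facts in nodes.values() for key in facts})
    for setting in settings:
        values = {host: facts.get(setting) for host, facts in nodes.items()}
        # The most common value is treated as the cluster "norm".
        majority, _ = Counter(values.values()).most_common(1)[0]
        odd = {host: v for host, v in values.items() if v != majority}
        if odd:
            outliers[setting] = odd
    return outliers


for setting, hosts in find_outliers(nodes).items():
    for host, value in hosts.items():
        print(f"{host}: {setting} = {value} differs from the majority")
```

An empty result means the listed settings match on every node. Treating the
majority value as the norm is only a convenience for spotting differences; it
says nothing about which value is the right one to converge on.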

Keeping things homogeneous allows the operator to easily compare how nodes are
doing, and it makes reasoning about Cassandra, as well as troubleshooting
issues, much easier.

So I would:

- Fully upgrade / downgrade asap to a chosen version (3.X is known as being
  not yet stable, but going back to 3.0.X might be more painful).
- Make sure nodes are well balanced and using the same number of ranges
  ('nodetool status <anyuserkeyspace>').
- Make sure the nodes are using the same Java version and JVM settings.

Hope that helps,

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-12-21 8:22 GMT+01:00 Sumit Anvekar <sumit.anve...@gmail.com>:

> I have a couple questions.
>
> 1. I have a Cassandra cluster with 11 nodes, 5 of which have Cassandra
> version 3.0.3 and then newer 5 nodes have 3.6.0 version. I has been running
> fine until recently I am seeing higher amount of data residing in newer
> boxes. The configuration file (YAML file) is exactly same on all nodes
> (except for the node host names). Wondering if the version has something to
> do with this scenario. Due to this scenario, a couple boxes are running
> very high on memory (95% usage) whereas some of the older version nodes
> have just 60-70% memory usage.
>
> 2. To counter #1, I am planning to upgrade system configuration of the
> nodes where there is higher memory usage. But the question is, will it be a
> problem if we have a Cassandra cluster, where in a couple of nodes have
> double the system configuration than other nodes in the cluster.
>
> Appreciate any comment on the same.
>
> Sumit.