Re: New node has high network and disk usage.

2016-01-17 Thread Kai Wang
James, Thanks for sharing. Anyway, good to know there's one more thing to add to the checklist. On Sun, Jan 17, 2016 at 12:23 PM, James Griffin < james.grif...@idioplatform.com> wrote: > Hi all, > > Just to let you know, we finally figured this out on Friday. It turns out > the new nodes had an

Re: New node has high network and disk usage.

2016-01-17 Thread James Griffin
Hi all, Just to let you know, we finally figured this out on Friday. It turns out the new nodes had an older version of the kernel installed. Upgrading the kernel solved our issues. For reference, the "bad" kernel was 3.2.0-75-virtual; upgrading to 3.2.0-86-virtual resolved the issue. We still
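Since the root cause was a kernel mismatch between nodes, one way to add this to a checklist is to compare the output of `uname -r` across the ring. The following is only a minimal sketch: the node names and the way the release strings are gathered are hypothetical, and only the two kernel versions come from the message above.

```python
from collections import defaultdict


def group_by_release(releases_by_node):
    """Group node names by the kernel release each one reports (uname -r)."""
    groups = defaultdict(list)
    for node, release in releases_by_node.items():
        groups[release].append(node)
    # Sort node names so the report is stable between runs.
    return {release: sorted(nodes) for release, nodes in groups.items()}


# Hypothetical inventory; only the two version strings appear in the thread.
releases = {
    "node1": "3.2.0-86-virtual",
    "node2": "3.2.0-75-virtual",
    "node3": "3.2.0-75-virtual",
}

groups = group_by_release(releases)
if len(groups) > 1:
    print("Kernel releases differ across nodes:")
    for release, nodes in sorted(groups.items()):
        print(f"  {release}: {', '.join(nodes)}")
```

The sketch deliberately reports any difference rather than picking a "majority" release, because in a small ring the affected nodes can outnumber the healthy ones.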

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
A summary of what we've done this morning:
- Noted that there are no GCInspector lines in system.log on bad node (there are GCInspector logs on other healthy nodes)
- Turned on GC logging, noted that we had logs which stated out total time for which application threads were stopped

Re: New node has high network and disk usage.

2016-01-14 Thread Kai Wang
James, Can you post the result of "nodetool netstats" on the bad node? On Thu, Jan 14, 2016 at 9:09 AM, James Griffin < james.grif...@idioplatform.com> wrote: > A summary of what we've done this morning: > >- Noted that there are no GCInspector lines in system.log on bad node >(there

Re: New node has high network and disk usage.

2016-01-14 Thread Kai Wang
James, I may be missing something. You mentioned your cluster had RF=3. Then why does "nodetool status" show each node owns 1/3 of the data, especially after a full repair? On Thu, Jan 14, 2016 at 9:56 AM, James Griffin < james.grif...@idioplatform.com> wrote: > Hi Kai, > > Below - nothing going on

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
Hi Kai, Below - nothing going on that I can see
$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name    Active   Pending   Completed
Commands     n/a

Re: New node has high network and disk usage.

2016-01-14 Thread James Griffin
Hi Kai, Well observed - running `nodetool status` without specifying keyspace does report ~33% on each node. We have two keyspaces on this cluster - if I specify either of them the ownership reported by each node is 100%, so I believe the repair completed successfully. Best wishes, Griff
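The difference between the two figures can be illustrated with a little arithmetic. This is a simplified model, not how Cassandra computes the value internally: it assumes a single datacenter and evenly balanced tokens, and the function names are made up for the example.

```python
def raw_token_share(node_count):
    """Share of the token ring per node when no keyspace is specified,
    assuming evenly balanced tokens (a simplification)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return 1.0 / node_count


def effective_ownership(node_count, replication_factor):
    """Share of a keyspace's data each node holds once replication is
    taken into account, capped at 100%."""
    if node_count <= 0 or replication_factor <= 0:
        raise ValueError("node_count and replication_factor must be positive")
    return min(1.0, replication_factor / node_count)


# The cluster in this thread: 3 nodes, RF=3.
print(f"without keyspace: {raw_token_share(3):.0%}")
print(f"with keyspace:    {effective_ownership(3, 3):.0%}")
```

With three nodes and RF=3 every node holds a replica of every range, which is why specifying the keyspace reports 100% on each node.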

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Hi, Revisiting the thread, I can see that nodetool status had both good and bad nodes at the same time. How do you replace nodes? When you say bad node, I understand that the node is no longer usable even though Cassandra is UP. Is that correct? If a node is in bad shape and not working, adding new

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
Hi Anuj, Below is the output of nodetool status. The nodes were replaced following the instructions in the DataStax documentation for replacing running nodes, since the nodes were running fine; the problem was that the servers had been incorrectly initialised and thus had less disk space. The status below

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Node 2 has slightly more data, but that should be OK. Not sure how read ops are so high when no IO-intensive activity such as repair or compaction is running on node 3. Maybe you can try investigating the logs to see what's happening. Others on the mailing list could also share their views on the

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, though comparing the startup flags on the two machines shows the GC config is identical: $ jstat -gcutil S0 S1 E O P

Re: New node has high network and disk usage.

2016-01-13 Thread Anuj Wadehra
Ok. I saw dropped mutations on your cluster, and full GC is a common cause for that. Can you search for the word GCInspector in system.log and share the frequency of minor and full GC? Moreover, are you printing promotion failures in the GC logs? Why is full GC getting triggered? promotion failures
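The GCInspector search suggested above could be sketched roughly as follows. It assumes the default collectors for Cassandra 2.0 (ParNew for the young generation, ConcurrentMarkSweep for the old generation) and that the collector name appears on each GCInspector line; the sample log lines are invented for illustration and are not taken from this cluster.

```python
from collections import Counter


def count_gc_events(log_lines):
    """Count GCInspector lines by collector.

    Assumption: ParNew lines are minor collections and ConcurrentMarkSweep
    lines are old-generation collections.
    """
    counts = Counter()
    for line in log_lines:
        if "GCInspector" not in line:
            continue
        if "ConcurrentMarkSweep" in line:
            counts["old_gen"] += 1
        elif "ParNew" in line:
            counts["minor"] += 1
        else:
            counts["other"] += 1
    return counts


# Invented sample lines, only to exercise the function.
sample = [
    "INFO GCInspector.java GC for ParNew: 230 ms for 1 collections",
    "INFO GCInspector.java GC for ConcurrentMarkSweep: 1200 ms for 1 collections",
    "INFO CompactionTask.java Compacted 4 sstables",
    "INFO GCInspector.java GC for ParNew: 310 ms for 2 collections",
]
counts = count_gc_events(sample)
print(dict(counts))
```

To run it against a real node, pass an open file object for that node's system.log instead of the sample list. An empty result is itself informative, as on the bad node described elsewhere in this thread.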

Re: New node has high network and disk usage.

2016-01-13 Thread James Griffin
Hi all, We’ve spent a few days running things but are in the same position. To add some more flavour:
- We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues
- Nodes 2 & 3 are much newer than node 1. These two

New node has high network and disk usage.

2016-01-06 Thread Vickrum Loi
Hi, We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of

Re: New node has high network and disk usage.

2016-01-06 Thread Vickrum Loi
I should probably have mentioned that we're on Cassandra 2.0.10. On 6 January 2016 at 15:26, Vickrum Loi wrote: > Hi, > > We recently added a new node to our cluster in order to replace a node > that died (hardware failure we believe). For the next two weeks it had

Re: New node has high network and disk usage.

2016-01-06 Thread Jeff Ferland
What’s your output of `nodetool compactionstats`? > On Jan 6, 2016, at 7:26 AM, Vickrum Loi wrote: > > Hi, > > We recently added a new node to our cluster in order to replace a node that > died (hardware failure we believe). For the next two weeks it had high

Re: New node has high network and disk usage.

2016-01-06 Thread Anuj Wadehra
Hi Vickrum, I would have proceeded with diagnosis as follows:
1. Analysis of the sar report to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded. This is evident from mutation drops.
2. Make sure that all recommended Cassandra production settings available at DataStax

Re: New node has high network and disk usage.

2016-01-06 Thread Vickrum Loi
# nodetool compactionstats
pending tasks: 22
compaction type   keyspace               table          completed    total          unit    progress
Compaction        production_analytics   interactions   240410213    161172668724   bytes   0.15%
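As a sanity check on the figures above, the progress column can be recomputed from the byte counts. This assumes the fused number in the archived output splits into completed = 240410213 and total = 161172668724 bytes, which is consistent with the reported 0.15%; the function name is made up for the example.

```python
def compaction_progress(completed_bytes, total_bytes):
    """Return compaction progress as a percentage of total bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * completed_bytes / total_bytes


# Byte counts as read from the compactionstats output (split is an assumption).
progress = compaction_progress(240410213, 161172668724)
print(f"{progress:.2f}%")
```

At roughly 161 GB in a single compaction with 22 tasks still pending, the node had a substantial compaction backlog at the time the output was captured.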