[jira] [Updated] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-31 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-7567:
--

Fix Version/s: 2.1.0

 when the commit_log disk for a single node is overwhelmed the entire cluster 
 slows down
 ---

 Key: CASSANDRA-7567
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM, 
 commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
Assignee: Brandon Williams
 Fix For: 2.1.0

 Attachments: 7567.logs.bz2, write_request_latency.png


 We've run into a situation where a single node out of 14 is experiencing high 
 disk I/O. This can happen when a node is being decommissioned, or after it 
 joins the ring and runs into the bug CASSANDRA-6621.
 When this occurs, write latency for the entire cluster spikes from 0.3ms to 
 170ms.
 To simulate this, simply run dd on the commit_log disk (dd if=/dev/zero 
 of=/tmp/foo bs=1024) and you will see that all nodes in the cluster slow 
 down immediately.
 BTW, overwhelming the data disk does not have the same effect.
 I've also tried this with the overwhelmed node not being one the client 
 connects to directly, and it still has the same effect.
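
 For anyone reproducing this, the dd load above can also be sketched as a 
 bounded writer, so the test stops on its own instead of running until the 
 disk fills. This is only an illustrative sketch, not part of the original 
 report: the function names, the output file name, and the block counts are 
 hypothetical, and the target path is assumed to sit on the same disk as the 
 commit log. The second helper just works out the ratio between the two 
 latencies quoted above.

```python
import os
import tempfile
import time


def write_zero_blocks(path, block_size=1024, max_blocks=1024, sync_every=0):
    """Write max_blocks zero-filled blocks to path, roughly like
    `dd if=/dev/zero of=path bs=block_size count=max_blocks`.

    If sync_every is non-zero, fsync after that many blocks to push the
    writes to the device rather than leaving them in the page cache.
    Returns (bytes_written, elapsed_seconds).
    """
    block = bytes(block_size)  # block_size zero bytes
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        for i in range(max_blocks):
            written += f.write(block)
            if sync_every and (i + 1) % sync_every == 0:
                f.flush()
                os.fsync(f.fileno())
        # Final flush so the elapsed time includes the device write.
        f.flush()
        os.fsync(f.fileno())
    return written, time.monotonic() - start


def slowdown_factor(before_ms, after_ms):
    """Ratio between two latency measurements given in milliseconds."""
    return after_ms / before_ms


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the commit_log disk.
    target = os.path.join(tempfile.gettempdir(), "commitlog_load_test.bin")
    try:
        n, secs = write_zero_blocks(target, max_blocks=4096, sync_every=256)
        print(f"wrote {n} bytes in {secs:.3f}s")
    finally:
        if os.path.exists(target):
            os.remove(target)
    # Latencies from the report: 0.3ms before, 170ms during the disk load.
    print(f"slowdown: about {slowdown_factor(0.3, 170):.0f}x")
```

 Raising max_blocks (or looping over the call) lengthens the load window; the 
 fsync calls are a design choice to make sure the writes actually reach the 
 disk during the measurement.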



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-29 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-7567:


Since Version:   (was: 2.0.8)

 when the commit_log disk for a single node is overwhelmed the entire cluster 
 slows down
 ---

 Key: CASSANDRA-7567
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM, 
 commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
Assignee: Brandon Williams
 Attachments: 7567.logs.bz2, write_request_latency.png




[jira] [Updated] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-17 Thread Ryan McGuire (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McGuire updated CASSANDRA-7567:


Tester: Ryan McGuire

 when the commit_log disk for a single node is overwhelmed the entire cluster 
 slows down
 ---

 Key: CASSANDRA-7567
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM, 
 commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
 Attachments: write_request_latency.png




[jira] [Updated] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-17 Thread Ryan McGuire (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McGuire updated CASSANDRA-7567:


Attachment: 7567.logs.bz2

 when the commit_log disk for a single node is overwhelmed the entire cluster 
 slows down
 ---

 Key: CASSANDRA-7567
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM, 
 commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
 Attachments: 7567.logs.bz2, write_request_latency.png




[jira] [Updated] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-17 Thread Ryan McGuire (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McGuire updated CASSANDRA-7567:


Reproduced In: 2.1 rc3

 when the commit_log disk for a single node is overwhelmed the entire cluster 
 slows down
 ---

 Key: CASSANDRA-7567
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM, 
 commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
 Attachments: 7567.logs.bz2, write_request_latency.png

