Re: Replication-aware compaction
Thanks! I'm actually on vacation now, so I hope to look into this next week.

On Mon, Jun 6, 2011 at 10:25 PM, aaron morton aa...@thelastpickle.com wrote:

You should consider upgrading to 0.7.6 to get a fix to Gossip. Earlier 0.7 releases were prone to marking nodes up and down when they should not have been. See https://github.com/apache/cassandra/blob/cassandra-0.7/CHANGES.txt#L22

Are the TimedOutExceptions on the client for read or write requests? During the burst times, which stages are backing up in nodetool tpstats? Compaction should not affect writes too much (assuming the commit log and data are on different spindles).

You could also take a look at the read and write latency stats for a particular CF using nodetool cfstats or JConsole; these will give you the stats for the local operations. You could also take a look at iostat on the box: http://spyced.blogspot.com/2010/01/linux-performance-basics.html

Hope that helps.

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jun 2011, at 00:30, David Boxenhorn wrote: [...]
Replication-aware compaction
Is there some deep architectural reason why compaction can't be replication-aware? What I mean is: if one node is doing compaction, its replicas shouldn't be doing compaction at the same time. Or, at least, a quorum of nodes should be available at all times. For example, if RF=3 and one node is doing compaction, the nodes to its right and left in the ring should wait on compaction until that node is done.

Of course, my real problem is that compaction makes a node pretty much unavailable. If we can fix that problem, then this is not necessary.
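The rule being proposed can be sketched as a small simulation (illustrative only, not Cassandra code; node names and the ring layout are hypothetical): a node starts a minor compaction only if neither of its replica neighbours is already compacting, so at most one node per RF=3 replica set is busy and a quorum of two stays fully responsive.

```python
# Sketch of the proposed replication-aware rule, not Cassandra internals:
# a node defers compaction while any replica neighbour is compacting.

RF = 3
NODES = ["A", "B", "C"]          # hypothetical 3-node ring, RF=3

compacting = set()               # nodes currently running a compaction

def replica_neighbours(node):
    """The nodes to the left and right in the ring, i.e. the other replicas."""
    i = NODES.index(node)
    return {NODES[(i - 1) % len(NODES)], NODES[(i + 1) % len(NODES)]}

def try_start_compaction(node):
    """Start compacting only if no replica neighbour is already compacting."""
    if replica_neighbours(node) & compacting:
        return False             # defer: a replica is busy
    compacting.add(node)
    return True

print(try_start_compaction("A"))  # True  - nobody else is compacting
print(try_start_compaction("B"))  # False - A, a replica neighbour, is busy
compacting.discard("A")           # A finishes its compaction
print(try_start_compaction("B"))  # True  - now B may take its turn
```

With three nodes and RF=3 every node replicates every other, so "replica neighbours" degenerates to "everyone else" — which is exactly the one-at-a-time behaviour being asked for.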
Re: Replication-aware compaction
Version 0.7.3. Yes, I am talking about minor compactions.

I have three nodes, RF=3, and 3 GB of data (before replication). Not many users (yet), so it seems like three nodes should be plenty. But when all three nodes are compacting, I sometimes get timeouts on the client, and each node's log is full of notifications that the other nodes have died (and come back to life after about a second).

My cluster can tolerate one node being out of commission, so I would rather have longer compactions one at a time than shorter compactions all at the same time. I think our usage pattern of bursty writes causes the three nodes to decide to compact at the same time. These bursts are followed by periods of relative quiet, so there should be time for the other two nodes to compact one at a time.

On Mon, Jun 6, 2011 at 2:36 PM, aaron morton aa...@thelastpickle.com wrote:

Are you talking about minor (automatic) compactions? Can you provide some more information on what's happening to make the node unusable, and what version you are using? Compaction is not a lightweight process, but it should not hurt the node that badly; it is considered an online operation. Delaying compaction will only make it run for longer and take more resources.

Cheers

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 Jun 2011, at 20:14, David Boxenhorn wrote: [...]
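Aaron's point that delaying compaction only makes it more expensive can be seen with a toy model (illustrative arithmetic, not Cassandra internals; the 100 MB flush size is an assumption): each write burst flushes one sstable, and a compaction must read and rewrite every sstable it merges, so the longer compaction is deferred, the bigger and longer-running the single compaction becomes.

```python
# Toy model of deferred minor compaction: every burst flushes one sstable
# of FLUSH_MB; a compaction reads each waiting sstable and rewrites the
# merged result, so its I/O grows linearly with how long you deferred.

FLUSH_MB = 100  # assumed flush size per write burst

def compaction_io_mb(sstables_waiting):
    """MB of I/O one compaction performs: read every sstable, write the merge."""
    return 2 * FLUSH_MB * sstables_waiting

prompt = compaction_io_mb(4)      # compact as soon as 4 sstables accumulate
deferred = compaction_io_mb(12)   # defer through three bursts instead
print(prompt, deferred)           # 800 2400
```

The total I/O over time is similar either way; what deferral changes is that the work lands in one large burst instead of several small ones — the opposite of what you want if compaction is already hurting the node.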
Re: Replication-aware compaction
You should consider upgrading to 0.7.6 to get a fix to Gossip. Earlier 0.7 releases were prone to marking nodes up and down when they should not have been. See https://github.com/apache/cassandra/blob/cassandra-0.7/CHANGES.txt#L22

Are the TimedOutExceptions on the client for read or write requests? During the burst times, which stages are backing up in nodetool tpstats? Compaction should not affect writes too much (assuming the commit log and data are on different spindles).

You could also take a look at the read and write latency stats for a particular CF using nodetool cfstats or JConsole; these will give you the stats for the local operations. You could also take a look at iostat on the box: http://spyced.blogspot.com/2010/01/linux-performance-basics.html

Hope that helps.

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jun 2011, at 00:30, David Boxenhorn wrote: [...]
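The aside about separate log and data spindles maps to two settings in the 0.7-era cassandra.yaml; the paths below are illustrative, not the defaults:

```yaml
# Put the commit log on its own spindle so its sequential appends don't
# contend with compaction I/O on the data disks (paths are illustrative).
commitlog_directory: /mnt/disk1/cassandra/commitlog

data_file_directories:
    - /mnt/disk2/cassandra/data
```

With the commit log isolated this way, writes (which only touch the commit log and memtables) should stay fast even while compaction is churning through the data directories — which is why write-stage backup during compaction would point at a shared or saturated disk.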