[jira] [Issue Comment Edited] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169761#comment-13169761 ]

Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:19 PM:
---

Ok, I got it, and +1 on that idea. I had actually assumed tombstones were compacted away after repair anyway. So abandon GCSeconds and simply kill off tombstones created before repair when it runs successfully (presumably on a range-by-range basis?):

* Improved performance through reduced tombstone load
* No risk of data corruption if repair is not run

That would be a cool first step and improve the current situation. I think a reaper system is still needed, though, although this feature would take some of the existing pressure off. There would still be the issue of tombstone build-up between repairs, which means performance can vary (or rather, degrade) between invocations, plus the load spikes from repair itself and the manual nature of the process.

I guess I'm on the sharp end of this: we have several column families where columns represent game objects or messages owned by users, with a high delete and insert load. Various operations need to perform slices of user rows, and these get much slower as tombstones build up, so GCSeconds has been brought right down. That leads to the constant pain of "how long is left before we need to run repair or increase GCSeconds?" Improving repair as described would remove the Sword of Damocles threat of data corruption, but we'd still need to make sure it was run regularly, performance would degrade between invocations, and repair would create load spikes. The reaping model can take away those problems.

was (Author: dccwilliams):
Ok, I got it, and +1 on that idea.
Abandon GCSeconds and simply kill off tombstones created before repair when it runs successfully (presumably on a range-by-range basis):

* Improved performance through reduced tombstone load
* No risk of data corruption if repair is not run

That would be a very cool first step to optimize this. I think a reaper system would still be well worthwhile, though, although this feature would take some pressure off. There is still the issue of tombstone build-up between repairs, which means performance can vary (or rather, degrade) between invocations, plus there are still the load spikes from repair itself. I guess I'm on the sharp end of this: we have several column families where columns represent game objects or messages owned by users, with a high delete and insert load. Various operations need to perform slices of user rows, and these get much slower as tombstones build up, so GCSeconds has been brought right down. That leads to the constant pain of "how long is left before we need to run repair or increase GCSeconds?" Improving repair would remove the Sword of Damocles threat, but we'd still need to run it regularly, and performance wouldn't be as consistent as it could be with constant background reaping.

Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
--

Key: CASSANDRA-3620
URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Dominic Williams
Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
Original Estimate: 504h
Remaining Estimate: 504h

Proposal for an improved system for handling distributed deletes, which removes the requirement to regularly run repair processes to maintain performance and data integrity.

h2. The Problem

There are various issues with repair:

* Repair is expensive to run
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can reappear (and a node might miss a delete for several reasons, including being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted data reappearing, in some cases the growing tombstone overhead can significantly degrade performance

Because of the foregoing, in high-throughput environments it can be very difficult to make repair a cron job. It can be preferable to keep a terminal
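The fourth bullet above, deleted data reappearing when repair is not run within GCSeconds, can be illustrated with a toy simulation. This is a sketch only: the replica, timestamp, and merge logic are simplified stand-ins, not Cassandra's real code paths.

```python
GC_GRACE_SECONDS = 10  # stand-in for the per-column-family GCSeconds setting

class Replica:
    def __init__(self):
        self.data = {}        # key -> (value, write timestamp)
        self.tombstones = {}  # key -> delete timestamp

    def write(self, key, value, ts):
        self.data[key] = (value, ts)

    def delete(self, key, ts):
        self.data.pop(key, None)
        self.tombstones[key] = ts

    def compact(self, now):
        # Current rule: a tombstone is purged once GCSeconds has elapsed,
        # whether or not every replica has actually seen the delete.
        self.tombstones = {k: ts for k, ts in self.tombstones.items()
                           if now - ts < GC_GRACE_SECONDS}

def resolve(replicas, key):
    """Merge replica states: the highest-timestamp entry wins, and a
    tombstone newer than a value suppresses it (as read repair would)."""
    best_val, best_del = None, None
    for r in replicas:
        if key in r.data:
            value, ts = r.data[key]
            if best_val is None or ts > best_val[0]:
                best_val = (ts, value)
        if key in r.tombstones:
            ts = r.tombstones[key]
            if best_del is None or ts > best_del:
                best_del = ts
    if best_val and (best_del is None or best_val[0] > best_del):
        return best_val[1]
    return None

replicas = [Replica() for _ in range(3)]
for r in replicas:
    r.write("k", "game-object", ts=1)

# The delete at ts=2 reaches only two replicas; the third was down
# (or shed the request under load), and repair is never run.
replicas[0].delete("k", ts=2)
replicas[1].delete("k", ts=2)
assert resolve(replicas, "k") is None  # tombstone still suppresses the value

# GCSeconds elapses without a repair, so compaction drops the tombstones...
for r in replicas:
    r.compact(now=20)

# ...and the stale copy on the third replica wins the merge again.
assert resolve(replicas, "k") == "game-object"
```

The proposal's point is that purging those tombstones should have been gated on a successful repair having covered the range, not on wall-clock time alone.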
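The change discussed in the comments, dropping GCSeconds in favour of purging tombstones once a later repair has completed for their range, could be sketched like this. The function names and the per-range repair timestamp are assumptions for illustration, not Cassandra internals.

```python
def purgeable_today(tombstone_ts, now, gc_grace_seconds):
    # Current rule: purge on wall-clock age alone. Safe only if repair
    # has actually run within gc_grace_seconds of the delete.
    return now - tombstone_ts >= gc_grace_seconds

def purgeable_proposed(tombstone_ts, last_repaired_at):
    # Proposed rule: purge only tombstones created before the last
    # successful repair of this token range. last_repaired_at is None
    # for a never-repaired range, so nothing is purged there and deleted
    # data cannot reappear, no matter how much time passes.
    return last_repaired_at is not None and tombstone_ts < last_repaired_at

# A tombstone created at t=100 with gc_grace_seconds=10:
assert purgeable_today(100, now=200, gc_grace_seconds=10)  # age-based: gone
assert not purgeable_proposed(100, last_repaired_at=None)  # never repaired: kept
assert purgeable_proposed(100, last_repaired_at=150)       # repaired after it: gone
```

The comment's two bullets follow directly: tombstones no longer have to linger for a fixed safety window (less read overhead), and skipping repair merely delays purging instead of risking resurrection.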