[jira] [Issue Comment Edited] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs

2011-12-15 Thread Dominic Williams (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169761#comment-13169761
 ] 

Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:19 PM:
---

Ok, I got it, and +1 on that idea. I had actually assumed tombstones were 
compacted away after repair anyway. So: abandon GCSeconds and simply kill off 
tombstones created before repair when it runs successfully (presumably on a 
range-by-range basis?)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair not run
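The proposed rule can be sketched roughly as follows. This is purely illustrative pseudocode, not Cassandra's actual API; the function and parameter names are hypothetical, and timestamps are simplified to plain numbers:

```python
# Hypothetical sketch of the proposed repair-based purge rule: a tombstone
# becomes droppable at compaction time only once a successful repair of its
# token range has propagated the delete, replacing the fixed GCSeconds window.

def purgeable(tombstone_created_at, last_successful_repair_at):
    """Return True if compaction may drop this tombstone."""
    if last_successful_repair_at is None:
        # No repair has ever completed for this range: never drop the
        # tombstone, so deleted data cannot resurrect on a stale replica.
        return False
    # Tombstones created before the last successful repair have been seen
    # by all replicas and are safe to purge.
    return tombstone_created_at < last_successful_repair_at
```

Under this rule the operator's obligation changes from "repair within GCSeconds or risk resurrection" to "repair whenever you want tombstones reclaimed".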

That would be a cool first step and improve the current situation. 

I think a reaper system is still needed though, although this feature would 
take some of the existing pressure off. There would still be the issues of 
tombstone build-up between repairs (meaning performance can vary, or rather 
degrade, between invocations), the load spikes from repair itself, and the 
manual nature of the process.

I guess I'm on the sharp end of this: we have several column families where 
columns represent game objects or messages owned by users, with a high delete 
and insert load. Various operations need to perform slices of user rows, and 
these can get much slower as tombstones build up, so GCSeconds has been 
brought right down. But that leads to the constant worry of how long is left 
before we need to run repair or increase GCSeconds. Improving repair as 
described would remove the Sword of Damocles threat of data corruption, but we'd 
still need to make sure it was run regularly, performance would degrade between 
invocations, and repair would create load spikes. The reaping model can take 
away those problems. 
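The slowdown described above comes from slice reads having to scan past tombstones to find live columns. A toy illustration (not Cassandra code; column storage is reduced to a sorted list of name/tombstone pairs):

```python
# Toy model of why wide-row slices degrade as tombstones accumulate:
# a slice asking for N live columns must still scan every tombstone
# that sorts before them, so read cost grows with the delete load.

def slice_live_columns(columns, limit):
    """columns: sorted list of (name, is_tombstone) pairs.
    Returns (live column names, number of cells scanned)."""
    live, scanned = [], 0
    for name, is_tombstone in columns:
        scanned += 1
        if not is_tombstone:
            live.append(name)
            if len(live) == limit:
                break
    return live, scanned
```

For a row that is mostly tombstones, `scanned` can be many times `limit`, which is exactly the degradation between repairs the comment describes.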

  was (Author: dccwilliams):
Ok I got it and +1 on that idea. Abandon GCSeconds and simply kill off 
tombstones created before repair when it runs successfully (presumably on a 
range-by-range basis)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair not run

That would be a very cool first step to optimize this.

I think a reaper system would still be well worthwhile though, although this 
feature would take some pressure off. There is still the issue of tombstone 
build-up between repairs, which means performance can vary (or rather, 
degrade) between invocations, plus there are still the load spikes from repair 
itself.

I guess I'm on the sharp end of this: we have several column families where 
columns represent game objects or messages owned by users, with a high delete 
and insert load. Various operations need to perform slices of user rows, and 
these can get much slower as tombstones build up, so GCSeconds has been 
brought right down. But that leads to the constant worry of how long is left 
before we need to run repair or increase GCSeconds. Improving repair would 
remove the Sword of Damocles threat, but we'd still need to run it regularly, and 
performance wouldn't be as consistent as it could be with constant background 
reaping.
  
 Proposal for distributed deletes - fully automatic Reaper Model rather than 
 GCSeconds and manual repairs
 --

 Key: CASSANDRA-3620
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dominic Williams
  Labels: GCSeconds, deletes, distributed_deletes, 
 merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

 Proposal for an improved system for handling distributed deletes, which 
 removes the requirement to regularly run repair processes to maintain 
 performance and data integrity. 
 h2. The Problem
 There are various issues with repair:
 * Repair is expensive to run
 * Repair jobs are often made more expensive than they should be by other 
 issues (nodes dropping requests, hinted handoff not working, downtime etc)
 * Repair processes can often fail and need restarting, for example in cloud 
 environments where network issues make a node disappear from the ring for a 
 brief moment
 * When you fail to run repair within GCSeconds, either by error or because of 
 issues with Cassandra, data written to a node that did not see a later delete 
 can reappear (and a node might miss a delete for several reasons including 
 being down or simply dropping requests during load shedding)
 * If you cannot run repair and have to increase GCSeconds to prevent deleted 
 data reappearing, in some cases the growing tombstone overhead can 
 significantly degrade performance
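 For contrast, the existing GCSeconds behaviour amounts to a fixed time window: a tombstone is dropped once it is older than gc_grace_seconds, whether or not repair has actually propagated the delete to every replica. An illustrative sketch (not actual Cassandra code; times are plain epoch seconds):

```python
# Illustrative sketch of the current GCSeconds rule: purge eligibility
# depends only on tombstone age, so if repair has not run within the
# window, a replica that missed the delete can resurrect the data.

DEFAULT_GC_GRACE = 10 * 24 * 3600  # Cassandra's default gc_grace_seconds (10 days)

def gc_purgeable(local_deletion_time, now, gc_grace_seconds=DEFAULT_GC_GRACE):
    """Return True if compaction may drop a tombstone under GCSeconds."""
    return now - local_deletion_time > gc_grace_seconds
```

 The failure mode listed above falls out directly: the rule never checks whether every replica has seen the delete, only how old the tombstone is.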
 Because of the foregoing, in high throughput environments it can be very 
 difficult to make repair a cron job. It can be preferable to keep a terminal 
 


Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:22 PM:
---

Ok, I got it, and +1 on that idea. I had actually assumed tombstones were 
compacted away after repair anyway. So, as I understand it, GCSeconds would be 
removed, and tombstones would be marked for deletion once a repair operation 
had run successfully. 

That would be a cool first step and improve the current situation. 

But I think a reaper system is still needed: although this feature would take 
some of the current pressure off, there would still be the issues of tombstone 
build-up between repairs (meaning performance degrades between invocations), 
the load spikes from repair itself, and the manual nature of the process.

