Re: one way to make counter delete work better

2011-06-15 Thread Yang
The patch is in https://issues.apache.org/jira/browse/CASSANDRA-2774. Some of the code is messy and intended for demonstration only; we could refine it after we agree this is a feasible way to go. Thanks, Yang On Tue, Jun 14, 2011 at 11:21 AM, Sylvain

Re: one way to make counter delete work better

2011-06-14 Thread Sylvain Lebresne
Who assigns those epoch numbers? You need all nodes to agree on the epoch number somehow to have this work, but then how do you maintain those in a partition-tolerant distributed system? I may have missed some parts of your proposal, but let me consider a scenario that we have to be able to

Re: one way to make counter delete work better

2011-06-14 Thread Milind Parikh
If I understand this correctly, then the epoch integer would be generated by each node. Since time always flows forward, the assumption would be, I suppose, that the epochs would be tagged with the node that generated them and additionally the counter would carry as much history as necessary (and

Re: one way to make counter delete work better

2011-06-14 Thread Yang
I have almost finished the code and should release it in a bit. Your scenario is not really a problem with the implementation, but with the definition of "same time". Remember that in a distributed system there is no absolute physical time; time is just another way of saying "before" or "after". In

Re: one way to make counter delete work better

2011-06-14 Thread Yang
By "stronger reason" I mean that the +3 is already merged in the memtable of node B, so you can't find the +1 and +2 any more. On Tue, Jun 14, 2011 at 7:02 PM, Yang tedd...@gmail.com wrote: I almost got the code done, should release in a bit. your scenario is not a problem concerned with

Re: one way to make counter delete work better

2011-06-14 Thread Yang
Yes, the epoch is generated by each node in the replica set upon a delete operation. The epoch is **global** to the replica set for one counter, in contrast to the clock, which is local to a partition. Different counters have different epoch numbers, because different counters can be seen as completely

one way to make counter delete work better

2011-06-13 Thread Yang
As https://issues.apache.org/jira/browse/CASSANDRA-2101 indicates, the problem with counter delete arises in scenarios like the following:

add 1, clock 100
delete, clock 200
add 2, clock 300

If the 1st and 3rd operations are merged in SSTable compaction, then we have:

delete, clock 200
add 3,
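
The scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual counter code: `Op`, `replay`, and `compact_adds` are hypothetical names, a delete is assumed to reset the counter to 0, and the merged add is assumed to keep the highest clock of the adds it combines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical counter operation: an add with a delta, or a delete."""
    kind: str        # "add" or "delete"
    clock: int
    delta: int = 0


def replay(ops):
    """Apply operations in clock order; a delete resets the counter to 0."""
    total = 0
    for op in sorted(ops, key=lambda o: o.clock):
        total = 0 if op.kind == "delete" else total + op.delta
    return total


def compact_adds(ops):
    """Toy compaction: merge all adds into one add with the summed delta
    and the maximum clock (an assumption made for this illustration)."""
    adds = [o for o in ops if o.kind == "add"]
    others = [o for o in ops if o.kind != "add"]
    if not adds:
        return others
    merged = Op("add", max(o.clock for o in adds), sum(o.delta for o in adds))
    return others + [merged]


history = [Op("add", 100, 1), Op("delete", 200), Op("add", 300, 2)]

intended = replay(history)                        # delete wipes out the +1
after_compaction = replay(compact_adds(history))  # +1 survives inside the +3

print(intended, after_compaction)
```

In this model the uncompacted history yields 2, while the compacted one yields 3, because the merged add carries clock 300 and so appears to be entirely newer than the delete at clock 200.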

Re: one way to make counter delete work better

2011-06-13 Thread Jonathan Ellis
I don't think that's bulletproof either. For instance, what if the two adds go to replica 1 but the delete to replica 2? Bottom line (and this was discussed on the original delete-for-counters ticket, https://issues.apache.org/jira/browse/CASSANDRA-2101), counter deletes are not fully

Re: one way to make counter delete work better

2011-06-13 Thread Yang
I think this approach also works for your scenario. I thought that the issue only concerned merging within the same leader, but you pointed out that similar merging happens between leaders too. Now I see that the same rules on epoch numbers also apply to inter-leader data merging,