Cassandra atomicity/isolation/transaction in multithread counter updates

2012-06-16 Thread Manuel Peli
I'm in a pseudo-deadlock over Cassandra and its
atomicity/isolation/transaction guarantees. My simple question is: what
happens when two (or more) threads try to update (increment) the same
integer column value of the same row in a column family? I've read
something about row-level isolation, but I'm not sure that it handles this
case properly. Any suggestions? (N.B. Each update requires a read of the
current value before the write. Otherwise a counter column could be used,
but in my opinion the problem still remains.)
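
The race being asked about can be shown without a database. The sketch below is only an illustration: `Column` is a hypothetical in-memory stand-in for one integer column, not a Cassandra API. It first replays the problematic interleaving by hand (both clients read before either writes), then shows the same two increments serialized by a lock.

```python
import threading


class Column:
    """Hypothetical in-memory stand-in for one integer column."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def read(self):
        return self.value

    def write(self, value):
        self.value = value

    def locked_increment(self):
        # Read and write happen under one lock, so no other update can
        # slip in between them.
        with self._lock:
            self.value += 1


# Interleaving replayed by hand: both clients read before either writes.
col = Column(5)
seen_by_a = col.read()
seen_by_b = col.read()
col.write(seen_by_a + 1)
col.write(seen_by_b + 1)
lost_update_result = col.value  # two increments were issued, one survives

# The same two increments, serialized.
safe = Column(5)
threads = [threading.Thread(target=safe.locked_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
serialized_result = safe.value
```

A lock like this only helps inside one process. With several application servers, the serialization has to come from somewhere shared, which is where the RDBMS layer or an external coordinator comes in.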

My own idea is as follows. Because it's a real-time analytics application,
counter updates only ever touch the current hour, while previous hours no
longer change. So I think one way to avoid the problem would be to use an
RDBMS layer (which supports ACID properties) for the current hour's updates
and, when the hour expires, consolidate the data into Cassandra.
Is that right?
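
A minimal sketch of that hourly-buffer idea, under several assumptions: SQLite (via Python's `sqlite3`) stands in for the RDBMS layer, hours are stored as sortable strings such as `2012-06-16T10`, and the `sink` dict stands in for the write to Cassandra. The table layout and the function names are made up for illustration.

```python
import sqlite3


def open_buffer():
    """Create the in-memory buffer that holds the open hour's counters."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE counters ("
        " hour TEXT, name TEXT, value INTEGER NOT NULL,"
        " PRIMARY KEY (hour, name))"
    )
    return db


def increment(db, hour, name, delta=1):
    # 'with db' wraps both statements in one transaction:
    # commit on success, rollback if either statement raises.
    with db:
        db.execute(
            "INSERT OR IGNORE INTO counters (hour, name, value) VALUES (?, ?, 0)",
            (hour, name),
        )
        # The increment is done by the database, not by read-then-write
        # in the application.
        db.execute(
            "UPDATE counters SET value = value + ? WHERE hour = ? AND name = ?",
            (delta, hour, name),
        )


def consolidate(db, current_hour, sink):
    """Move every closed hour into sink and drop it from the buffer."""
    with db:
        rows = db.execute(
            "SELECT hour, name, value FROM counters WHERE hour < ?",
            (current_hour,),
        ).fetchall()
        for hour, name, value in rows:
            sink[(hour, name)] = value  # placeholder for the Cassandra write
        db.execute("DELETE FROM counters WHERE hour < ?", (current_hour,))
    return len(rows)


db = open_buffer()
for _ in range(3):
    increment(db, "2012-06-16T10", "page_views")
increment(db, "2012-06-16T11", "page_views")

archive = {}
moved = consolidate(db, "2012-06-16T11", archive)
```

After the call, the closed hour sits in `archive` and only the 11:00 row is left in the buffer. A real consolidation step would also have to decide what happens if the Cassandra write fails halfway through. Here the delete simply runs after the copy.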

Even with an RDBMS layer, the transaction problem remains: some updates to
different column families are correlated, and if even one fails a rollback
is needed. I know that Cassandra doesn't support transactions, but I think
that, by playing with the replication factor and the read/write consistency
levels, the problem can be mitigated, possibly by implementing an
application-level commit/rollback. I read something about ZooKeeper, but I
guess that it adds complexity and latency.
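
One common shape for an application-level commit/rollback is a list of (do, undo) pairs, where the undo of every completed step runs in reverse order if a later step fails. The sketch below assumes this compensation approach. The dicts stand in for two column families, and all names are hypothetical.

```python
class WriteFailed(Exception):
    """Raised by a simulated write that does not go through."""


def apply_all(steps):
    """Run (do, undo) pairs in order.

    If any 'do' raises, run the undos of the steps that already
    completed, newest first, and report failure.
    """
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()
        return False
    return True


def bump(store, key, delta):
    """Build a (do, undo) pair that adds delta to store[key]."""
    def do():
        store[key] += delta

    def undo():
        store[key] -= delta

    return do, undo


def failing_write():
    raise WriteFailed("simulated timeout")


visits = {"page:/home": 10}   # stand-in for one column family
totals = {"site": 100}        # stand-in for a correlated column family

# Second step fails, so the first one is compensated.
failed = apply_all([bump(visits, "page:/home", 1), (failing_write, lambda: None)])
after_failure = visits["page:/home"]

# Both steps succeed.
succeeded = apply_all([bump(visits, "page:/home", 1), bump(totals, "site", 1)])
```

This gives rollback but not isolation: between a `do` and its `undo`, other readers can see the intermediate value, and an `undo` can itself fail. Those gaps are what a coordinator such as ZooKeeper would be brought in to close, at the cost already mentioned.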


Re: Unbalanced ring in Cassandra 0.8.4

2012-06-16 Thread Raj N
Nick, do you think I should still run cleanup on the first node?

-Rajesh

On Fri, Jun 15, 2012 at 3:47 PM, Raj N raj.cassan...@gmail.com wrote:

 I did run nodetool move. But that was when I was setting up the cluster
 which means I didn't have any data at that time.

 -Raj


 On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey n...@datastax.com wrote:

 Did you start all your nodes at the correct tokens or did you balance
 by moving them? Moving nodes around won't delete unneeded data after
 the move is done.

 Try running 'nodetool cleanup' on all of your nodes.

 On Fri, Jun 15, 2012 at 12:24 PM, Raj N raj.cassan...@gmail.com wrote:
  Actually I am not worried about the percentage. It's the data I am
  concerned about. Look at the first node. It has 102.07 GB of data. And the
  other nodes have around 60 GB (one has 69, but let's ignore that one). I
  don't understand why the first node has almost double the data.
 
  Thanks
  -Raj
 
 
  On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey n...@datastax.com wrote:
 
  This is just a known problem with the nodetool output and multiple
  DCs. Your configuration is correct. The problem with nodetool is fixed
  in 1.1.1
 
  https://issues.apache.org/jira/browse/CASSANDRA-3412
 
  On Fri, Jun 15, 2012 at 9:59 AM, Raj N raj.cassan...@gmail.com wrote:
   Hi experts,
   I have a 6-node cluster across 2 DCs (DC1:3, DC2:3). I have assigned
   tokens using the first strategy (adding 1) mentioned here -
  
   http://wiki.apache.org/cassandra/Operations?#Token_selection
  
   But when I run nodetool ring on my cluster, this is the result I get -
  
   Address         DC   Rack   Status  State   Load       Owns    Token
                                                                  113427455640312814857969558651062452225
   172.17.72.91    DC1  RAC13  Up      Normal  102.07 GB  33.33%  0
   45.10.80.144    DC2  RAC5   Up      Normal  59.1 GB    0.00%   1
   172.17.72.93    DC1  RAC18  Up      Normal  59.57 GB   33.33%  56713727820156407428984779325531226112
   45.10.80.146    DC2  RAC7   Up      Normal  59.64 GB   0.00%   56713727820156407428984779325531226113
   172.17.72.95    DC1  RAC19  Up      Normal  69.58 GB   33.33%  113427455640312814857969558651062452224
   45.10.80.148    DC2  RAC9   Up      Normal  59.31 GB   0.00%   113427455640312814857969558651062452225
  
  
   As you can see, the first node has considerably more load than the
   others (almost double), which is surprising since all of these are
   replicas of each other. I am running Cassandra 0.8.4. Is there an
   explanation for this behaviour? Could
   https://issues.apache.org/jira/browse/CASSANDRA-2433 be the cause of
   this?
  
   Thanks
   -Raj
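
For reference, the "adding 1" token layout described in the original question can be sketched as follows. This assumes RandomPartitioner's 0 to 2**127 token space and evenly spaced nodes within each DC, with each further DC offset by 1. The function name is made up for illustration.

```python
RING_SIZE = 2 ** 127  # token space assumed for RandomPartitioner


def dc_tokens(nodes_per_dc, dc_index):
    """Evenly spaced tokens for one DC, shifted by dc_index to avoid clashes."""
    return [i * RING_SIZE // nodes_per_dc + dc_index for i in range(nodes_per_dc)]


dc1 = dc_tokens(3, 0)
dc2 = dc_tokens(3, 1)

for a, b in zip(dc1, dc2):
    print(a, b)
```

The tokens in the ring output above differ from these in their low-order digits, which would be consistent with their having been computed with floating-point division rather than integer division. At this scale the difference is negligible for balance.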