Re: Understand eventually consistent

2011-02-24 Thread Javier Canillas
First of all, in your example, is W the same as CL?

If so, then the success of any read / write operation is determined
by whether the required CL can be satisfied at that moment.

If you write with CL ONE over a CF with RF 3 when 1 node of the
replicas is down, then the operation will succeed and HintedHandoff
will propagate the op to the failed node when it comes back up.

Instead, when you execute the same op using CL QUORUM, which means
RF / 2 + 1 = 2 replicas for RF 3, it will try to write on the
coordinator node and one other replica. Considering only 1 replica is
down, the op will succeed too.

Now consider the same op but with CL ALL: it will fail since it can't
ensure that the coordinator and both other replicas are updated.

Hope this helps you understand the relation between CL and RF.

On 23/02/2011, at 21:43, mcasandra mohitanch...@gmail.com wrote:


 I am reading http://wiki.apache.org/cassandra/HintedHandoff again and got
 a little confused. This is my understanding of how HH should work, based
 on what I read in the Dynamo paper:

 1) Say nodes A, B, C, D, E are in the cluster in a ring (in that order).
 2) For a given key K, RF=3.
 3) Node B holds the hash of that key K, which means that when K is written
 it will be written to B (owner of the hash) + C + D, since RF = 3.
 4) If node D goes down and there is another write to key K, then this time
 the row for key K will be written with W=1 to B (owner) + C + E (HH), since
 RF=3 needs to be satisfied. Is this correct?
 5) In the above scenario, where node D is down, if we are writing at W=2
 and reading at R=2, would it fail even though the original nodes B + C are
 up? Here I am thinking W=2 and R=2 means that 2 nodes that hold key K are
 up, so it satisfies the CL and thus writes and reads will not fail.
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6058576.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.


Re: Understand eventually consistent

2011-02-24 Thread mcasandra


Javier Canillas wrote:
 
 Instead, when you execute the same op using CL QUORUM, which means
 RF / 2 + 1 = 2 replicas for RF 3, it will try to write on the
 coordinator node and one other replica. Considering only 1 replica is
 down, the op will succeed too.
 

I am assuming even a read will succeed when CL is QUORUM, RF=3 and 1 node is
down.


Javier Canillas wrote:
 
 Now consider the same op but with CL ALL: it will fail since it can't
 ensure that the coordinator and both other replicas are updated.
 

Can you please explain this a little more? I thought CL ALL will fail because
it needs all the nodes to be up.
http://wiki.apache.org/cassandra/API



Re: Understand eventually consistent

2011-02-24 Thread Javier Canillas
Well, it needs all the nodes required by the operation to be up and to
respond in a timely fashion; even an RPC timeout on 1 replica will get you
a failure response.

CL is calculated based on the RF configured for the ColumnFamily.

The ConsistencyLevel is an enum that controls both read and write behavior
based on ReplicationFactor in your storage-conf.xml.

QUORUM = RF / 2 +1;
ALL = RF
ONE = 1
ANY = 0

Then, on a column family configured with RF = 6, QUORUM means be sure to
write to at least 4 nodes before responding, but on a column family
configured with RF = 3, QUORUM means be sure to write to at least 2. When
RF is 1 or 2, QUORUM is the same as ALL (be sure to write to all nodes
involved).
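
The arithmetic above can be checked with a small sketch. This is only an
illustration of the formulas listed in this message, not Cassandra's own
code; the function name required_acks is made up for the example, and
"RF / 2 + 1" is read as integer division.

```python
def required_acks(cl: str, rf: int) -> int:
    """Replica acknowledgements needed for a consistency level.

    Hypothetical helper that mirrors the list in the message above:
    QUORUM = RF / 2 + 1 (integer division), ALL = RF, ONE = 1, ANY = 0.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    table = {
        "ANY": 0,
        "ONE": 1,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return table[cl]


if __name__ == "__main__":
    for rf in (1, 2, 3, 6):
        print("RF", rf, "QUORUM", required_acks("QUORUM", rf),
              "ALL", required_acks("ALL", rf))
```

For RF 1 and RF 2 the QUORUM and ALL columns come out equal, which is the
"QUORUM is like ALL" case described above.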




Re: Understand eventually consistent

2011-02-24 Thread mcasandra

Does HH count towards QUORUM? Say  RF=1 and CL of W=QUORUM and one node that
owns the key dies. Would subsequent write operations for that key be
successful? I am guessing it will not succeed.


Re: Understand eventually consistent

2011-02-24 Thread Tyler Hobbs
On Thu, Feb 24, 2011 at 1:26 PM, mcasandra mohitanch...@gmail.com wrote:


 Does HH count towards QUORUM? Say  RF=1 and CL of W=QUORUM and one node
 that
 owns the key dies. Would subsequent write operations for that key be
 successful? I am guessing it will not succeed.


No, it would not succeed. It would only succeed at CL.ANY.

-- 
Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
Python client library


Re: Understand eventually consistent

2011-02-24 Thread Javier Canillas
HH is a kind of write repair, so it has nothing to do with CL, which is a
requirement of the operation; and it is not used for reads.

In your example QUORUM is the same as ALL, since you only have RF = 1 (only
the data holder - coordinator). If that node fails, all reads / writes for
that key will fail (except writes at CL ANY).

Now, in another scenario, with RF = 3 and 1 node down:

CL = QUORUM. Will work, but the coordinator will record an HH for the write
and attempt to deliver it to the failed node for some time. Despite this,
the operation will succeed for the client.
CL = ALL. Will fail.
CL = ONE. Will work. The write still goes to both live replicas, and an HH
is recorded for the replica that is down.

*Consider CL the client's minimum requirement for an operation to succeed*.
If the cluster can meet that requirement, the operation will succeed and
return to the client (even though some HH work needs to be done
afterwards); if not, an error response will be returned.
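
A rough sketch of the rule described above. It is a simplified model, not
Cassandra code: the function name write_outcome is invented, it assumes one
hint is recorded per replica that is down, and it assumes no hint is
recorded when the CL cannot be met.

```python
def write_outcome(rf: int, live_replicas: int, cl: str):
    """Return (succeeds, hints_recorded) for a single write.

    rf            -- replication factor
    live_replicas -- how many of the rf replicas are currently up
    cl            -- "ONE", "QUORUM" or "ALL"
    """
    if not 0 <= live_replicas <= rf:
        raise ValueError("live_replicas must be between 0 and rf")
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]
    if live_replicas < needed:
        # The CL cannot be satisfied: the client gets an error.
        return (False, 0)
    # The CL is satisfied; replicas that are down get a hint (assumption).
    return (True, rf - live_replicas)


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, write_outcome(rf=3, live_replicas=2, cl=cl))
```

The point of the model is that the hint is a consequence of a write that
already met its CL; it never helps a write reach the CL.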




Re: Understand eventually consistent

2011-02-24 Thread mcasandra


Javier Canillas wrote:
 
 HH is some kind of write repair, so it has nothing to do with CL that is a
 requirement of the operation; and it won't be used over reads.
 
 In your example QUORUM is the same as ALL, since you only have 1 RF (only
 the data holder - coordinator). If that node fails, all read / writes will
 fail.
 
 Now, on another scenario, with RF = 3 and 1 node down:
 
 CL = QUORUM. Will work, but the coordination will mark an HH over the
 write
 and attempt to do it for some time over the failed node. Despite this, the
 operation will success for the client.
 CL = ALL. Will fail.
 CL = ONE. Will work. 2 HH will be sent to replicas to perform the update.
 
 *Consider CL is the client minimum requirement over an operation to
 succeed*.
 If the cluster can assure that value, then the operation will succeed and
 returned to the client (despite some HH work needs to be done after), if
 not
 an error response will be returned.
 
 
 On Thu, Feb 24, 2011 at 4:26 PM, mcasandra mohitanch...@gmail.com wrote:
 

 Does HH count towards QUORUM? Say  RF=1 and CL of W=QUORUM and one node
 that
 owns the key dies. Would subsequent write operations for that key be
 successful? I am guessing it will not succeed.
 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6061593.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.

 
 

Thanks! In the above scenario, what happens if 2 nodes die with RF=3 and a
write CL of QUORUM? Would a write succeed, since one write can be made to
the coordinator node with HH and the other to the replica node that is up?

Similarly, would a read succeed in that scenario? Would HH count towards
the CL in this case?


Re: Understand eventually consistent

2011-02-24 Thread Javier Canillas
No, since you are explicitly asking that at least a QUORUM of the replicas
are written. In your scenario only 1 node of 3 is up, and the QUORUM value
is 2, so the operation will fail and no HH is made.

A read won't succeed either, since you are asking that the returned data be
validated by at least 2 nodes.

HH only takes place on write operations, and only when the op succeeded
because the CL could be satisfied while other replicas were down. Then the
coordinator uses HH to perform the updates on the failed replicas (as soon
as they come back up).



Re: Understand eventually consistent

2011-02-24 Thread mcasandra

Thanks. This helps a lot!


Re: Understand eventually consistent

2011-02-23 Thread mcasandra

I am reading http://wiki.apache.org/cassandra/HintedHandoff again and got
a little confused. This is my understanding of how HH should work, based
on what I read in the Dynamo paper:

1) Say nodes A, B, C, D, E are in the cluster in a ring (in that order).
2) For a given key K, RF=3.
3) Node B holds the hash of that key K, which means that when K is written
it will be written to B (owner of the hash) + C + D, since RF = 3.
4) If node D goes down and there is another write to key K, then this time
the row for key K will be written with W=1 to B (owner) + C + E (HH), since
RF=3 needs to be satisfied. Is this correct?
5) In the above scenario, where node D is down, if we are writing at W=2
and reading at R=2, would it fail even though the original nodes B + C are
up? Here I am thinking W=2 and R=2 means that 2 nodes that hold key K are
up, so it satisfies the CL and thus writes and reads will not fail.
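
The placement in steps 1 to 3 and the situation in step 5 can be sketched
as follows. This assumes the simple ring walk described in the question
(the owner plus the next RF - 1 nodes in ring order); the helper name
replicas_for is made up, and real placement strategies may differ.

```python
def replicas_for(ring, owner, rf):
    """Owner of the key plus the next rf - 1 nodes, walking the ring in order."""
    if rf > len(ring):
        raise ValueError("rf cannot exceed the number of nodes")
    start = ring.index(owner)
    return [ring[(start + i) % len(ring)] for i in range(rf)]


ring = ["A", "B", "C", "D", "E"]
replicas = replicas_for(ring, owner="B", rf=3)   # B, C, D as in step 3

# Step 5: node D is down, so count the replicas of K that are still up.
down = {"D"}
live = [node for node in replicas if node not in down]
quorum = 3 // 2 + 1

print("replicas:", replicas)
print("live:", live, "quorum met:", len(live) >= quorum)
```

With D down, two of the three replicas of K are still up, which is enough
for W=2 and R=2 without counting any hinted copy.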


Re: Understand eventually consistent

2011-02-21 Thread mcasandra


David Strauss-2 wrote:
 
 On Fri, 2011-02-18 at 12:01 -0600, Anthony John wrote:
 Writes will go thru w/hinted handoff, read will fail
 
 That is not correct. Hinted handoffs do not count toward reaching QUORUM
 counts.[1]
 
 [1] http://wiki.apache.org/cassandra/HintedHandoff
 
 -- 
 David Strauss
| da...@davidstrauss.net
| +1 512 577 5827 [mobile]
 
 
 
I read the logic of why writes are not allowed. But another alternative is
to allow the write and just fail the reads until it's in sync again. Is
there some other problem with this logic?


Re: Understand eventually consistent

2011-02-21 Thread Peter Schuller
 I read the logic of why writes are not allowed. But another alternative is
 to allow the write and just fail the reads until it's in sync again. Is
 there some other problem with this logic?

The problem lies in "until it's in sync again". A given node cannot
easily know for a given read, whether everything is in sync with
respect to the data participating in that read. I didn't think about
it very carefully, but off the top of my head, in the most general
case this would seem to require strong co-ordination that is
antithetical to the design of Cassandra.

(Consider that for a read of a set of columns, for each column the
node would have to know whether the nodes participating in the read
have any hints pending in the cluster. Since the co-ordinating node
cannot know the context in which the call is made (maybe the client or
some other client *just* wrote at quorum with nodes down), this
essentially implies co-ordination on every read, at all times.)

-- 
/ Peter Schuller


Re: Understand eventually consistent

2011-02-20 Thread David Strauss
On Fri, 2011-02-18 at 12:01 -0600, Anthony John wrote:
 Writes will go thru w/hinted handoff, read will fail

That is not correct. Hinted handoffs do not count toward reaching QUORUM
counts.[1]

[1] http://wiki.apache.org/cassandra/HintedHandoff

-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]




Re: Understand eventually consistent

2011-02-18 Thread Anthony John
At Quorum - if 2 of 3 nodes are down, a read should not be returned, right ?

But yes - if single node READs are opted for, it will go through.

The original question was - Why is Cassandra called eventually consistent
data store?
Because at write time, there is not a guarantee that all replicas are
consistent. But they eventually will be!

At Quorum write and Read - you will not get inconsistent results and your
read will force consistency, if such a state has not yet been arrived at for
the particular piece of data.

But you have the option of writing and reading at a lower consistency
level, which could result in inconsistencies.
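
As a rough illustration of "your read will force consistency": a toy model
in which every stored value carries a write timestamp, the newest timestamp
wins, and stale replicas that took part in the read are updated. The names
and the data layout are assumptions made for the sketch, not Cassandra's
API.

```python
def quorum_read(replicas, key, r):
    """Read `key` from the first r replicas and repair the stale ones.

    replicas -- list of dicts mapping key -> (value, timestamp)
    r        -- number of replicas to consult (e.g. 2 for QUORUM at RF 3)
    """
    contacted = replicas[:r]
    responses = [rep[key] for rep in contacted if key in rep]
    if not responses:
        return None
    # The copy with the highest timestamp is treated as the current value.
    newest = max(responses, key=lambda value_ts: value_ts[1])
    for rep in contacted:
        if rep.get(key) != newest:
            rep[key] = newest  # bring the stale replica up to date
    return newest[0]


replicas = [
    {"k": ("old", 1)},   # missed the latest write
    {"k": ("new", 2)},
    {"k": ("new", 2)},
]
print(quorum_read(replicas, "k", r=2))
print(replicas[0])
```

Because the write reached 2 of 3 replicas, any 2 replicas consulted by the
read include at least one holding the new value.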

HTH,

-JA

On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood stuh...@gmail.com wrote:

 But, the reason that it isn't safe to say that we are a strongly consistent
 store is that if 2 of your 3 replicas were to die and come back with no
 data, QUORUM might return the wrong result.

 A requirement of a strongly consistent store is that replicas cannot begin
 answering queries until they are consistent: this is not a requirement in
 Cassandra, although arguably it should be an option at some point in the
 distant future.


 On Thu, Feb 17, 2011 at 5:26 PM, Aaron Morton aa...@thelastpickle.com wrote:

 For background...

 http://wiki.apache.org/cassandra/ArchitectureOverview
 (There is a section on consistency in there)

 For  deep background...
 http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

 http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

 In short, yes (for all your questions) if you read and write at Quorum you
 have consistency behavior for your operations. Even though some nodes
 may have an inconsistent view of the data, e.g. one node is partitioned by
 a broken network or is overloaded and does not respond.

 Aaron

 On 18 Feb, 2011,at 02:11 PM, mcasandra mohitanch...@gmail.com wrote:


 Why is Cassandra called an eventually consistent data store? Wouldn't it be
 consistent if QUORUM is used?

 Another question is when I specify replication factor of 3 and write with
 factor of 2 and read with factor of 2 then what happens?

 1. When write occurs cassandra will return to the client only when the
 writes go to commit log on 2 nodes successfully?

 2. When read happens cassandra will return only when it is able to read
 from
 2 nodes and determine that it has consistent copy?
 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6038330.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.





Re: Understand eventually consistent

2011-02-18 Thread Markus Klems
Related question: Is it a good idea to specify ConsistencyLevels on a
per-operation basis? For example: Read ONE Write ALL would deliver
consistent read results, just like Read ALL Write ONE. However, if you
specify Read ONE Write QUORUM you cannot give such guarantees anymore.
Should there be (is there) a programming abstraction on top of
ConsistencyLevel that takes care of these things and makes them
explicit to the application developer?
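
One possible shape for such a check, as a sketch only: a read and a write
level give consistent reads when the replicas they touch must overlap, i.e.
R + W > RF. The helper names are hypothetical and not part of any Cassandra
client.

```python
def acks(cl: str, rf: int) -> int:
    """Replicas involved at a consistency level (QUORUM = RF / 2 + 1)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]


def reads_see_latest_write(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return acks(read_cl, rf) + acks(write_cl, rf) > rf


if __name__ == "__main__":
    rf = 3
    for read_cl, write_cl in [("ONE", "ALL"), ("ALL", "ONE"),
                              ("QUORUM", "QUORUM"), ("ONE", "QUORUM")]:
        print(read_cl, write_cl, reads_see_latest_write(read_cl, write_cl, rf))
```

An application-level wrapper could run a check like this once, when the
read and write levels are configured, and warn on combinations that do not
overlap.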




Re: Understand eventually consistent

2011-02-18 Thread Jonathan Ellis
On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood stuh...@gmail.com wrote:
 But, the reason that it isn't safe to say that we are a strongly consistent
 store is that if 2 of your 3 replicas were to die and come back with no
 data, QUORUM might return the wrong result.

Not so.  If you allow vaporizing arbitrary numbers of machines
without a trace then only systems that block for all replicas on each
update could be considered strongly consistent, and I don't know of
any systems in the wild that actually do that.  Certainly other
systems commonly considered strongly consistent, like HBase, do not.

 A requirement of a strongly consistent store is that replicas cannot begin
 answering queries until they are consistent

The system as a whole can be consistent even if an individual replica
is not; that is the point of CL > ONE.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Understand eventually consistent

2011-02-18 Thread Anthony John
Again, my understanding!

1. Writes will go thru w/hinted handoff, read will fail
2. Yes - but Oracle and others have no partition tolerance and lower levels
of availability. To build in partition tolerance and high availability and
still be shared nothing to avoid SPOF (to cover the RAC implementation), you
have to write on to multiple nodes and then read off multiple nodes to
ensure consistency.

You could always run RF=1 to be like most of the traditional DBMSs. The
issues you would face are the ones that Cassandra is trying to prevent!

HTH,

-JA

On Fri, Feb 18, 2011 at 11:53 AM, mcasandra mohitanch...@gmail.com wrote:


 I have a couple more questions:

 1. What happens when RF = 3, R = 2 and W = 2 and 2 machines go down? Would
 reads and writes fail, or get the results from the one machine that is up?
 2. Someone in this thread mentioned that writes are eventually consistent.
 Is it because the response is returned to the client as soon as data is
 written to the commit log? But isn't this the same as in an RDBMS? Oracle
 does the same thing: it writes to the REDO log and at some point later
 does a checkpoint and flushes data to disk. But an RDBMS is not called
 eventually consistent.



Re: Understand eventually consistent

2011-02-17 Thread Aaron Morton
For background...
http://wiki.apache.org/cassandra/ArchitectureOverview
(There is a section on consistency in there)

For deep background...
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

In short, yes (for all your questions) if you read and write at Quorum you
have consistency behavior for your operations. Even though some nodes
may have an inconsistent view of the data, e.g. one node is partitioned by
a broken network or is overloaded and does not respond.

Aaron

On 18 Feb, 2011, at 02:11 PM, mcasandra mohitanch...@gmail.com wrote:

Why is Cassandra called an eventually consistent data store? Wouldn't it be
consistent if QUORUM is used?

Another question is when I specify replication factor of 3 and write with
factor of 2 and read with factor of 2 then what happens?

1. When write occurs cassandra will return to the client only when the
writes go to commit log on 2 nodes successfully?

2. When read happens cassandra will return only when it is able to read from
2 nodes and determine that it has consistent copy?