Re: Strong Consistency with ONE read/writes

2011-07-12 Thread AJ
Yang, I'm not sure I understand what you mean by prefix of the HLog. Also, can you explain what failure scenario you are talking about? The major failure that I see is when the leader node confirms to the client a successful local write, but then fails before the write can be replicated to

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Ryan King
If you're interested in this idea, you should read up about Spinnaker: http://www.vldb.org/pvldb/vol4/p243-rao.pdf -ryan On Mon, Jul 11, 2011 at 2:48 PM, Yang tedd...@gmail.com wrote: I'm not proposing any changes to be done, but this looks like a very interesting topic for

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Yang
For example, the coordinator writes records 1, 2, 3, 4, 5 in sequence. If you have replicas A, B, and C, then currently A can have 1, 3; B can have 1, 3, 4; and C can have 2, 3, 4, 5. By prefix, I mean I want them to have only 1 through n, where n is some number between 1 and 5: for example, A having 1,2,3, B having 1,2,3,4, and C having 1,2,3,4,5.
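
The prefix property described in this message can be illustrated with a small check. This is only a sketch under assumptions: the function name `is_prefix` and the list-of-integers representation of a replica's log are hypothetical and are not taken from Cassandra, HBase, or the thread.

```python
def is_prefix(replica_log, coordinator_log):
    """Return True if replica_log is exactly the first len(replica_log)
    records of coordinator_log (no gaps, no reordering)."""
    return replica_log == coordinator_log[:len(replica_log)]


# Records written by the coordinator, in order (from the message above).
coordinator = [1, 2, 3, 4, 5]

# Replica states the message says are possible today: gaps are allowed.
current = {"A": [1, 3], "B": [1, 3, 4], "C": [2, 3, 4, 5]}

# Replica states the message asks for: each one holds records 1..n.
desired = {"A": [1, 2, 3], "B": [1, 2, 3, 4], "C": [1, 2, 3, 4, 5]}

for label, replicas in (("current", current), ("desired", desired)):
    for name, log in replicas.items():
        print(label, name, log, "prefix" if is_prefix(log, coordinator) else "not a prefix")
```

With a prefix-only log, a replica's state can be summarized by the single number n, which is what makes it simple to decide which replica is most up to date.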

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Yang
thanks , let me read it... On Tue, Jul 12, 2011 at 9:27 AM, Ryan King r...@twitter.com wrote: If you're interested in this idea, you should read up about Spinnaker: http://www.vldb.org/pvldb/vol4/p243-rao.pdf -ryan On Mon, Jul 11, 2011 at 2:48 PM, Yang tedd...@gmail.com wrote: I'm not

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread AJ
On 7/12/2011 10:48 AM, Yang wrote: for example, coord writes record 1,2 ,3 ,4,5 in sequence if u have replica A, B, C currently A can have 1 , 3 B can have 1,3,4, C can have 2345 by prefix, I mean I want them to have only 1---n where n is some number between 1 and 5, for example A having

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Yang
that is not an important issue, it's separate from the replication question I'm thinking about. for now I'll just think about the case where every node owns the same key range , or N=RF. Are you saying:  All replicas will receive the value whether or not they actually own the key range for the

Re: Strong Consistency with ONE read/writes

2011-07-11 Thread Yang
I'm not proposing any changes to be done, but this looks like a very interesting topic for thought/hack/learning, so the following are only thought exercises. HBase enforces a single write/read entry point, so you can achieve strong consistency by writing/reading only one node. but just

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread AJ
Yang, How would you deal with the problem when the 1st node responds success but then crashes before completely forwarding any replicas? Then, after switching to the next primary, a read would return stale data. Here's a quick-n-dirty way: Send the value to the primary replica and send

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread Will Oberman
Why not send the value itself instead of a placeholder? Now it takes 2x writes on a random node to do a single update (write placeholder, write update) and N*x writes from the client (write value, write placeholder to N-1). Where N is replication factor. Seems like extra network and IO

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread Yang
I'm no expert. So addressing the question to me probably give you real answers :) The single entry mode makes sure that all writes coming through the leader are received by replicas before ack to client. Probably won't be stale data. On Jul 3, 2011 11:20 AM, AJ a...@dude.podzone.net wrote: Yang,

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread AJ
On 7/3/2011 3:49 PM, Will Oberman wrote: Why not send the value itself instead of a placeholder? Now it takes 2x writes on a random node to do a single update (write placeholder, write update) and N*x writes from the client (write value, write placeholder to N-1). Where N is replication

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread AJ
On 7/3/2011 4:07 PM, Yang wrote: I'm no expert. So addressing the question to me probably give you real answers :) The single entry mode makes sure that all writes coming through the leader are received by replicas before ack to client. Probably wont be stale data That doesn't sound

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread William Oberman
Was just going off of: Send the value to the primary replica and send placeholder values to the other replicas. Sounded like you wanted to write the value to one, and write the placeholder to N-1 to me. But, C* will propagate the value to N-1 eventually anyways, 'cause that's just what it does

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread AJ
On 7/3/2011 6:32 PM, William Oberman wrote: Was just going off of: Send the value to the primary replica and send placeholder values to the other replicas. Sounded like you wanted to write the value to one, and write the placeholder to N-1 to me. Yes, that is what I was suggesting. The

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread William Oberman
I'm using cassandra as a tool, like a black box with a certain contract to the world. Without modifying the core, C* will send the updates to all replicas, so your plan would cause the extra write (for the placeholder). I wasn't assuming a modification to how C* fundamentally works. Sounds like

Re: Strong Consistency with ONE read/writes

2011-07-03 Thread AJ
We seem to be having a fundamental misunderstanding. Thanks for your comments. aj On 7/3/2011 8:28 PM, William Oberman wrote: I'm using cassandra as a tool, like a black box with a certain contract to the world. Without modifying the core, C* will send the updates to all replicas, so your

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread William Oberman
Ok, I see the "you happen to choose the 'right' node" idea, but it sounds like you want to solve C* problems in the client, and they already wrote that complicated code to make clients simple. You're talking about reimplementing key-node mappings, network topology (with failures), etc... Plus, if

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread Yang
there is a JIRA completed in 0.7.x that Prefers a certain node in snitch, so this does roughly what you want MOST of the time but the problem is that it does not GUARANTEE that the same node will always be read. I recently read into the HBase vs Cassandra comparison thread that started after

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread Jonathan Ellis
The way HBase uses ZK (for master election) is not even close to how Cassandra uses the failure detector. Using ZK for each operation would (a) not scale and (b) not work cross-DC for any reasonable latency requirements. On Sat, Jul 2, 2011 at 11:55 AM, Yang tedd...@gmail.com wrote: there

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread Yang
Jonathan: could you please elaborate more on specifically why they are not even close? --- I kind of see what you mean (please correct me if I misunderstood): Cassandra failure detector is consulted on every write; while HBase failure detector is only used when the tablet server joins or leaves.

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread Edward Capriolo
On Sat, Jul 2, 2011 at 3:57 PM, Yang tedd...@gmail.com wrote: Jonathan: could you please elaborate more on specifically why they are not even close? --- I kind of see what you mean (please correct me if I misunderstood): Cassandra failure detector is consulted on every write; while

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread AJ
Yang, you seem to understand all of the details, at least the details that have occurred to me, such as having a failure protocol rather than a perfect failure detector and new leader coordination. I finally did some more reading outside of Cassandra space and realized HBase has what I was

Re: Strong Consistency with ONE read/writes

2011-07-02 Thread AJ
On 7/2/2011 6:03 AM, William Oberman wrote: Ok, I see the you happen to choose the 'right' node idea, but it sounds like you want to solve C* problems in the client, and they already wrote that complicated code to make clients simple. You're talking about reimplementing key-node mappings,

Strong Consistency with ONE read/writes

2011-07-01 Thread AJ
Is this possible? All reads and writes for a given key will always go to the same node from a client. It seems the only thing needed is to allow the clients to compute which node is the closest replica for the given key using the same algorithm C* uses. When the first replica receives the

Re: Strong Consistency with ONE read/writes

2011-07-01 Thread Will Oberman
On Jul 1, 2011, at 9:53 PM, AJ a...@dude.podzone.net wrote: Is this possible? All reads and writes for a given key will always go to the same node from a client. I don't think that's true. Given a key K, the client will write to N nodes (N=replication factor). And

Re: Strong Consistency with ONE read/writes

2011-07-01 Thread AJ
I'm saying I will make my clients forward the C* requests to the first replica instead of forwarding to a random node. Will Oberman ober...@civicscience.com wrote: On Jul 1, 2011, at 9:53 PM, AJ