Yang, I'm not sure I understand what you mean by a prefix of the HLog.
Also, can you explain what failure scenario you are talking about? The
major failure that I see is when the leader node confirms to the client
a successful local write, but then fails before the write can be
replicated to the other replicas.
If you're interested in this idea, you should read up about Spinnaker:
http://www.vldb.org/pvldb/vol4/p243-rao.pdf
-ryan
On Mon, Jul 11, 2011 at 2:48 PM, Yang tedd...@gmail.com wrote:
I'm not proposing any changes to be done, but this looks like a very
interesting topic for thought/hack/learning.
for example,
the coordinator writes records 1, 2, 3, 4, 5 in sequence.
If you have replicas A, B, C,
currently A can have 1, 3,
B can have 1, 3, 4,
and C can have 2, 3, 4, 5.
By prefix, I mean I want them to have only 1..n, where n is some
number between 1 and 5,
for example A having 1, 2, 3,
B having 1, 2, 3, 4,
C having 1, 2, 3, 4, 5.
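The property Yang is asking for can be stated mechanically. Here is a small sketch (illustrative Python, not Cassandra code) that checks whether a replica's log is a prefix of the coordinator's write sequence, using the A/B/C example above:

```python
# Sketch of the "prefix" property: a replica is prefix-consistent if it
# holds exactly records 1..n of the coordinator's sequence, with no gaps.

def is_prefix(replica_log, coord_log):
    """True if replica_log equals the first len(replica_log) records
    of coord_log, i.e. it is a gap-free prefix."""
    return replica_log == coord_log[:len(replica_log)]

coord = [1, 2, 3, 4, 5]

# Current behavior: replicas may hold arbitrary subsets with gaps.
assert not is_prefix([1, 3], coord)         # A: gap after record 1
assert not is_prefix([2, 3, 4, 5], coord)   # C: missing record 1

# Desired behavior: every replica holds a clean prefix.
assert is_prefix([1, 2, 3], coord)          # A
assert is_prefix([1, 2, 3, 4], coord)       # B
assert is_prefix([1, 2, 3, 4, 5], coord)    # C
```

With this property, the replica with the longest log is always a superset of the others, which is what makes leader failover (as in the Spinnaker paper Ryan links) straightforward.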
thanks , let me read it...
On Tue, Jul 12, 2011 at 9:27 AM, Ryan King r...@twitter.com wrote:
On 7/12/2011 10:48 AM, Yang wrote:
That is not an important issue; it's separate from the replication
question I'm thinking about.
For now I'll just think about the case where every node owns the same
key range, i.e. N = RF.
Are you saying: all replicas will receive the value whether or not they
actually own the key range for the key?
I'm not proposing any changes to be done, but this looks like a very
interesting topic for thought/hack/learning, so the following are only
for thought exercises
HBase enforces a single write/read entry point, so you can achieve
strong consistency by writing/reading only one node. But just
Yang,
How would you deal with the problem when the 1st node responds success
but then crashes before completely forwarding any replicas? Then, after
switching to the next primary, a read would return stale data.
Here's a quick-n-dirty way: Send the value to the primary replica and
send placeholder values to the other replicas.
Why not send the value itself instead of a placeholder? Now it takes
2x writes on a random node to do a single update (write placeholder,
write update) and N writes from the client (write value, write
placeholder to N-1), where N is the replication factor. Seems like extra
network and IO.
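Will's cost accounting can be made concrete with some back-of-envelope arithmetic (a sketch; RF = 3 is an assumed example, not anything from the thread):

```python
# Write-count comparison: normal C* write vs. the placeholder scheme,
# for an assumed replication factor N = 3.
N = 3

# Normal C*: the client sends one request; each replica stores the value once.
normal_client_requests = 1
normal_writes_per_replica = 1

# Placeholder scheme: the client sends the value to the primary plus a
# placeholder to each of the other N-1 replicas, and each non-primary
# replica later overwrites its placeholder with the real value.
placeholder_client_requests = 1 + (N - 1)   # value + (N-1) placeholders = N
placeholder_writes_per_nonprimary = 2       # placeholder write, then value write

assert placeholder_client_requests == N
assert placeholder_writes_per_nonprimary == 2 * normal_writes_per_replica
```

So the client does N sends instead of one, and every non-primary replica does two local writes instead of one, which is exactly the "extra network and IO" Will points out.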
I'm no expert, so addressing the question to me probably won't give you real
answers :)
The single entry mode makes sure that all writes coming through the leader
are received by replicas before the ack to the client. Probably won't be stale data.
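The single-entry-point write path Yang describes can be modeled in a few lines. This is a toy sketch (not HBase or Cassandra internals, and it deliberately ignores the leader-crash case AJ raises): the leader forwards each write to every replica and acks the client only after all replicas have applied it.

```python
# Toy model: leader acks the client only after every replica applies the
# write, so a read served by any replica after the ack cannot be stale.

class Replica:
    def __init__(self):
        self.log = []

    def apply(self, record):
        self.log.append(record)
        return True  # ack back to the leader

class Leader:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, record):
        # Forward to all replicas and collect their acks.
        acks = [r.apply(record) for r in self.replicas]
        # Only now is the client acked.
        return all(acks)

replicas = [Replica(), Replica(), Replica()]
leader = Leader(replicas)
assert leader.write("k=v")
assert all(r.log == ["k=v"] for r in replicas)
```

The failure scenario in this thread is exactly what this toy model omits: the leader crashing after some, but not all, of the `apply` calls.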
On Jul 3, 2011 11:20 AM, AJ a...@dude.podzone.net wrote:
Yang,
On 7/3/2011 3:49 PM, Will Oberman wrote:
On 7/3/2011 4:07 PM, Yang wrote:
That doesn't sound
Was just going off of: "Send the value to the primary replica and send
placeholder values to the other replicas." Sounded like you wanted to write
the value to one, and write the placeholder to N-1, to me. But C* will
propagate the value to N-1 eventually anyway, 'cause that's just what it
does.
On 7/3/2011 6:32 PM, William Oberman wrote:
Yes, that is what I was suggesting. The
I'm using Cassandra as a tool, like a black box with a certain contract to
the world. Without modifying the core, C* will send the updates to all
replicas, so your plan would cause the extra write (for the placeholder). I
wasn't assuming a modification to how C* fundamentally works.
Sounds like
We seem to be having a fundamental misunderstanding. Thanks for your
comments. aj
On 7/3/2011 8:28 PM, William Oberman wrote:
Ok, I see the "you happen to choose the 'right' node" idea, but it sounds
like you want to solve C* problems in the client, and they already wrote
that complicated code to make clients simple. You're talking about
reimplementing key-node mappings, network topology (with failures), etc.
Plus, if
There is a JIRA completed in 0.7.x that prefers a certain node in the snitch,
so this does roughly what you want MOST of the time,
but the problem is that it does not GUARANTEE that the same node will always
be read. I recently read in the HBase vs Cassandra comparison thread that
started after
The way HBase uses ZK (for master election) is not even close to how
Cassandra uses the failure detector.
Using ZK for each operation would (a) not scale and (b) not work
cross-DC for any reasonable latency requirements.
On Sat, Jul 2, 2011 at 11:55 AM, Yang tedd...@gmail.com wrote:
Jonathan:
could you please elaborate more on specifically why they are not even
close?
--- I kind of see what you mean (please correct me if I misunderstood):
Cassandra failure detector
is consulted on every write; while HBase failure detector is only used when
the tablet server joins or leaves.
On Sat, Jul 2, 2011 at 3:57 PM, Yang tedd...@gmail.com wrote:
Yang, you seem to understand all of the details, at least the details
that have occurred to me, such as having a failure protocol rather than
a perfect failure detector and new leader coordination.
I finally did some more reading outside of Cassandra space and realized
HBase has what I was
On 7/2/2011 6:03 AM, William Oberman wrote:
Is this possible?
All reads and writes for a given key will always go to the same node
from a client. It seems the only thing needed is to allow the clients
to compute which node is the closest replica for the given key using the
same algorithm C* uses. When the first replica receives the
Sent from my iPhone
On Jul 1, 2011, at 9:53 PM, AJ a...@dude.podzone.net wrote:
I don't think that's true. Given a key K, the client will write to N
nodes (N=replication factor). And
I'm saying I will make my clients forward the C* requests to the first replica
instead of forwarding to a random node.
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
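What AJ's clients would need is roughly this: map a key to its first replica with a consistent-hash token ring, mimicking (not reusing) C*'s placement logic. A sketch under stated assumptions: the node addresses are made up, and MD5 stands in for the partitioner's hash.

```python
# Sketch: every client computes the same "first replica" for a key by
# hashing it onto a token ring, similar in spirit to C*'s partitioner.
import hashlib
from bisect import bisect_right

def token(key: str) -> int:
    # Illustrative hash; stands in for the partitioner's token function.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each node owns the arc of the ring ending at its token.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def first_replica(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping around.
        i = bisect_right(self.tokens, token(key)) % len(self.tokens)
        return self.entries[i][1]

ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# Any two clients with the same node list pick the same node for a key.
assert ring.first_replica("user:42") == ring.first_replica("user:42")
```

This is the easy half; as Will and Jonathan point out in the thread, the hard part is keeping that node list correct under topology changes and failures, which is the client complexity C* already hides.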