Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Jack Krupansky
Besides the obviously confusing error message, this particular case could 
simply be that the hash value of the primary key belonged to the other node 
that wasn’t up, so even though one node was up, it didn’t own that particular 
hash value or token, so CL=ONE could not succeed.
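
A minimal sketch of the ownership argument above, assuming RF=1 and a toy ring with made-up tokens (the node names, token values, and the helper names `owner` and `can_write_at_one` are all hypothetical, not Cassandra APIs). It only illustrates why a live node cannot satisfy CL=ONE for a key whose token it does not own:

```python
import bisect

def owner(ring, token):
    """Return the node owning `token`.

    `ring` is a list of (token, node) pairs sorted by token. As in a
    consistent-hash ring, a node owns the range ending at its own token,
    and tokens past the last node wrap around to the first.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token) % len(ring)
    return ring[i][1]

def can_write_at_one(ring, live_nodes, token):
    """Assuming RF=1, a CL=ONE write needs the single owner to be alive."""
    return owner(ring, token) in live_nodes

# Hypothetical two-node ring: A owns (75, 25], B owns (25, 75].
ring = [(25, "A"), (75, "B")]
```

With only `A` alive, `can_write_at_one(ring, {"A"}, 50)` is false because token 50 belongs to `B`, while a key hashing to token 10 would still be writable.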

What was RF set to for this two node cluster?

-- Jack Krupansky

From: Andrew 
Sent: Wednesday, July 23, 2014 1:02 AM
To: graham sanderson ; user@cassandra.apache.org 
Cc: Kevin Burton 
Subject: Re: All writes fail with ONE consistency level when adding second node 
to cluster?

I looked into this; ONE means it must be written to one replica—i.e., a node 
the data is supposed to be written to.  ANY means a hinted handoff will 
“count”.  So as long as it writes to any node on the cluster—even one that it’s 
not supposed to be on—it will be a success.  Good to know.
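
The distinction can be written down as a small predicate. This is only a sketch of the rule as described in this message, not driver or server code; the function name and its parameters are hypothetical:

```python
def write_succeeds(consistency, replica_acks, hint_stored):
    """Model of the ONE vs ANY rule described above.

    replica_acks: number of real replicas (nodes that own the data)
                  that acknowledged the write.
    hint_stored:  True if some node stored a hint for later replay.
    """
    if consistency == "ONE":
        # A hint does not count; a real replica must acknowledge.
        return replica_acks >= 1
    if consistency == "ANY":
        # Either a real replica ack or a stored hint is enough.
        return replica_acks >= 1 or hint_stored
    raise ValueError("only ONE and ANY are modelled here")
```

So a write that reaches no replica but leaves a hint on the coordinator succeeds at ANY and fails at ONE.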

Andrew


On July 22, 2014 at 8:13:57 PM, graham sanderson (gra...@vast.com) wrote:

  Incorrect, ONE does not refer to the number of “other” nodes, it just refers 
to the number of nodes. So ONE under normal circumstances would only require 
one node to acknowledge the write. 

  The confusing error message you are getting is related to 
https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are correct in 
that normally that error message would make no sense.

  I don’t have much experience adding/removing nodes, but I think what is 
happening is that your new node is in the middle of taking over ownership of a 
token range. While that happens, C* is trying to write to both the old owner 
(your original node) AND (hence the 2 not 1 in the error message) the new 
owner (the new node), so that once the bootstrapping of the new node is 
complete, it is immediately safe to delete the (no longer owned) data from the 
old node. For whatever reason the write to the new node is timing out, causing 
the exception, and the error message is exposing the “2”, which happens to be 
how many C* thinks it is waiting for at the time (i.e. how many it should be 
waiting for based on the consistency level (1) plus this extra node).
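
The arithmetic in that last sentence can be sketched as follows. This models the explanation given here (acks required = what the consistency level asks for, plus one per node that is still bootstrapping into the range); the table and function names are hypothetical and the real server logic may differ in detail:

```python
# Acks the consistency level itself asks for (only a few levels shown).
BLOCK_FOR = {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3}

def required_acks(consistency, pending_nodes):
    """Consistency-level requirement plus one ack per pending (joining) node."""
    return BLOCK_FOR[consistency] + pending_nodes

def timeout_message(consistency, pending_nodes, received):
    """Rebuild the driver-style message quoted in this thread."""
    return (
        "Cassandra timeout during write query at consistency %s "
        "(%d replica were required but only %d acknowledged the write)"
        % (consistency, required_acks(consistency, pending_nodes), received)
    )
```

With one joining node and one ack from the original node, `timeout_message("ONE", 1, 1)` reproduces the message from the original report.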


  On Jul 22, 2014, at 9:46 PM, Andrew redmu...@gmail.com wrote:


ONE means write to one replica (in addition to the original).  If you want 
to write to any of them, use ANY.  Is that the right understanding?

http://www.datastax.com/docs/1.0/dml/data_consistency

Andrew


On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:

  I'm super confused by this.. and disturbed that this was my failure 
scenario :-( 


  I had one cassandra node for the alpha of my app… and now we're moving 
into beta… which means three replicas.


  So I added the second node… but my app immediately broke with:


  Cassandra timeout during write query at consistency ONE (2 replica were 
required but only 1 acknowledged the write)


  … but that makes no sense… if I'm at ONE and I have one acknowledged 
write, why does it matter that the second one hasn't ack'd yet…


  ?


  --


  Founder/CEO Spinn3r.com

  Location: San Francisco, CA

  blog: http://burtonator.wordpress.com
  … or check out my Google+ profile 



--


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Robert Coli
On Tue, Jul 22, 2014 at 7:46 PM, Andrew redmu...@gmail.com wrote:

 ONE means write to one replica (in addition to the original).  If you want
 to write to any of them, use ANY.  Is that the right understanding?


This has come up a few times, so let me be unambiguous about when to use
CL.ANY :

NEVER EVER USE CL.ANY. IT ALMOST CERTAINLY SHOULD NOT EVEN EXIST.

IF YOU THINK YOU NEED TO USE IT, YOU ARE ALMOST CERTAINLY WRONG.

;D

=Rob


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Kevin Burton
Interesting.. it was unclear what it does… ONE sounds right to me so I was
curious what was up with ANY.  We just set it to ANY so that we could track
down what was causing this bug.




Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread graham sanderson
Hey now; it is GREAT for a 100% write only use case ;-)



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Robert Coli
On Wed, Jul 23, 2014 at 12:01 PM, graham sanderson gra...@vast.com wrote:

 Hey now; it is GREAT for a 100% write only use case ;-)


A well WORN [1] path in databases, for sure.

=Rob
[1] Write Once Read Never


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread DuyHai Doan
Why is that? In the worst case, CL.ANY will write hints for replicas that are
down. It would be extraordinarily unlucky to have all replicas down at the
same time.





Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Jack Krupansky
Granted, for “normal” apps it is unlikely to be appropriate but...

From an old post by Jonathan:
---
Extreme write availability

For applications that want Cassandra to accept writes even when all the normal 
replicas are down (so even ConsistencyLevel.ONE cannot be satisfied), Cassandra 
provides ConsistencyLevel.ANY. ConsistencyLevel.ANY guarantees that the write 
is durable and will be readable once an appropriate replica target becomes 
available and receives the hint replay.
---
See:
http://www.datastax.com/dev/blog/understanding-hinted-handoff

I can think of a couple of use cases: sensor data where the devices are 
streaming frequently, so losing a reading is not a big deal because another 
reading is coming soon anyway, and a Twitter firehose where you are after a 
robust sample rather than absolute consistency. Minimizing network latency may 
be a bigger deal than whether immediate queries can see the data.

And as the description notes, hinted handoff will eventually propagate the data 
(unless it times out and drops the hint.)

-- Jack Krupansky



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread graham sanderson
I was being a little tongue in cheek!



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-23 Thread Robert Coli
On Wed, Jul 23, 2014 at 1:18 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Why is that? In the worst case, CL.ANY will write hints for replicas that are
 down. It would be extraordinarily unlucky to have all replicas down at the
 same time.


Hints are not writes for the purposes of consistency or durability, so your
write hasn't actually succeeded. Most people don't have applications which
need a database to potentially persist a write.

In addition, the implementation details of Hinted Handoff can make ANY a
meaningful contributor to cascading failure when nodes are actually
hard down, because instead of failing with an unavailable exception (which
gives your app a chance to back off), you write hints. There is some
throttling in terms of how many hints can be in flight at once, but ones
over the threshold are dropped on the floor. I've seen nodes with more
hints data than actual data, and which were completely unable to ever
deliver and purge these hints, though they uselessly compacted them for
weeks on end. In most configs, you will end up discarding some subset of
these hints in the course of your cascading failure, but you will probably
not know which ones. You will also discard 100% of hints after three hours
in the default config. You might be happier to just get an exception at the
start of the incident, back off your application access a bit, and fix the
small subset of affected nodes?
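
As a rough illustration of "get an exception and back off", here is a client-side retry sketch. The `Unavailable` class is a hypothetical stand-in for whatever unavailable error your driver raises, and `write` is any callable that performs the write; neither is a real driver API:

```python
import time

class Unavailable(Exception):
    """Hypothetical stand-in for a driver's 'not enough replicas' error."""

def write_with_backoff(write, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call write(); on Unavailable, wait and retry with a doubling delay.

    Re-raises the last Unavailable once `attempts` calls have failed, so
    the application sees the failure instead of silently piling up hints.
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            return write()
        except Unavailable:
            if attempt == attempts - 1:
                raise
            sleep(delay)
            delay *= 2
```

The `sleep` parameter is injectable only so the delays can be observed in tests; in an application you would leave the default.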

In the future when hints are not handled via Column Families, ANY probably
gets a lot less risky in terms of overload-with-undelivered-hints, but
probably still doesn't actually provide what I consider worthwhile benefit.
It is of course possible that I have just never had or heard of a case for
which it was appropriate or necessary.

tl;dr - CL.ANY creates more risk of cases where you will write a bunch of
hints, and cases where you write a bunch of hints are almost never the
solution to any actual problem, because hints are not writes. If you really
really need extreme availability and can't do it via increasing RF, maybe
you might want to consider using CL.ANY. But probably not.

=Rob


Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Andrew
ONE means write to one replica (in addition to the original).  If you want to 
write to any of them, use ANY.  Is that the right understanding?

http://www.datastax.com/docs/1.0/dml/data_consistency

Andrew



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
WEIRD that it was working before… with one node.  Granted that this is a
rare config (one cassandra node) but it shouldn't work then.

If you attempt to write ONE to a single cassandra node, there is no (in
addition to) additional node to write to…

So this should have failed.

Bug?

… and I know why this is failing… my cassandra node is joining the
cluster now, but none of the ports are open.  So all writes will fail… I
have NO idea why the ports aren't open yet .. but it's not a firewall issue.





Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
Yeah.. that's fascinating … so now I get something that's even worse:

Cassandra timeout during write query at consistency ANY (2 replica were
required but only 1 acknowledged the write)

… the issue is that the new cassandra node has all its ports closed.

Only the storage port is open.

So obviously writes are going to fail to it.

… is this by design?  Perhaps it's not going to open the ports until the
node joins the ring?  It's currently joining …

so… basically, my entire cluster is offline during this join?

I assume this is either a bug or some weird state based on growing from 1 to 2
nodes?

frustrating :-(




Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
and there are literally zero google hits on the query: Cassandra timeout
during write query at consistency ANY (2 replica were required but only 1
acknowledged the write)

.. so I imagine I'm the first to find this bug!  Aren't I lucky!




Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread graham sanderson
I assumed you must have now switched to ANY, which you probably didn’t want to 
do, and which likely won’t help (very few people use ANY, which may explain the 
lack of google hits). Also, this particular “Cassandra timeout during write query 
at consistency” error message comes from the datastax CQL java driver, not C* 
itself.

In any case… my original response was just to explain to you that your 
understanding of what ONE means in general was correct, and this incorrect 
looking error message was a weird case during adding a node.

I have no idea what is going on with your bootstrapping node; others may be able 
to help, but in the meantime I’d look for errors in the server log and google 
those and/or google for instructions on how to add nodes to a cassandra cluster 
on whatever version you are running.



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Kevin Burton
Thanks for the feedback…

In hindsight.. I think what happened was that the new node started up… and
the driver wanted to write records to it… but the ports weren't up.

so I wonder if this is a bug in the datastax driver.

On bootstrap, and when joining, does cassandra always keep the ports
offline and then only open them up when the node has joined?
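
One way to see which ports a joining node is actually listening on is a plain TCP connect test. The sketch below assumes the stock defaults (7000 for the inter-node storage port, 9042 for the CQL native transport, 9160 for Thrift RPC); check cassandra.yaml if yours differ. The helper names are made up for this example:

```python
import socket

# Default ports, assuming an unmodified cassandra.yaml.
DEFAULT_PORTS = {
    "storage (inter-node)": 7000,
    "native transport (CQL)": 9042,
    "thrift rpc": 9160,
}

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not open".
        return False

def check_node(host, ports=DEFAULT_PORTS):
    """Map each named port to whether it currently accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Running `check_node("10.0.0.2")` (a placeholder address) against the joining node would show whether only the storage port is reachable, as described above.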



Re: All writes fail with ONE consistency level when adding second node to cluster?

2014-07-22 Thread Andrew
I looked into this; ONE means it must be written to one replica—i.e., a node 
the data is supposed to be written to.  ANY means a hinted handoff will 
“count”.  So as long as it writes to any node on the cluster—even one that it’s 
not supposed to be on—it will be a success.  Good to know.

Andrew
