Re: All writes fail with ONE consistency level when adding second node to cluster?

Kevin Burton Tue, 22 Jul 2014 21:07:29 -0700

Thanks for the feedback…

In hindsight.. I think what happened was that the new node started up… and
the driver wanted to write records to it… but the ports weren't up.


So I wonder if this is a bug in the DataStax driver.
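
For reference, the "2 replica were required" count is what graham's explanation quoted further down would predict: the consistency level's own count plus the node that is still joining. A minimal sketch of that arithmetic follows. It is a toy model written for this thread, not Cassandra's or the driver's code; the names `required_acks` and `write_outcome` and the simplified table are assumptions.

```python
# Toy model of the ack counting described in this thread.
# Assumption: each node still bootstrapping adds one required acknowledgement.
CONSISTENCY_ACKS = {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3}  # simplified table


def required_acks(consistency: str, pending_nodes: int) -> int:
    """Acks to wait for: the consistency level's count plus pending nodes."""
    return CONSISTENCY_ACKS[consistency] + pending_nodes


def write_outcome(consistency: str, pending_nodes: int, acks_received: int) -> str:
    """Return "ok", or a message shaped like the one quoted in the thread."""
    required = required_acks(consistency, pending_nodes)
    if acks_received >= required:
        return "ok"
    return (
        f"timeout during write query at consistency {consistency} "
        f"({required} replica were required but only {acks_received} "
        f"acknowledged the write)"
    )
```

With no joining node, `write_outcome("ONE", 0, 1)` succeeds; with one joining node that never acknowledges, `write_outcome("ONE", 1, 1)` reports 2 required and 1 acknowledged, which matches the shape of the error I saw.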

On bootstrap, and when joining, does Cassandra always keep the ports
offline and then only open them up when the node has joined?
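
One way to see which ports a joining node is accepting connections on is a plain TCP connect test. This is a minimal sketch: the port numbers are Cassandra's defaults (7000 storage, 9042 native transport, 9160 Thrift RPC) and may differ in your cassandra.yaml, and the helper names `port_is_open` and `open_ports` are made up for illustration.

```python
import socket

# Assumption: the cluster uses Cassandra's default ports.
DEFAULT_PORTS = {"storage": 7000, "native_transport": 9042, "thrift_rpc": 9160}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def open_ports(host: str, ports=DEFAULT_PORTS, timeout: float = 1.0) -> dict:
    """Map each named port to whether it currently accepts connections."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

Calling `open_ports("10.0.0.2")` (a placeholder address) while the node is joining, and again once it has joined, would show whether the client-facing ports only come up after the join completes.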


On Tue, Jul 22, 2014 at 8:55 PM, graham sanderson <gra...@vast.com> wrote:

> I assumed you must have now switched to ANY, which you probably didn’t want
> to do and which likely won’t help. Very few people use ANY, which may explain
> the lack of Google hits; also, this particular “Cassandra timeout during
> write query at consistency” error message comes from the DataStax CQL Java
> driver, not C* itself.
>
> In any case… my original response was just to explain to you that your
> understanding of what ONE means in general was correct, and that this
> incorrect-looking error message was a weird case during adding a node.
>
> I have no idea what is going on with your bootstrapping node; others may be
> able to help. In the meantime, I’d look for errors in the server log
> and google those, and/or google for instructions on how to add nodes to a
> Cassandra cluster on whatever version you are running.
>
> On Jul 22, 2014, at 10:47 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> and there are literally zero google hits on the query: "Cassandra timeout
> during write query at consistency ANY (2 replica were required but only 1
> acknowledged the write)"
>
> .. so I imagine I'm the first to find this bug!  Aren't I lucky!
>
>
> On Tue, Jul 22, 2014 at 8:46 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> Yeah.. that's fascinating … so now I get something that's even worse:
>>
>> "Cassandra timeout during write query at consistency ANY (2 replica were
>> required but only 1 acknowledged the write)"
>>
>> … the issue is that the new cassandra node has all its ports closed.
>>
>> Only the storage port is open.
>>
>> So obviously writes are going to fail to it.
>>
>> … is this by design?  Perhaps it's not going to open the ports until the
>> node joins the ring?  It's currently "joining" …
>>
>> so… basically, my entire cluster is offline during this join?
>>
>> I assume this is either a bug or some weird state based on growing from
>> 1 to 2 nodes?
>>
>> frustrating :-(
>>
>>
>> On Tue, Jul 22, 2014 at 8:13 PM, graham sanderson <gra...@vast.com>
>> wrote:
>>
>>> Incorrect, ONE does not refer to the number of “other” nodes, it just
>>> refers to the number of nodes. So ONE under normal circumstances would only
>>> require one node to acknowledge the write.
>>>
>>> The confusing error message you are getting is related to
>>> https://issues.apache.org/jira/browse/CASSANDRA-833… Kevin you are
>>> correct in that normally that error message would make no sense.
>>>
>>> I don’t have much experience adding/removing nodes, but I think what is
>>> happening is that your new node is in the middle of taking over ownership of
>>> a token range. While that happens, C* is trying to write to both the old
>>> owner (your original node) AND the new owner (the new node), hence the 2
>>> rather than 1 in the error message, so that once the bootstrapping of the
>>> new node is complete, it is immediately safe to delete the no-longer-owned
>>> data from the old node. For whatever reason the write to the new node is
>>> timing out, causing the exception, and the error message is exposing the “2”,
>>> which happens to be how many acknowledgements C* thinks it is waiting for at
>>> the time (i.e. how many it should be waiting for based on the consistency
>>> level (1) plus this extra node).
>>>
>>>
>>> On Jul 22, 2014, at 9:46 PM, Andrew <redmu...@gmail.com> wrote:
>>>
>>> ONE means write to one replica (in addition to the original).  If you
>>> want to write to any of them, use ANY.  Is that the right understanding?
>>>
>>> http://www.datastax.com/docs/1.0/dml/data_consistency
>>>
>>> Andrew
>>>
>>> On July 22, 2014 at 7:43:43 PM, Kevin Burton (bur...@spinn3r.com) wrote:
>>>
>>> I'm super confused by this.. and disturbed that this was my failure
>>> scenario :-(
>>>
>>> I had one cassandra node for the alpha of my app… and now we're moving
>>> into beta… which means three replicas.
>>>
>>> So I added the second node… but my app immediately broke with:
>>>
>>> "Cassandra timeout during write query at consistency ONE (2 replica
>>> were required but only 1 acknowledged the write)"
>>>
>>> … but that makes no sense… if I'm at ONE and I have one acknowledged
>>> write, why does it matter that the second one hasn't ack'd yet…
>>>
>>> ?
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com <http://spinn3r.com/>
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> <https://plus.google.com/102718274791889610666/posts>
>>> <http://spinn3r.com/>
>>>
>>>
>>>
>>
>>
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
