
Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01, Kevin Burton

We get lots of timeouts when we decommission a node. About 80% of them are
write timeouts and about 20% are read timeouts.

We’ve tried adjusting stream throughput (and compaction throughput, for that
matter), and that doesn’t resolve the issue.

We’ve increased write_request_timeout_in_ms … and read timeout as well.

Is there anything else I should be looking at?

I can’t seem to find the documentation that explains what the heck is
happening.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts

Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01, Kevin Burton

Looks like all of this is happening because we’re using CAS operations and
the driver is using the SERIAL consistency level.

SERIAL and LOCAL_SERIAL write failure scenarios

http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html?scroll=concept_ds_umf_5xx_zj__failure-scenarios

If one of three nodes is down, the Paxos commit fails under the following
conditions:

- CQL query-configured consistency level of ALL

- Driver-configured serial consistency level of SERIAL

- Replication factor of 3
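
To see why these three conditions together break the write, here is a toy model in Python. It is a simplified sketch, not Cassandra's code, and it assumes a single datacenter: the Paxos prepare/propose rounds of a CAS write must reach the serial consistency level (a majority of replicas), and the final commit round must reach the statement's regular consistency level. The helper names `quorum`, `acks_required`, and `cas_write_succeeds` are hypothetical.

```python
# Toy model of a Cassandra lightweight transaction (CAS) write.
# Assumption: simplified, single datacenter; this is not Cassandra's code.

def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def acks_required(consistency: str, replication_factor: int) -> int:
    """Acknowledgements one phase must collect at a given consistency level."""
    # Single datacenter assumed, so LOCAL_* levels equal their global forms.
    if consistency in ("SERIAL", "LOCAL_SERIAL", "QUORUM", "LOCAL_QUORUM"):
        return quorum(replication_factor)
    if consistency == "ALL":
        return replication_factor
    if consistency == "ONE":
        return 1
    raise ValueError(f"unsupported consistency level: {consistency}")


def cas_write_succeeds(replication_factor: int, live_replicas: int,
                       serial_cl: str, commit_cl: str) -> bool:
    # Paxos prepare/propose rounds run at the serial consistency level.
    paxos_ok = live_replicas >= acks_required(serial_cl, replication_factor)
    # The commit round runs at the query's regular consistency level.
    commit_ok = live_replicas >= acks_required(commit_cl, replication_factor)
    return paxos_ok and commit_ok


# The documented scenario: RF=3 with one of three nodes down.
for commit_cl in ("ALL", "QUORUM"):
    ok = cas_write_succeeds(3, 2, "SERIAL", commit_cl)
    print(f"RF=3, 2 live, SERIAL + {commit_cl}: {'ok' if ok else 'fails'}")
```

Under these assumptions, the SERIAL rounds still reach a majority (2 of 3) with one node down; it is the commit at ALL, which needs all 3 replicas, that cannot complete.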

I don’t understand why this would fail. It seems completely broken in this
situation.

We were getting write timeouts at a replication factor of 2, and a lot of
people on the list said of course: with 2 replicas and 1 node down there’s
no quorum, and Paxos needs a quorum. Not sure why I missed that :-P
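
The arithmetic behind that answer fits in a short Python sketch. It assumes the usual majority rule, quorum = floor(RF / 2) + 1, in a single datacenter; `quorum` and `survives_one_down` are illustrative names, not driver or Cassandra APIs.

```python
# Majority (quorum) arithmetic, assuming a single datacenter.

def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def survives_one_down(replication_factor: int) -> bool:
    """True if a quorum is still reachable with one replica unavailable."""
    return replication_factor - 1 >= quorum(replication_factor)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"quorum with one node down: {survives_one_down(rf)}")
```

With RF=2 a quorum is both replicas, so losing either one blocks Paxos; with RF=3 any two of the three replicas still form a quorum.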

So we went with 3 replicas and a quorum.

But this is new, and I didn’t see it documented. We set the driver to
QUORUM, but I guess the driver sees that this is a CAS operation and forces
it back to SERIAL? Doesn’t this mean that every decommission results in CAS
failures?

This is Cassandra 2.0.9 btw.

Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01, Robert Coli

On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote:

 Looks like all of this is happening because we’re using CAS operations and
 the driver is going to SERIAL consistency level.
 ...
 This is Cassandra 2.0.9 btw.


https://issues.apache.org/jira/browse/CASSANDRA-8640

=Rob
(credit to iamaleksey on IRC for remembering the JIRA #)

Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01, Kevin Burton

WOW.. nice. you rock!!

