Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
We get lots of write timeouts when we decommission a node.  About 80% of
them are write timeouts and roughly 20% are read timeouts.

We’ve tried adjusting stream throughput (and compaction throughput, for
that matter), but that doesn’t resolve the issue.

We’ve increased write_request_timeout_in_ms … and the read timeout as well.

Is there anything else I should be looking at?

I can’t seem to find the documentation that explains what the heck is
happening.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
Looks like all of this is happening because we’re using CAS operations and
the driver is going to SERIAL consistency level.

SERIAL and LOCAL_SERIAL write failure scenarios

 http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html?scroll=concept_ds_umf_5xx_zj__failure-scenarios

 If one of three nodes is down, the Paxos commit fails under the following
 conditions:

- CQL query-configured consistency level of ALL
- Driver-configured serial consistency level of SERIAL
- Replication factor of 3

I don’t understand why this would fail … it seems completely broken in this
situation.

We were having write timeouts at a replication factor of 2, and a lot of
people from the list said "of course", because 2 nodes with 1 node down
means there’s no quorum, and Paxos needs a quorum.  Not sure why I missed
that :-P
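The arithmetic the list pointed out can be sketched quickly (a hedged illustration of the majority rule, not anything Cassandra-specific):

```python
def paxos_quorum(replication_factor: int) -> int:
    """Paxos, like QUORUM, needs a strict majority of the replicas."""
    return replication_factor // 2 + 1

def cas_can_succeed(replication_factor: int, nodes_down: int) -> bool:
    """A CAS round can only succeed if a majority of replicas is reachable."""
    return replication_factor - nodes_down >= paxos_quorum(replication_factor)

# RF=2 with one replica down: quorum is 2, only 1 replica up -> CAS fails.
print(cas_can_succeed(2, 1))  # False
# RF=3 with one replica down: quorum is 2, 2 replicas up -> CAS can succeed.
print(cas_can_succeed(3, 1))  # True
```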

So we went with 3 replicas and a quorum, but this behavior is new to me and
I didn’t see it documented.  We set the driver to QUORUM, but I guess the
driver sees that this is a CAS operation and forces it back to SERIAL?
Doesn’t this mean that all decommissions result in CAS failures?
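For what it’s worth, the serial (Paxos) phase of a CAS has its own consistency setting, separate from the regular write consistency — so QUORUM on the statement only governs the commit phase. A sketch with the DataStax Python driver (the keyspace/table names are made up; this is a configuration illustration, not a claim about what fixes the timeouts):

```python
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# The commit phase uses consistency_level; the Paxos phase uses
# serial_consistency_level.  SERIAL needs a cluster-wide majority,
# LOCAL_SERIAL only a majority within the local datacenter.
stmt = SimpleStatement(
    "INSERT INTO ks.t (id, v) VALUES (%s, %s) IF NOT EXISTS",  # hypothetical table
    consistency_level=ConsistencyLevel.QUORUM,
    serial_consistency_level=ConsistencyLevel.SERIAL,
)
# session.execute(stmt, (some_id, some_value))
```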

This is Cassandra 2.0.9 btw.


On Wed, Jul 1, 2015 at 2:22 PM, Kevin Burton bur...@spinn3r.com wrote:

 ...


Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Robert Coli
On Wed, Jul 1, 2015 at 2:58 PM, Kevin Burton bur...@spinn3r.com wrote:

 Looks like all of this is happening because we’re using CAS operations and
 the driver is going to SERIAL consistency level.
 ...
 This is Cassandra 2.0.9 btw.


 https://issues.apache.org/jira/browse/CASSANDRA-8640

=Rob
(credit to iamaleksey on IRC for remembering the JIRA #)


Re: Lots of write timeouts and missing data during decommission/bootstrap

2015-07-01 Thread Kevin Burton
WOW.. nice. you rock!!

On Wed, Jul 1, 2015 at 3:18 PM, Robert Coli rc...@eventbrite.com wrote:

 ...