[jira] [Created] (CASSANDRA-6156) Poor resilience and recovery for bootstrapping node - unable to fetch range

2013-10-07 Thread Alyssa Kwan (JIRA)
Alyssa Kwan created CASSANDRA-6156:
--

 Summary: Poor resilience and recovery for bootstrapping node - unable to fetch range
 Key: CASSANDRA-6156
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6156
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Alyssa Kwan
 Fix For: 1.2.8


We have an 8-node cluster on 1.2.8 using vnodes.  One of our nodes failed, and 
we are having a lot of trouble bootstrapping it back.  Each attempt eventually 
fails with a RuntimeException: "Unable to fetch range".  As far as we can tell, 
long GC pauses on the sender side cause heartbeats to be dropped or delayed, 
which leads the gossip failure detector to convict the sender and mark it dead.  
We've done significant GC tuning to shorten the pauses and raised 
phi_convict_threshold to its max.  That merely lets the bootstrap run longer 
before it fails.
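
To illustrate the failure mode described above, here is a rough sketch of a 
phi-accrual style detector.  It is a simplified model that assumes 
exponentially distributed heartbeat intervals; the class and function names 
are hypothetical, and this is not Cassandra's actual code.

```python
import math
from collections import deque


class PhiDetector:
    """Simplified phi-accrual detector (hypothetical, not Cassandra's code)."""

    def __init__(self, window=1000):
        # Sliding window of observed gaps between heartbeats, in seconds.
        self.intervals = deque(maxlen=window)
        self.last = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now`."""
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        """Suspicion level: -log10 of the chance the peer is merely late."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        # Exponential model: P(silence > t) = exp(-t / mean).
        return (now - self.last) / mean / math.log(10)


def is_convicted(detector, now, threshold):
    """True once the suspicion level exceeds the conviction threshold."""
    return detector.phi(now) > threshold


detector = PhiDetector()
for t in range(11):          # steady 1-second heartbeats, then silence
    detector.heartbeat(t)

# A higher threshold tolerates a longer pause, but a long enough pause
# still ends in conviction.
for threshold in (8, 16):
    pause = next(p for p in range(1, 1000)
                 if is_convicted(detector, 10 + p, threshold))
    print(f"threshold={threshold}: convicted after a {pause}s pause")
```

In this model, doubling the threshold only doubles the pause that is 
tolerated, which matches what we see: the bootstrap survives longer but is 
still killed by a sufficiently long pause.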

The inability to reliably add nodes significantly affects our ability to scale.

We're not the only ones:  
http://stackoverflow.com/questions/19199349/cassandra-bootstrap-fails-with-unable-to-fetch-range

What can we do in the immediate term to bring this node in?  And what's the 
long term solution?

One possible solution would be to allow bootstrapping to be an incremental 
process with individual transfers of vnode ownership instead of attempting to 
transfer the whole set of vnodes transactionally.  (I assume that's what's 
happening now.)  I don't know what would have to change on the gossip and 
token-aware client side to support this.
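
As a sketch of the incremental idea only: each range is fetched and retried 
independently, so one failed stream does not discard the ranges that already 
completed.  The function name, the fetch_range callback, and the retry count 
are all hypothetical; nothing here reflects how Cassandra streams data today.

```python
def bootstrap_incrementally(ranges, fetch_range, max_attempts=3):
    """Fetch each range on its own, retrying failures.

    `fetch_range` is a hypothetical callback that raises RuntimeError when a
    range cannot be fetched.  Returns (completed, failed) lists of ranges.
    """
    completed, failed = [], []
    for token_range in ranges:
        for _ in range(max_attempts):
            try:
                fetch_range(token_range)
            except RuntimeError:
                continue             # retry only this range
            completed.append(token_range)
            break
        else:
            failed.append(token_range)   # attempts exhausted for this range
    return completed, failed
```

Only the ranges left in the failed list would need another pass, rather than 
restarting the whole bootstrap.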

Another solution would be to partition sstable files by vnode and allow those 
files to be transferred directly, with some form of checkpointing plus an 
incremental transfer of the writes that arrive after the sstable is transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (CASSANDRA-5977) Structure for cfstats output (JSON, YAML, or XML)

2013-09-05 Thread Alyssa Kwan (JIRA)
Alyssa Kwan created CASSANDRA-5977:
--

 Summary: Structure for cfstats output (JSON, YAML, or XML)
 Key: CASSANDRA-5977
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5977
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Alyssa Kwan
Priority: Minor


nodetool cfstats should take a --format arg that structures the output in JSON, 
YAML, or XML.  This would be useful for piping into another script that can 
easily parse this and act on it.  It would also help those of us who use things 
like MCollective gather aggregate stats across clusters/nodes.
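
For illustration, this is roughly what a script has to do today to get 
structured data out of the text output.  The sample text below is an assumed, 
abbreviated shape of cfstats output with a made-up keyspace and column family; 
parse_cfstats is a hypothetical helper, not part of nodetool.

```python
import json

# Assumed, abbreviated cfstats-like text (not captured from a real node).
SAMPLE = """\
Keyspace: demo_ks
    Read Count: 12
    Write Count: 34
        Column Family: users
        SSTable count: 2
        Space used (live): 4096
"""


def parse_cfstats(text):
    """Turn 'Key: value' lines into a nested dict keyed by keyspace."""
    result = {}
    keyspace = table = None
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue                      # skip blanks and separators
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Keyspace":
            keyspace = result.setdefault(value, {"column_families": {}})
            table = None
        elif key == "Column Family" and keyspace is not None:
            table = keyspace["column_families"].setdefault(value, {})
        elif table is not None:
            table[key] = value            # per-column-family stat
        elif keyspace is not None:
            keyspace[key] = value         # per-keyspace stat
    return result


print(json.dumps(parse_cfstats(SAMPLE), indent=2))
```

Values stay as strings in this sketch.  A built-in --format option would make 
this kind of fragile text scraping unnecessary.
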

Thoughts?  I can submit a patch.
