Re: Cassandra vnodes Streaming Reliability Calculator

2019-02-16 Thread James Briggs
Hi Ken.
1) Thanks for the great link.
Ironically it was written by Netflix, who continued to use single token for
years after vnodes were released so that they could continue to use Priam and
their other tools dependent on single token. (I was in the early Cassandra
group there.)

2) My tool agrees overall with their findings:
a) the paper does reflect that increasing numbers of vnodes and nodes reduce
reliability dramatically, so the results are conceptually the same and the
deltas at different vnode counts match what I see in my calculator.

b) but the paper uses a more complicated model. I'm happy with my calculator,
which looks at a simple "probability of a streaming connection failing for any
reason" and is immediately usable by any DBA or SRE.

3) As an Operations DBA, their reference to "centuries" made me laugh
though. Note that my calculations are about failures within one week, which
aligns more with my experience. So either they're overly optimistic, or I'm
pessimistic.
You can verify which by grepping the logs on a production cluster for a
month and counting how many connection failures there were. My blog post has
some links to actual error messages to grep for.

4) Note that Datastax recommends 8 vnodes now. See my blog for the reference.
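
The simple "probability of a streaming connection failing" idea from point 2b
can be sketched roughly as follows. This is only an illustration and not
necessarily the formula the online calculator uses: it assumes every streaming
connection fails independently with the same probability, and that the number
of connections is simply nodes times vnodes. The function name, the parameter
names and the 0.001 failure rate are all made up for the example.

```python
def streaming_success_probability(p_fail: float, nodes: int, vnodes: int) -> float:
    """Chance that every streaming connection succeeds.

    Assumption (not taken from the calculator): one connection per
    (node, vnode) pair, each failing independently with probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    connections = nodes * vnodes
    return (1.0 - p_fail) ** connections


if __name__ == "__main__":
    # Compare a few vnode counts on a hypothetical 12-node cluster.
    for vnodes in (1, 8, 32, 256):
        ok = streaming_success_probability(p_fail=0.001, nodes=12, vnodes=vnodes)
        print(f"vnodes={vnodes:3d}  P(all connections succeed)={ok:.3f}")
```

Under these assumptions the success probability shrinks geometrically with the
number of connections, which is why both more nodes and more vnodes hurt.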
Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in Bay area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

  From: Kenneth Brotman 
 To: user@cassandra.apache.org 
 Sent: Saturday, February 16, 2019 5:00 AM
 Subject: RE: Cassandra vnodes Streaming Reliability Calculator
   
Hi James,

Thanks for doing that.  Very interesting.  I haven’t had a chance to check the
math.  Did you look at this white paper by Lynch and Snyder called Cassandra
Availability with Virtual Nodes:
https://github.com/jolynch/python_performance_toolkit/blob/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf

Are the calculations consistent with your online calculator?

Thanks again,
Kenneth Brotman

From: James Briggs [mailto:james.bri...@yahoo.com.INVALID]
Sent: Friday, February 15, 2019 7:42 PM
To: user@cassandra.apache.org
Subject: Cassandra vnodes Streaming Reliability Calculator

Hi folks.



Please check out my online vnodes reliability calculator and reply with any
feedback: http://www.jebriggs.com/blog/2019/02/cassandra-vnodes-reliability-calculator/

Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in Bay Area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

   

Cassandra vnodes Streaming Reliability Calculator

2019-02-15 Thread James Briggs
Hi folks.

Please check out my online vnodes reliability calculator and reply with any
feedback: http://www.jebriggs.com/blog/2019/02/cassandra-vnodes-reliability-calculator/

Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in Bay Area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

Re: Long GC Pauses

2018-11-19 Thread James Briggs
General best practices with Java 8:

If you have enough RAM for a 24 GB heap, use G1 GC. If you have less RAM, then
use CMS with a medium-sized heap setting so that GC pauses are shorter, though
more frequent.
Graph memory use with Grafana or something similar and let people know what's
happening.

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
http://thelastpickle.com/blog/2018/04/11/gc-tuning.html

Your cluster:
Which version of Java?
How much RAM do your systems have?
Is it the same on all nodes?
What are your current heap settings?
Anything else?

Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in San Jose area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

  From: Rajasekhar Kommineni 
 To: user@cassandra.apache.org 
 Sent: Monday, November 19, 2018 2:33 PM
 Subject: Long GC Pauses
   
Hi All,

My C* cluster configuration:

1) 2 DCs with 4 nodes each and a replication factor of 3 in each DC
2) Writes (bulk data load) go to the 2nd DC and application reads are served
from the 1st DC.
3) CMS GC

Issue: Observing long GC pauses during data load and timeouts from the
application (reads) at the same time.

Questions:

1) Why am I seeing GC pauses on the 1st DC, even though I am using a
stream_throughput of 16 Mb/s?
2) Is there any way to reduce the GC pause times other than changing it?

Thanks,



-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


   

Re: Jepsen testing

2018-11-13 Thread James Briggs
For those relatively new to Cassandra, Riptano is the previous company name for 
Datastax, back in 2011. :) 
http://www.h-online.com/open/news/item/Cassandra-service-company-Riptano-changes-name-to-DataStax.html
Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in San Jose area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

  From: Oleksandr Shulgin 
 To: User  
Cc: d...@cassandra.apache.org
 Sent: Friday, November 9, 2018 12:33 AM
 Subject: Re: Jepsen testing
   
On Thu, Nov 8, 2018 at 10:42 PM Yuji Ito  wrote:


We are working on Jepsen testing for Cassandra.
https://github.com/scalar-labs/jepsen/tree/cassandra/cassandra

As you may know, Jepsen is a framework for distributed systems verification.
It can inject network failures and so on, and check data consistency.
https://github.com/jepsen-io/jepsen

Our tests are based on riptano's great work.
https://github.com/riptano/jepsen/tree/cassandra/cassandra
I refined it for the latest Jepsen and removed some tests.
Next, I'll fix clock-drift tests.

I would like to get your feedback.

Cool stuff!  Do you have jepsen tests as part of regular testing in scalardb?
How long does it take to run all of them on average?
I wonder if Apache Cassandra would be willing to include this as part of its
regular testing drill as well.
Cheers,
--
Alex


   

Re: JBOD disk failure - just say no

2018-08-20 Thread James Briggs
Cassandra JBOD has a bunch of issues, so I don't recommend it for production:
1) disks fill up with load (data) unevenly, meaning you can run out of space on
one disk while others are half-full
2) one bad disk can take out the whole node
3) instead of a small failure probability on an LVM/RAID volume, with JBOD you
end up near 100% chance of failure after 3 years or so.
4) generally you will not have enough warning of a looming failure with JBOD
compared to LVM/RAID. (Some companies take a week or two to replace a failed
disk.)
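
Point 3 is easier to see with numbers. A rough sketch, assuming disks fail
independently at a constant annualized failure rate: with JBOD, a failure of
any one disk counts as a node-level incident, so the chance of at least one
incident grows quickly with disk count and years in service. The 5% rate and
the disk counts below are illustrative assumptions, not measured values.

```python
def any_disk_failure_probability(afr: float, disks: int, years: float) -> float:
    """P(at least one of `disks` fails within `years`).

    Assumes independent failures and a constant annualized failure
    rate `afr` per disk (both are simplifications).
    """
    survive_one_disk = (1.0 - afr) ** years
    return 1.0 - survive_one_disk ** disks


if __name__ == "__main__":
    for disks in (1, 6, 12, 24):
        p = any_disk_failure_probability(afr=0.05, disks=disks, years=3)
        print(f"{disks:2d} disks over 3 years: P(at least one failure) = {p:.2f}")
```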
JBOD is easy to set up, but hard to manage.

Thanks, James.


  From: kurt greaves 
 To: User  
 Sent: Friday, August 17, 2018 5:42 AM
 Subject: Re: JBOD disk failure
   
As far as I'm aware, yes. I recall hearing someone mention tying system tables 
to a particular disk but at the moment that doesn't exist.
On Fri., 17 Aug. 2018, 01:04 Eric Evans,  wrote:

On Wed, Aug 15, 2018 at 3:23 AM kurt greaves  wrote:
> Yep. It might require a full node replace depending on what data is lost from 
> the system tables. In some cases you might be able to recover from partially 
> lost system info, but it's not a sure thing.

Ugh, does it really just boil down to what part of `system` happens to
be on the disk in question?  In my mind, that makes the only sane
operational procedure for a failed disk to be: "replace the entire
node".  IOW, I don't think we can realistically claim you can survive
a failed JBOD device if it relies on happenstance.

> On Wed., 15 Aug. 2018, 17:55 Christian Lorenz,  > wrote:
>>
>> Thank you for the answers. We are using the current version 3.11.3 So this 
>> one includes CASSANDRA-6696.
>>
>> So if I get this right, losing system tables will need a full node rebuild. 
>> Otherwise repair will get the node consistent again.
>
> [ ... ]

-- 
Eric Evans
john.eric.ev...@gmail.com





   

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-19 Thread James Briggs
Kenneth:
What you said is not wrong.

Vertica and Riak are examples of distributed databases that don't require 
hand-holding.

Cassandra is for Java-programmer DIYers, or more often Datastax clients, at 
this point.
Thanks, James.

  From: Kenneth Brotman 
 To: user@cassandra.apache.org 
Cc: d...@cassandra.apache.org
 Sent: Monday, February 19, 2018 4:56 PM
 Subject: RE: Cassandra Needs to Grow Up by Version Five!
   
Jeff, you helped me figure out what I was missing.  It just took me a day to
digest what you wrote.  I’m coming over from another type of engineering.  I
didn’t know and it’s not really documented.  Cassandra runs in a data center.
Nowadays that means the nodes are going to be in managed containers, Docker
containers, managed by Kubernetes, Mesos or something, and for that reason
anyone operating Cassandra in a real world setting would not encounter the
issues I raised in the way I described.  Shouldn’t the architectural diagrams
people reference indicate that in some way?  That would have helped me.

Kenneth Brotman

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com]
Sent: Monday, February 19, 2018 10:43 AM
To: 'user@cassandra.apache.org'
Cc: 'd...@cassandra.apache.org'
Subject: RE: Cassandra Needs to Grow Up by Version Five!

Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re a
good guy!

Kenneth Brotman

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Monday, February 19, 2018 9:10 AM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!

There's a lot of things below I disagree with, but it's ok. I convinced myself
not to nit-pick every point.

https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of Stefan's work
with cert management.

Beyond that, I encourage you to do what Michael suggested: open JIRAs for
things you care strongly about, work on them if you have time. Sometime this
year we'll schedule a NGCC (Next Generation Cassandra Conference) where we talk
about future project work and direction. I encourage you to attend if you're
able (I encourage anyone who cares about the direction of Cassandra to attend;
it'll probably be either free or very low cost, just to cover a venue and some
food). If nothing else, you'll meet some of the teams who are working on the
project, and learn why they've selected the projects on which they're working.
You'll have an opportunity to pitch your vision, and maybe you can talk some
folks into helping out.

- Jeff

On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman wrote:
Comments inline

>-Original Message-
>From: Jeff Jirsa [mailto:jji...@gmail.com]
>Sent: Sunday, February 18, 2018 10:58 PM
>To: user@cassandra.apache.org
>Cc: d...@cassandra.apache.org
>Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
>Comments inline
>
>
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman  
>> wrote:
>>
> >Cassandra feels like an unfinished program to me. The problem is not that 
> >it’s open source or cutting edge.  It’s an open source cutting edge program 
> >that lacks some of its basic functionality.  We are all stuck addressing 
> >fundamental mechanical tasks for Cassandra because the basic code that would 
> >do that part has not been contributed yet.
>>
>There’s probably 2-3 reasons why here:
>
>1) Historically the pmc has tried to keep the scope of the project very 
>narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. 
>We don’t ship fancy UIs. We ship a database. I think for the most part the 
>narrow vision has been for the best, but maybe it’s time to reconsider some of 
>the scope.
>
>Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
>know running 

Re: Reg :- Multiple Node Cluster set up in Virtual Box

2017-11-06 Thread James Briggs
Nandan: The original Datastax training classes (when it was still called
Riptano) used 3 VirtualBox Debian instances to set up a Cassandra cluster.

Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in San Jose area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

  From: kurt greaves <k...@instaclustr.com>
 To: User <user@cassandra.apache.org> 
 Sent: Monday, November 6, 2017 3:08 PM
 Subject: Re: Reg :- Multiple Node Cluster set up in Virtual Box
   
Worth keeping in mind that from 3.6 onwards nodes will not start unless they
can contact a seed. Not quite a SPOF but still problematic. See CASSANDRA-13851.

   

Re: cassandra.yaml configuration for large machines (scale up vs. scale out)

2017-11-04 Thread James Briggs
> I know that Cassandra is built for scale out on commodity hardware
The term "commodity hardware" is not very useful, though the average
server-class machine bought in 2017 can work.

Netflix found that SSD helped greatly with compactions in production.
Generally servers use 10 Gb networking in 2017.
128 GB of RAM is commonly used, but I would use 256+ GB in new servers.
I don't recommend the Cassandra JBOD configuration since losing one drive means
rebuilding the node immediately, which many organizations aren't responsive
enough to do.

Thanks, James.
--
Cassandra/MySQL DBA. Available in San Jose area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top

  From: "Steinmaurer, Thomas" 
 To: "user@cassandra.apache.org"  
 Sent: Friday, November 3, 2017 6:34 AM
 Subject: cassandra.yaml configuration for large machines (scale up vs. scale 
out)
   
Hello,

I know that Cassandra is built for scale out on commodity hardware, but I
wonder if anyone can share some experience when running Cassandra on rather
capable machines.

Let’s say we have a 3 node cluster with 128G RAM, 32 physical cores (16 per CPU
socket), Large Raid with Spinning Disks (so somewhere beyond 2000 IOPS).

What are some recommended cassandra.yaml configuration / JVM settings, e.g. we
have been using something like this as a first baseline:
· 31G heap, G1, -XX:MaxGCPauseMillis=2000
· concurrent_compactors: 8
· compaction_throughput_mb_per_sec: 128
· key_cache_size_in_mb: 2048
· concurrent_reads: 256
· concurrent_writes: 256
· native_transport_max_threads: 256

Anything else we should add to our first baseline of settings?

E.g. although we have a key cache of 2G, nodetool info gives me only 0.451 as
hit rate:

Key Cache  : entries 2919619, size 1.99 GB, capacity 2 GB, 71493172 hits,
158411217 requests, 0.451 recent hit rate, 14400 save period in seconds

Thanks, Thomas

The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose it
to anyone else. If you received it in error please notify us immediately and
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a
company registered in Linz whose registered office is at 4040 Linz, Austria,
Freistädterstraße 313

   

Re: Indexes Fragmentation

2014-09-30 Thread James Briggs
MySQL Cluster (don't use FKs yet) or Redis (in-memory databases) sound more 
appropriate
for data that churns a lot.

 
Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 
cass_top: https://github.com/jamesbriggs/cassandra-top




 From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org user@cassandra.apache.org 
Sent: Monday, September 29, 2014 5:01 PM
Subject: Re: Indexes Fragmentation
 





On Sun, Sep 28, 2014 at 9:49 AM, Arthur Zubarev arthur.zuba...@aol.com wrote:
There are 200+ times more updates and 50x inserts than analytical loads.

In Cassandra to just be able to query (in CQL) on a column I have to have an
index, the question is what toll the fragmentation coming from the frequent
updates and inserts takes on a CF? Do I also need to manually defrag?


You appear to have just asked whether maintaining indexes which have a high
rate of change in a log-structured database with immutable data files is likely
to be more performant than maintaining them in a database with modify-in-place
semantics.

No.

=Rob

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-19 Thread James Briggs
Most of the C* success stories are for greenfield applications.

Migrating from one database to another database is a lot of work. C* offers no 
magical path.


If you only have a few tables and minor RDBMS feature dependencies, it can be 
done.
Make sure your users and QA people are cooperative first though. Most companies 
don't have
a budget to re-QA applications a second time.


Maybe introduce C* to your organization on a new, small project first?


Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 




 From: Les Hartzman lhartz...@gmail.com
To: user@cassandra.apache.org 
Sent: Friday, September 19, 2014 2:46 PM
Subject: Help with approach to remove RDBMS schema from code to move to C*?
 


My company is using an RDBMS for storing time-series data. This application was 
developed before Cassandra and NoSQL. I'd like to move to C*, but ...

The application supports data coming from multiple models of devices. Because 
there is enough variability in the data, the main table to hold the device data 
only has some core columns defined. The other columns are non-specific; a set 
of columns for numeric and a set for character. So for these non-specific 
columns, their use is defined in the code. The use of column 'numeric_1' might 
hold a millisecond time for one device and a fault code for another device. 
This appears to have been done to keep from modifying the schema whenever a new 
device was introduced. And they rolled their own db interface to support this 
mess.

Now, we could just use C* like an RDBMS - defining CFs to mimic the tables. But 
this just pushes a bad design from one platform to another.

Clearly there needs to be a code re-write. But what suggestions does anyone 
have on how to make this shift to C*?

Would you just lay out all of the columns represented by the different devices,
naming them as they are used, and having jagged rows? Or is there some other
way to approach this?

Of course, the data miners already have scripts/methods for accessing the data 
from the RDBMS now in the user-unfriendly form it's in now. This would have to 
be addressed as well, but until I know how to store it, mining it gets ahead of 
things.

Thanks.

Les

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread James Briggs
Kevin:
> The serial approach would take a LONG time for large clusters. If you have
> sixty nodes, it could take an hour to do a rolling restart.

1) In Cassandra land, an hour is nothing. There are people doing repairs that
practically never finish - as soon as one finishes after a week, they have to
start the next one.


2) I met some people at the conference who were embarrassed to operate only 12 
nodes.
I'm not sure why, since managing 12 is a lot easier and cheaper than 60.
In fact, I would be proud to operate a large site on 8 or 12 nodes. :)


3) After I finish my cass_top project this week, I'll take a look at scripting
what you mentioned in this thread.


Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 




 From: Kevin Burton bur...@spinn3r.com
To: user@cassandra.apache.org user@cassandra.apache.org; James Briggs 
james.bri...@yahoo.com 
Sent: Friday, September 19, 2014 11:30 AM
Subject: Re: Blocking while a node finishes joining the cluster after restart.
 


This is great feedback…

I think it could actually be even easier than this…

You could have an ansible (or whatever cluster management system you’re using) 
role for just seeds.

Then you would serially restart all seeds one at a time.  You would need to run 
‘nodetool status’ and make sure the node is ‘U’ (up) I think.. but you might 
want to make sure the majority of other nodes have agreed that this node is up 
and available.

I think you can ONLY do this serially... for a LARGE number of hosts, this
might take a while unless you can compute nodes which have mutually exclusive
key ranges.

The serial approach would take a LONG time for large clusters.  If you have 
sixty nodes, it could take an hour to do a rolling restart.

Kevin


On Tue, Sep 16, 2014 at 12:21 PM, James Briggs james.bri...@yahoo.com wrote:

FYI: OpsCenter has a default of sleep 60 seconds after each node restart,
and an option of drain before stopping.



I haven't noticed if they do anything special with seeds.
(At least one seed needs to be running before you restart other nodes.)



I wondered the same thing as Kevin and came to these conclusions.


Fixing the startup script is non-trivial as far as startup scripts go.


For start, it would have to:


- parse cassandra.yaml for seeds
- if itself is not a seed, wait for a seed to start first. (could take minutes 
or never.)

- continue start.



For a no-downtime cluster restart script, it would have to:


- verify cluster health (ie. quorum/CL is met or you lose writes)

- parse cassandra.yaml for seeds and see if a seed is up
- stop gossip and thrift
- maybe do compaction before drain

- drain node
- stop/start or restart cassandra process.

http://comments.gmane.org/gmane.comp.db.cassandra.user/20144

Both of those scripts would be nice to have. :)

OpsCenter is flaky at doing rolling restart in my test cluster,
so an alternative is needed.

Also, the free OpsCenter doesn't have rolling repair option enabled.

ccm has the options to do drain, stop and start, but a bash
script would be needed to make it rolling.

https://github.com/pcmanus/ccm


Thanks, James. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote.





 From: Duncan Sands duncan.sa...@gmail.com
To: user@cassandra.apache.org 
Sent: Tuesday, September 16, 2014 11:09 AM
Subject: Re: Blocking while a node finishes joining the cluster after restart.
 

Hi Kevin, if you are using the latest version of opscenter, then even the 
community (= free) edition can do a rolling restart of your cluster.  It's 
pretty convenient.

Ciao, Duncan.

On 16/09/14 19:44, Kevin Burton wrote:
 Say I want to do a rolling restart of Cassandra…

 I can’t just restart all of them because they need some time to gossip and 
 for

 that gossip to get to all nodes.

 What is the best strategy for this.

 It would be something like:

 /etc/init.d/cassandra restart  wait-for-cassandra.sh

 … or something along those lines.

 --

 Founder/CEO Spinn3r.com http://Spinn3r.com

 Location: *San Francisco, CA*
 blog:**http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com








-- 

Founder/CEO Spinn3r.com

Location: San Francisco, CA

blog: http://burtonator.wordpress.com
… or check out my Google+ profile

Re: what's cool about cassandra 2.1.0?

2014-09-19 Thread James Briggs
I'll be blunt. The reason to use the latest 2.0 or soon 2.1 is because
Apple has committed 20 patches that make Cassandra
operationally useful. Apple is the QA lab for Cassandra.

Their conference talk was very exciting. I hope a video of that
gets posted in October.


Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 




 From: DuyHai Doan doanduy...@gmail.com
To: user@cassandra.apache.org 
Sent: Friday, September 19, 2014 7:07 AM
Subject: Re: what's cool about cassandra 2.1.0?
 


Hello Tim

 From this blog (http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1) 
you should find the pointers to other big topics of 2.1




On Fri, Sep 19, 2014 at 3:33 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all, 


 I tried googling around to get an idea about what was new (and potentially 
 cool) in the newest release of cassandra - 2.1.0.


But all that I've been able to find so far is this kind of general statement 
about the new features. 


https://www.mail-archive.com/user@cassandra.apache.org/msg38448.html


It doesn't seem to have a lot of detail!  Particularly I'm curious about how 
CQL has been enhanced beyond just an incomplete list of new data types. I'd 
like to know what the performance improvements are, How the row cache has been 
improved. Etc. You get the idea! So where can I find a more complete 
description of how this update is of benefit?


Thanks!
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B



Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread James Briggs
FYI: OpsCenter has a default of sleep 60 seconds after each node restart,
and an option of drain before stopping.


I haven't noticed if they do anything special with seeds.
(At least one seed needs to be running before you restart other nodes.)


I wondered the same thing as Kevin and came to these conclusions.

Fixing the startup script is non-trivial as far as startup scripts go.

For start, it would have to:

- parse cassandra.yaml for seeds
- if itself is not a seed, wait for a seed to start first. (could take minutes 
or never.)

- continue start.
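
The startup steps above could be sketched like this. It is a minimal sketch
under stated assumptions, not a tested init script: the regular expression only
handles the usual single-line `- seeds: "a,b"` entry in cassandra.yaml, port
7000 is the default storage port, and the function names, retry counts and
delays are invented for the example.

```python
import re
import socket
import time
from typing import Callable, List

# Matches the usual '- seeds: "10.0.0.1,10.0.0.2"' line in cassandra.yaml.
SEEDS_RE = re.compile(r'^\s*-\s*seeds:\s*"?([^"\n]*)"?\s*$', re.MULTILINE)


def parse_seeds(yaml_text: str) -> List[str]:
    """Extract the seed addresses from the text of cassandra.yaml."""
    match = SEEDS_RE.search(yaml_text)
    if match is None:
        return []
    return [s.strip() for s in match.group(1).split(",") if s.strip()]


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_any_seed(
    seeds: List[str],
    own_address: str,
    port: int = 7000,          # default storage_port
    attempts: int = 30,        # arbitrary limit so we do not wait forever
    delay: float = 10.0,
    probe: Callable[[str, int], bool] = port_open,
) -> bool:
    """Return True once it is fine to continue starting this node."""
    if own_address in seeds:
        return True  # a seed does not wait for another seed
    for _ in range(attempts):
        if any(probe(seed, port) for seed in seeds):
            return True
        time.sleep(delay)
    return False  # the "or never" case: give up and let the operator decide
```

The attempt limit is there because of the "could take minutes or never" case:
a script that blocks forever at boot is usually worse than one that gives up.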


For a no-downtime cluster restart script, it would have to:

- verify cluster health (ie. quorum/CL is met or you lose writes)

- parse cassandra.yaml for seeds and see if a seed is up
- stop gossip and thrift
- maybe do compaction before drain

- drain node
- stop/start or restart cassandra process.
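
A rough sketch of the rolling restart steps above. Several things here are
assumptions rather than a tested tool: the health check is simplified to "every
node reports UN in nodetool status" (stricter than checking quorum), the seed
check and the optional compaction are left out, hosts are given as the same
addresses that nodetool status prints, nodetool can reach each host remotely
with -h, and the restart happens over ssh using the init script path. The
`run` and `status` callables are hypothetical hooks so the logic can be tried
without a cluster.

```python
import re
import time
from typing import Callable, Dict, List

# Node lines in `nodetool status` start with a two-letter state such as UN or DN.
STATUS_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)", re.MULTILINE)


def parse_status(nodetool_status_output: str) -> Dict[str, str]:
    """Map node address -> two-letter state (e.g. 'UN', 'DN')."""
    return {addr: state for state, addr in STATUS_LINE.findall(nodetool_status_output)}


def restart_commands(host: str) -> List[List[str]]:
    """Commands for one node, in the order listed above."""
    nodetool = ["nodetool", "-h", host]
    return [
        nodetool + ["disablegossip"],
        nodetool + ["disablethrift"],
        nodetool + ["drain"],
        ["ssh", host, "/etc/init.d/cassandra", "restart"],
    ]


def rolling_restart(
    hosts: List[str],
    run: Callable[[List[str]], object],   # e.g. subprocess.check_call
    status: Callable[[], str],            # returns `nodetool status` output
    sleep: Callable[[float], object] = time.sleep,
    poll_seconds: float = 60,
    max_polls: int = 10,
) -> None:
    """Restart hosts one at a time, stopping if the cluster is unhealthy."""
    for host in hosts:
        not_up = sorted(a for a, s in parse_status(status()).items() if s != "UN")
        if not_up:
            raise RuntimeError(f"not restarting {host}; nodes not UN: {not_up}")
        for command in restart_commands(host):
            run(command)
        # Wait for the restarted node to report Up/Normal before moving on.
        for _ in range(max_polls):
            sleep(poll_seconds)
            if parse_status(status()).get(host) == "UN":
                break
        else:
            raise RuntimeError(f"{host} did not return to UN")
```

In real use `run` might be `subprocess.check_call` and `status` a small wrapper
around `subprocess.check_output(["nodetool", "status"], text=True)`.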

http://comments.gmane.org/gmane.comp.db.cassandra.user/20144

Both of those scripts would be nice to have. :)

OpsCenter is flaky at doing rolling restart in my test cluster,
so an alternative is needed.

Also, the free OpsCenter doesn't have rolling repair option enabled.

ccm has the options to do drain, stop and start, but a bash
script would be needed to make it rolling.

https://github.com/pcmanus/ccm


Thanks, James. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote.




 From: Duncan Sands duncan.sa...@gmail.com
To: user@cassandra.apache.org 
Sent: Tuesday, September 16, 2014 11:09 AM
Subject: Re: Blocking while a node finishes joining the cluster after restart.
 

Hi Kevin, if you are using the latest version of opscenter, then even the 
community (= free) edition can do a rolling restart of your cluster.  It's 
pretty convenient.

Ciao, Duncan.

On 16/09/14 19:44, Kevin Burton wrote:
 Say I want to do a rolling restart of Cassandra…

 I can’t just restart all of them because they need some time to gossip and for
 that gossip to get to all nodes.

 What is the best strategy for this.

 It would be something like:

 /etc/init.d/cassandra restart  wait-for-cassandra.sh

 … or something along those lines.

 --

 Founder/CEO Spinn3r.com http://Spinn3r.com

 Location: *San Francisco, CA*
 blog:**http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com



Re: Blocking while a node finishes joining the cluster after restart.

2014-09-16 Thread James Briggs
Hi Robert.

I just did a test (shutdown all nodes, start one non-seed node.)


You're correct that an old non-seed node can start by itself.

So startup scripts don't have to be intelligent, but apps need to wait
until there are enough nodes up to serve the whole keyspace:

cqlsh:my_keyspace> consistency
Current consistency level is ONE.

cqlsh:my_keyspace> select * from numbers where v=1;

 v
---
 1

(1 rows)

cqlsh:my_keyspace> select * from numbers where v=2;
Unable to complete request: one or more nodes were unavailable.

Thanks, James. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 


Re: backport of CASSANDRA-6916

2014-09-16 Thread James Briggs
Paulo:

Out of curiosity, why not just upgrade to 2.1 if you want the new features?

You know you want to! :)

 
Thanks, James Briggs
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote.



 From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org user@cassandra.apache.org 
Sent: Tuesday, September 16, 2014 4:13 PM
Subject: Re: backport of CASSANDRA-6916
 


On Tue, Sep 16, 2014 at 2:56 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

Has anyone backported incremental replacement of compacted SSTables 
(CASSANDRA-6916) to 2.0? Is it doable or there are many dependencies introduced 
in 2.1?


Haven't checked the ticket detail yet, but just in case anyone has interesting 
info to share.

Are you looking to patch for public consumption, or for your own purposes?

I just took the temperature of #cassandra-dev and they were cold on the idea as 
a public patch, because of potential impact on stability.

=Rob

Announce: top for Cassandra - cass_top

2014-09-16 Thread James Briggs
I wrote cass_top, a poor man's version of OpsCenter, in bash (no dependencies.)


http://www.jebriggs.com/blog/2014/09/top-utility-for-cassandra-clusters-cass_top/

 
Actually, if it had node or cluster restart, it would do most of what the 
OpsCenter free version does. :)

The features of cass_top are:

- colorizes nodetool status output: UN nodes green, DN nodes red, other 
statuses blue
- no extra firewall holes needed (agent-less and server-less), unlike OpsCenter
- fast initial startup time (under 2 seconds), unlike OpsCenter
- uses bash, so no programming environment needed - run it anywhere nodetool 
works
- uses minimal screen real estate, so several rings can fit on one monitor
- free (Apache 2).

Please send me your comments and suggestions. The top-like infinite loop is
actually a read loop, so adding a few more features like cfstats or flush would 
be easy.

Enjoy, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 

Re: no change observed in read latency after switching from EBS to SSD storage

2014-09-16 Thread James Briggs
To expand on what Robert said, Cassandra is a log-structured database:

- writes are append operations, so both correctly configured disk volumes and 
SSD are fast at that

- reads could be helped by SSD if they're not in cache (i.e. on disk)

- but compaction is definitely helped by SSD with large data loads (compaction 
is the trade-off for fast writes)

 
Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote. 
Mailbox dimensions: 10x12x14



 From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Tuesday, September 16, 2014 5:42 PM
Subject: Re: no change observed in read latency after switching from EBS to SSD 
storage
 





On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com wrote:

Does anyone have insight as to why we don't see any performance impact on the 
reads going from EBS to SSD?


What does it say when you enable tracing on this CQL query?

10 seconds is a really long time to access anything in Cassandra. There is, 
generally speaking, a reason why the default timeouts are lower than this.

My conjecture is that the data in question was previously being served from the 
page cache and is now being served from SSD. You have, in switching from 
EBS-plus-page-cache to SSD, successfully proved that SSD and RAM are both very 
fast. There is also a strong suggestion that whatever access pattern you are 
using is not bounded by disk performance.

=Rob

Re: C 2.1

2014-09-15 Thread James Briggs
Hi Ram.

1) As an Operations DBA, I consider all versions of Cassandra to be alpha.

So whether you pick 2.0.10 or 2.1.0 doesn't really matter since you
will have to do your own acceptance testing.

2) Data modelling is everything when it comes to a distributed database
like Cassandra. You can read my blog post which is a quick way to get
up to speed with CQL:

Notes on “Getting Started with Time Series Data Modeling” in Cassandra
http://jbriggs.com/blog/2014/09/notes-on-getting-started-with-time-series-data-modeling-in-cassandra/
 
Thanks, James Briggs
--
Cassandra/MySQL DBA. Available in San Jose area or remote.




 From: Ram N yrami...@gmail.com
To: user@cassandra.apache.org 
Sent: Saturday, September 13, 2014 3:49 PM
Subject: C 2.1
 


Team,

I am pretty new to cassandra (with just 2 weeks of playing around with it on 
and off) and planning a fresh deployment with 2.1 release. The data-model is 
pretty simple for my use-case.  Questions I have in mind are

Is 2.1 a production ready release? 
Driver selection?
I played around with Hector, Astyanax and the Java driver.
 I don't see much activity happening on Hector.
 For Astyanax - Love the fluent style of writing code and the abstractions, 
recipes, pooling etc.
 Datastax Java driver - I get too confused with CQL and the underlying 
storage model. I am also not clear on the indexing structure of columns. Do 
CQL indexes create a separate CF for the index table? How is that different from 
maintaining an inverted index? Are both the same internally? Does the CQL 
statement to create an index create a separate CF, and does it have an atomic 
way of updating/managing it? Which one scales better? (something like 
stargate-core or the ones done by usergrid? or the CQL approach?)

On a separate note just curious if I have 1000's of columns in a given row and 
a fixed set of indexed columns (say 30 - 50 columns) which approach should I be 
taking? Will cassandra scale with this many indexed columns? Are there any 
limits? How much of an impact do CQL indexes create on the system? I am also 
not sure if these use cases are the right choice for cassandra but would really 
appreciate any response on these. Thanks.

-R

Re: C 2.1

2014-09-15 Thread James Briggs
Ram,

The reason secondary indexes are not recommended is that since
they can't use the partition key, the values have to be fetched from
all nodes. So you have higher latency, and likely timeouts.

The C* solutions are:

a) use a denormalized (materialized) table

b) use a clustered index if all the data related to the row key is
in the same partition (read my blog link from this thread for more)


That's the price of using distributed systems.
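
The fan-out difference can be sketched with a toy model. This is not Cassandra's real token ring: the six-node cluster, the md5-based `owner` function and both query helpers are hypothetical, and replication is ignored. It only shows that a query restricted by partition key lands on one owner, while a query that cannot use the partition key has to ask every node.

```python
import hashlib

NODES = 6  # hypothetical cluster size; replication is ignored for simplicity

def owner(partition_key):
    """Map a partition key to a node index (stand-in for token ownership)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES

def nodes_for_partition_key_query(partition_key):
    """A query restricted by partition key goes only to that key's owner."""
    return {owner(partition_key)}

def nodes_for_secondary_index_query():
    """Without a partition key, every node must check its local index."""
    return set(range(NODES))

if __name__ == "__main__":
    # Base table keyed by user id, searched by email through a secondary index.
    print(len(nodes_for_secondary_index_query()))                # 6
    # Option (a): a denormalized table keyed by email instead.
    print(len(nodes_for_partition_key_query("a@example.com")))   # 1
```

The denormalized table costs an extra write per row, but each read then touches one node in this model instead of all of them.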

 
Oh, and then there's the need to rewrite the data access layer
of your entire existing app. :)

AOL and Blizzard talked about porting a couple apps to Cassandra
at the conference last week, but they sounded like trivial user-db
(UDB) apps, and even then Patrick was usually credited with the
data modelling.

I haven't heard of anybody porting a 100+ table Oracle or MySQL
app to C* yet. I'm sure it's been done, but most of the
apps written for C* are greenfield or v2.0 rewrites.

Thanks, James Briggs
--
Cassandra/MySQL DBA. Available in San Jose area or remote.




 From: Ram N yrami...@gmail.com
To: user user@cassandra.apache.org 
Sent: Monday, September 15, 2014 1:34 PM
Subject: Re: C 2.1
 




Jack, 

Using Solr or an external search/indexing service is an option but increases 
the complexity of managing different systems. I am curious to understand the 
impact of having wide rows in a separate CF for inverted-index purposes, which, 
if I understand Rob's response correctly, is better than using the default 
Secondary Index option. 

It would be great to understand the design decision to go with the present 
implementation of Secondary Indexes when the alternative is better. Looking at 
the JIRAs, it is still confusing to work out the why :) 

--R 






On Mon, Sep 15, 2014 at 11:17 AM, Jack Krupansky j...@basetechnology.com 
wrote:

If you’re indexing and querying on that many columns (dozens, or more than 
a handful), consider DSE/Solr, especially if you need to query on multiple 
columns in the same query.
 
-- Jack Krupansky
 
From: Robert Coli 
Sent: Monday, September 15, 2014 11:07 AM
To: user@cassandra.apache.org 
Subject: Re: C 2.1
 
On Sat, Sep 13, 2014 at 3:49 PM, Ram N yrami...@gmail.com wrote:

Is 2.1 a production ready release? 
 
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

 
 Datastax Java driver - I get too confused with CQL and the underlying 
 storage model. I am also not clear on the indexing structure of columns. 
 Do CQL indexes create a separate CF for the index table? How is that 
 different from maintaining an inverted index? Are both the same internally? 
 Does the CQL statement to create an index create a separate CF, and does it 
 have an atomic way of updating/managing it? Which one scales better? 
 (something like stargate-core or the ones done by usergrid? or the CQL approach?)
 
New projects should use CQL. Access to underlying storage via Thrift is 
likely to eventually be removed from Cassandra.
 
On a separate note just curious if I have 1000's of columns in a given row 
and a fixed set of indexed columns (say 30 - 50 columns) which approach 
should I be taking? Will cassandra scale with this many indexed columns? Are 
there any limits? How much of an impact do CQL indexes create on the system? 
I am also not sure if these use cases are the right choice for cassandra but 
would really appreciate any response on these. Thanks.
 
Use of the Secondary Indexes feature is generally an anti-pattern in 
Cassandra. 30-50 indexed columns in a row sounds insane to me. However, 30-50 
column families into which one manually denormalizes does not sound too insane 
to me...
 
=Rob
http://twitter.com/rcolidba

Re: Cassandra JBOD disk configuration

2014-09-09 Thread James Briggs
I've used JBOD before and here are the operational problems I noticed:

1) each volume/disk fills at a different rate, so the min might be 100 GB data,
and the max might be 200 GB. That means you cannot use anywhere near your
real hard disk capacity. (Then on top of that compaction requires space.)

2) when a disk dies you lose that node immediately, whereas with RAID you get
some warning.

Those issues made JBOD unusable for us, but if you're just using Cassandra
as a cache, or your operations team doesn't mind rebuilding nodes all the time
with no advance notice, or your data size is small compared to the disk size,
then it might be ok for you.

Thanks, James Briggs.


 From: Chris Lohfink clohf...@blackbirdit.com
To: user@cassandra.apache.org 
Sent: Tuesday, September 9, 2014 12:14 PM
Subject: Re: Cassandra JBOD disk configuration
 

It can get really unbalanced with STCS. What's more, even if there were a disk 
that could fit the 600 GB sstable, it doesn't pay attention to space (first), so 
it may pick the 75% full one over the 10% one. It's a better idea to use LCS 
with JBOD unless the data model really needs STCS, in which case monitor it 
carefully. If you want to utilize your disks more completely you will probably 
just want to use RAID. I imagine you would get far better performance out of 
JBOD though... It Depends

Chris

On Sep 4, 2014, at 4:48 AM, Hannu Kröger hkro...@gmail.com wrote:

 Hi,
 
 Let's imagine that I have one keyspace with one big table configured with 
 size tiered compaction strategy and nothing else. The disk configuration 
 would be to have 10x 500GB disks, each mounted to a separate directory.
 
 Each directory would then be configured as a separate entry in cassandra.yaml.
 
 Over time data accumulates and I have at some point 4x 300GB sstables that 
 Cassandra would like to compact into one 1.2 TB sstable.
 
 Since each directory has max 500GB disk space, that would not work. Right?
 
 Is JBOD with more than 2 disks really usable with STCS? Probably LCS would 
 be the only way to go in this case?
 
 Cheers,
 Hannu 
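
The arithmetic in the question above can be checked with a small sketch. The function name `compaction_fits` is hypothetical, and the model makes two assumptions: the compacted sstable is as large as the sum of its inputs (worst case, nothing purged), and it has to fit in the free space of a single data directory. The RAID figures in the second call are made up for comparison.

```python
def compaction_fits(input_sstables_gb, free_gb_per_disk):
    """True if the compacted sstable could be written to at least one disk.

    Assumes nothing is purged, so the output equals the sum of the inputs,
    and that the whole output must fit on a single disk.
    """
    output_gb = sum(input_sstables_gb)
    return any(free >= output_gb for free in free_gb_per_disk)

if __name__ == "__main__":
    # Four 300 GB sstables on 500 GB disks, generously treating every disk
    # as completely free: 1200 GB still exceeds any single 500 GB disk.
    print(compaction_fits([300] * 4, [500] * 10))   # False
    # Hypothetical single 5000 GB RAID volume with 3800 GB still free.
    print(compaction_fits([300] * 4, [3800]))       # True
```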

Re: cassandra on own distributed network

2014-09-09 Thread James Briggs
What you're describing depends on the load (data size) and latency.

Doing a bootstrap or backup would require a fair amount of bandwidth if
you want it done quickly with a lot of data. Also, latency would
be very high going over some kind of office VPN. But
there's no reason you can't do what you're describing.

You could set up a test cluster and see what the actual latency is.
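
As a rough feel for the bandwidth side, here is a back-of-the-envelope sketch. The function `transfer_hours`, its `efficiency` parameter and the example figures (400 GB over a 100 Mbit/s link) are hypothetical; it assumes decimal units, no compression, and a link dedicated to the transfer.

```python
def transfer_hours(data_gb, link_mbps, efficiency=1.0):
    """Hours needed to move data_gb gigabytes over a link_mbps link.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 means the link is fully used by this transfer).
    """
    megabits = data_gb * 8000  # 1 GB = 8000 Mbit in decimal units
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600

if __name__ == "__main__":
    # 400 GB over 100 Mbit/s, fully utilized.
    print(round(transfer_hours(400, 100), 1))        # 8.9
    # The same transfer if only half the link is usable.
    print(round(transfer_hours(400, 100, 0.5), 1))   # 17.8
```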



Most people use 4 nodes per POP with NetworkTopologyStrategy (NTS) for a 
multi-DC setup with RF=3.
 

Thanks, James Briggs
--
Cassandra/MySQL DBA. Available in San Jose area or remote.




 From: David M da3bob...@gmail.com
To: user@cassandra.apache.org 
Sent: Tuesday, September 9, 2014 5:49 PM
Subject: cassandra on own distributed network
 


Hi everyone

I am at a loss for locating use cases/examples/documentation/books/etc for 
deploying Cassandra where multi-dc nodes of a single cluster are on your own 
network at points around the world.
In my example a Cassandra dc equates to a building.

Of interest to me is how installations are inter-connecting their dcs (circuit 
bandwidth, latency requirements) for optimal replication/gossip/etc and any 
lessons learned they can share.

I know there isn't going to be a single config that applies to every 
deployment/usage pattern/etc but surely there are at least loose rules of thumb 
that will get me going (or maybe alternative deployments).


The interesting posts/blogs/books/etc seem to reference Cassandra in the cloud 
(eg specifying AWS instance types) leaving out descriptions/usage/requirements 
at the network layer.

If anyone knows of any information on this topic that I've missed I'd 
appreciate your sharing.


Thanks,

David

Re: hardware sizing for cassandra

2014-09-09 Thread James Briggs
Regarding what Netflix does, the last time I checked:

1) sure, they use AWS VMs, but they take the whole machine.
So is that really using a VM? :)

2) they use SSD mainly to reduce compaction time. We don't
even notice it with SSD any more.

When sizing nodes and clusters, the main factors I've seen are:

a) What read latency are you trying to achieve? With 400 GB data per node,
10 ms is easy, but 1 ms is hard. Your whole design will revolve around this
if you want low latency.

b) How much data load per node is there? Bootstrapping and backup/restore
gets time-consuming and hard with more than 400 GB per node.

c) Are you planning to delete data? If so, that's harder to manage.

Other than that, the previous comments on RAM are pretty accurate.
I would want more cores with vnodes to do more parallel operations.

Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in San Jose area or remote.



 From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Tuesday, September 9, 2014 2:44 PM
Subject: Re: hardware sizing for cassandra
 


On Tue, Sep 9, 2014 at 2:16 PM, Russell Bradberry rbradbe...@gmail.com wrote:

Because RAM is expensive and the JVM heap is limited to 8gb. While you do get 
benefit out of using extra RAM as page cache, it's often not cost efficient to 
do so


Again, this is so use-case dependent. I have met several people that run small 
nodes with fat ram to get it all in memory to serve things in as few 
milliseconds as possible.  This is a very common pattern in ad-tech where 
every millisecond counts.  The tunable consistency and cross-datacenter 
replication make Cassandra very appealing as it is difficult to set this up 
with other DBs. 

Sure, it's also very common to run RDBMS in such a mode that hundreds of 
gigabytes of RAM are available as either page cache or buffer pool. But 
"things are fast when you don't access slow disks" is not really a commentary 
on Cassandra specifically; "8gb is the largest practical heap size with CMS 
GC" is.. :D

The recommended setup is 3 nodes and an RF of 3 to be able to make quorum 
reads/writes and survive an outage. But again, this is completely use-case 
dependent.

IMO, the minimum number of nodes you actually want to use in production with 
RF=3 is >=4, probably closer to 6. But as you say, use case dependent.


=Rob
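
The quorum arithmetic behind the RF=3 discussion can be written out as a short sketch. The helper names are hypothetical; the formula is the usual QUORUM definition of a majority of replicas, floor(RF/2) + 1.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1

def replicas_that_can_fail(replication_factor):
    """Replicas of one token range that can be down while QUORUM still works."""
    return replication_factor - quorum(replication_factor)

if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {replicas_that_can_fail(rf)} replica(s) down")
```

With RF=3 a quorum is 2, so one replica per range can be down. Note that RF=2 tolerates no failures at QUORUM, which is one reason RF=3 is the common starting point.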