Hello,

I'm no longer operating "my" own cluster, but now doing consulting for TLP. Here is what I would say from my own experience:

> 1. Do the same people where you work operate the cluster and write
> the code to develop the application?

Operating a cluster does not require the same set of skills as using the driver, writing queries and developing the needed code. Some people have all the required skills, yet the amount of work one can do in a day is limited anyway. Some thoughts:

- The design/model should always be done with all the people involved in the feature/project.
--- Operators know the best practices and ultimately take responsibility for keeping it working. They are the fire extinguishers and should help build the house, because they know what will burn and what is reliable.
--- Devs are the most qualified to build the code, interact with the drivers and write the code (and tests) needed to query Cassandra in the 'right way', as defined together with operators.
--- Lawyers/legal teams can help answer questions about which TTL (Time To Live) to use. Little data needs to live forever, and setting a TTL is a good way to keep the data size under control.
--- If the same people take care of both dev and ops (start-ups, small teams generally), it's good for this team to have 2+ people. One person alone cannot exchange ideas, nor be up 24/7...
- A team of operators who just know the basics can do level 1 support if procedures are well documented and the proper tooling is in place. There is a fair amount of repetitive work, and many times the 'protocol' to react to X or Y is the same. Ultimately, they can escalate to the people who are responsible for the cluster.

> 2. Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?

I definitely recommend and advocate for this. It is the best way to get a feeling for the health of your cluster at first sight, to understand the patterns and the bottlenecks, to see the impact of optimisations and to diagnose issues efficiently.

We built default Datadog dashboards to help people using Datadog monitor their Cassandra clusters. The release post is here: http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html

Also, if you prefer videos, here is what I think about why, what and how to monitor: https://www.youtube.com/watch?v=Q9AAR4UQzMk

If you're not using Datadog, there are Grafana dashboards and Prometheus metric exporters available as well:
- "grafana cassandra dashboards <https://www.google.com/search?client=safari&rls=en&q=grafana+cassandra+dashboards&ie=UTF-8&oe=UTF-8>" on any search engine should give you a few options
- https://github.com/instaclustr/cassandra-exporter

> 3. Do you have a log stack that allows you to see the logs for all
> the nodes together?

I would say it's a 'should-have', as opposed to a good monitoring system, for example, which is a 'must-have'. I never had one or really used one, even though as a consultant I worked on multiple clusters. If you have it in place for other services, then maybe just plug in the C* nodes as well. It will help you if a machine becomes completely unreachable, or to easily aggregate and compute statistics for the whole cluster. It can be really nice. Just be aware of the amount of logs that Cassandra generates and the debug level you want to have, and think about the appropriate retention policy. But it's definitely not the first thing I would care about, as tools allow you to query all nodes through ssh to get information about each node. Or you can always jump on a faulty node.

> 4. Do you regularly repair your clusters - such as by using Reaper?

Most people do, I believe, one way or another: with cron, in-house tools, Reaper, OSS scripts that handle "range repairs". It is not mandatory as long as you do not delete data. It's maybe not needed if you use strong consistency. I always like to do it regularly. I like to think that my nodes have the same data, that entropy is as low as possible.
It always worked well for me, making me more confident when operating the cluster (moving token ranges, forcefully removing a node, etc.), and I did not lose data in 6 years (apart from counters, but they were already known to be not 'accurate', not to say 'broken', by then), despite the fact that I started with C* 0.8 (and the fresh first counters implementation, yay!). I would keep routine repairs as a good practice when they are useful (deletes, reads that are not strongly consistent) but also when in theory they are not needed, to help keep the data where it belongs, even though Cassandra is now pretty resilient.

Yet some people are doing perfectly fine... until they run a first repair! Be sure to read about it beforehand. With the default number of vnodes and the default repair options in older versions of Cassandra, you could really harm your cluster.

> 5. Do you use artificial intelligence to help manage your clusters?

So far I have only used "human intelligence" (mine, the collective one from this mailing list and my colleagues', really often) to manage my own and other people's clusters ;-). But there's a small part of what I do that I could trust a machine to do for me, and better than I would do it. There are a lot of tools out there that do "things" for us (Reaper, OpsCenter, in-house shared/OSS tools, the many tools Netflix has open-sourced over the years, but also dashboards that you just have to plug in, etc.) and that bring some intelligence from other people who faced the same problems. That's not real AI per se, maybe, as it won't learn by itself. Also, there is ongoing work to make operating Cassandra better; search for "management tool", off the top of my head...

I never used any AI to help manage my clusters. The closest thing I had installed at some point was the OpsCenter adviser, once, multiple years ago. But I knew more than 'it' (the AI) about Cassandra by then ;-). I never had the chance to see a really great AI that would actually help me with cluster management.
Thus I see the value of some tools in helping people manage their clusters. Other alternatives are fully managed Cassandra cluster services, if that's of interest to you, using the mailing list (as you did), or working with consultants (but I could be a bit biased recommending that you work with consultants ;-)).

Hope some of it helps (I always write too much ¯\_(ツ)_/¯),

C*heers,
-----------------------
Alain Rodriguez - [email protected]
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, Apr 1, 2019 at 20:45, Rahul Singh <[email protected]> wrote:

> Answers inline.
>
> 1. Do the same people where you work operate the cluster and write
> the code to develop the application?
>
> No, but the operators need to know development, data modeling, and
> generally how to "code" the application. (Coding is a low-level task of
> assigning a code to a concept, so I don't think that's the proper verb in
> these scenarios; engineering, or software development, or even programming
> is a better term.) It's because the developers are hired a dime a dozen at
> the B / C level and then replaced by D / E / F level developers as things go
> on, so the Data team eventually ends up being the expert of the
> application and the data platform, and a "Center of Excellence" for the
> development / architects to work with on a collaborative basis.
>
> 2. Do you have a metrics stack that allows you to see graphs of
> various metrics with all the nodes displayed together?
>
> Yes. OpsCenter, ELK, Grafana, custom node data visualizers in Excel
> (because lines and charts don't tell you everything)
>
> 3. Do you have a log stack that allows you to see the logs for all
> the nodes together?
>
> ELK. CloudWatch
>
> 4. Do you regularly repair your clusters - such as by using Reaper?
>
> Depends. Cron, Reaper, OpsCenter Repair, and now NodeSync
>
> 5. Do you use artificial intelligence to help manage your clusters?
>
> Yes, I actually have made an artificial general intelligence called
> Gravitron. It learns by ingesting all the news articles I aggregate about
> Cassandra and the links I curate on cassandra.link into a solr/lucene index
> and then using clustering find out the most popular and popularly connected
> content. Once it does that there's a summarization of the content into
> human readable content as well as interpreted bash code that gets pushed
> into a "Recipe Book." As the master operator identifies scenarios using
> english language, and then runs the bash commands, the machine slowly but
> surely "wakes up" and starts to manage itself. It can also play Go, the
> game, and beat IBM's AlphaGo at Go, and Donald Trump at golf while he was
> cheating!
>
> [email protected]
>
> http://cassandra.link
>
> I'm speaking at #DataStaxAccelerate, the world’s premiere #ApacheCassandra
> conference, and I want to see you there! Use my code Singh50 for 50% off
> your registration. www.datastax.com/accelerate
>
> Happy april fools day.
>
> On Thu, Mar 28, 2019 at 5:03 AM Kenneth Brotman
> <[email protected]> wrote:
>
>> I’m looking to get a better feel for how people use Cassandra in
>> practice. I thought others would benefit as well so may I ask you the
>> following five questions:
>>
>> 1. Do the same people where you work operate the cluster and write
>> the code to develop the application?
>>
>> 2. Do you have a metrics stack that allows you to see graphs of
>> various metrics with all the nodes displayed together?
>>
>> 3. Do you have a log stack that allows you to see the logs for all
>> the nodes together?
>>
>> 4. Do you regularly repair your clusters - such as by using Reaper?
>>
>> 5. Do you use artificial intelligence to help manage your clusters?
>>
>> Thank you for taking your time to share this information!
>>
>> Kenneth Brotman
