Most teams either use things like Ansible or Python scripts, or have
bespoke infrastructure.

Some of what you're describing is included in the intent of the
`cassandra-sidecar` project:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224

====
Goals
We target two main goals for the first version of the sidecar, both work
towards having a easy to use control plane for managing Cassandra’s data
plane.

- Provide an extensible and pluggable architecture for developers and
  operators to easily operate Cassandra as well as easing integration with
  their existing infrastructure. One major sub-goal of this goal is:
    - The proposal should pass the “curl test”: meaning that it is
      accessible to standard tooling and out of the box libraries available
      for practically every environment or programming language (including
      python, ruby, bash). This means that as a public interface we cannot
      choose Java specific (jmx) or Cassandra specific (CQL) APIs.
- Provide basic but essential and useful functionality. Some proposed scope
  in this document:
    - Run health checks on replicas and the cluster
    - Run diagnostic commands on individual nodes as well as all nodes in
      the cluster (bulk commands)
    - Export metrics via pluggable agents rather than polling JMX
    - Schedule periodic management activities such as running clean ups
    - (as a stretch goal) safely restart all nodes in the cluster.
====
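
To make the "curl test" idea concrete, here is a minimal Python sketch of
what polling a plain HTTP health endpoint could look like using only the
standard library. The `/health` path and the `{"status": "OK"}` response
shape are assumptions made for illustration, not the sidecar's documented
API, so check the project docs for the real endpoint.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True if the body is a JSON object with status == "OK".

    The response shape is a hypothetical one chosen for this sketch.
    """
    try:
        doc = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(doc, dict) and doc.get("status") == "OK"


def check_node(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/health (hypothetical path) and report node health."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:  # connection refused, timeout, HTTP error status, ...
        return False
```

The point is only that nothing Java- or CQL-specific is needed: anything
that can speak HTTP and parse JSON can run the check.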

The health checker seems to be implemented. I'm not sure whether coordinated
cleanup or similar features exist yet (or whether there are JIRAs for them).
In theory, this type of work - outside the database, in automation - should
be really easy for newcomers who are solving their own problems.
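
As a rough illustration of that kind of outside-the-database automation,
here is a minimal Python sketch that runs `nodetool cleanup` over SSH on one
node at a time and stops at the first failure. The function names and host
names are made up for the example, and it assumes passwordless SSH and
`nodetool` on the remote PATH; it is not taken from the sidecar or any
existing tool.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command locally (here: the ssh client) and return its exit code."""
    return subprocess.run(list(argv)).returncode


def rolling_cleanup(
    hosts: Iterable[str], run: Runner = run_command
) -> Tuple[List[str], Optional[str]]:
    """Run `nodetool cleanup` on each host in order, one at a time.

    Stops at the first non-zero exit code so a problem on one node does not
    spread. Returns (hosts that completed, host that failed or None).
    """
    completed: List[str] = []
    for host in hosts:
        exit_code = run(["ssh", host, "nodetool", "cleanup"])
        if exit_code != 0:
            return completed, host
        completed.append(host)
    return completed, None
```

Usage would be something like
`rolling_cleanup(["cass-1.example", "cass-2.example"])` (hypothetical host
names). Going serially keeps the extra compaction load to one node at a
time, at the cost of a longer total run on a big cluster.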

Other things that sorta fall into this space, but may not be quite what
you're looking for:

- https://github.com/Netflix/Priam (if you run very much like netflix runs,
especially on AWS)
- https://github.com/thelastpickle/cassandra-reaper for the repair
automation
- https://github.com/JeremyGrosser/tablesnap (old-ish, for backups)



On Tue, Mar 1, 2022 at 11:05 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thanks all - I'll take a look at Ansible.  Back in my Hadoop days, we
> would use Cloudera manager (course that now costs $). Sounds like we
> need a new open source project!  :)
>
> -Joe
>
> On 3/1/2022 7:46 AM, Bowen Song wrote:
> > We use Ansible to manage a fairly large (200+ nodes) cluster. We
> > created our own Ansible playbooks for common tasks, such as rolling
> > restart. We also use Cassandra Reaper for scheduling and running
> > repairs on the same cluster. We occasionally also use pssh (parallel
> > SSH) for inspecting the logs or configurations on selected nodes.
> > Running pssh on a very large number of servers is obviously not
> > practical due to the available screen space constraint.
> >
> > On 28/02/2022 21:59, Joe Obernberger wrote:
> >> Hi all - curious what tools are folks using to manage large Cassandra
> >> clusters?  For example, to do tasks such as nodetool cleanup after a
> >> node or nodes are added to the cluster, or simply rolling start/stops
> >> after an update to the config or a new version?
> >> We've used puppet before; is that what other folks are using?
> >> Thanks for any suggestions.
> >>
> >> -Joe
> >>
> >
>
