Re: How seed nodes are working and how to upgrade/replace them?
On Tue, 8 Jan 2019 at 18:29, Jeff Jirsa wrote:

> Given Consul's popularity, seems like someone could make an argument that
> we should be shipping a consul-aware seed provider.

Elasticsearch has a very handy dedicated file-based discovery system:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#file-based-hosts-provider

It's similar to what Cassandra's built-in SimpleSeedProvider does, but it doesn't require keeping the *whole* cassandra.yaml file up to date, and it could probably be simpler to watch dynamically for changes.

Ultimately, there are plenty of external applications that could be used to pull in information from your favorite service discovery tool (etcd, Consul, etc.) or configuration management and keep this file up to date, without needing a plugin for every system out there.
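As a rough illustration of the "external process keeps the file up to date" idea, here is a minimal Python sketch that rewrites only the `seeds` entry of a cassandra.yaml text and writes the result back atomically. The function names `replace_seeds` and `write_atomically` are hypothetical, the regular expression assumes the quoted single-line `- seeds: '...'` form used by SimpleSeedProvider, and how the list of IPs is obtained from the service discovery tool is deliberately left out.

```python
import os
import re
import tempfile

# Assumption: the seeds entry is a single quoted line, as in "- seeds: 'a,b'".
SEEDS_LINE = re.compile(r"""^([ \t]*-[ \t]*seeds:[ \t]*)(['"]).*?\2[ \t]*$""", re.MULTILINE)


def replace_seeds(yaml_text, seed_ips):
    """Return yaml_text with the value of the seeds entry replaced by seed_ips."""
    new_value = ",".join(seed_ips)
    updated, count = SEEDS_LINE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_value}{m.group(2)}", yaml_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one seeds entry, found {count}")
    return updated


def write_atomically(path, content):
    """Write content next to path, then rename it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with restrictive permissions; adjust them if
    # the Cassandra process runs as a different user.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Running nodes would still only see the new list after a restart or, on versions that have it, after `nodetool reloadseeds`.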
Re: How seed nodes are working and how to upgrade/replace them?
On Tue, 8 Jan 2019 at 18:39, Jeff Jirsa wrote: > On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet wrote: > >> Hi Jeff, >> >> thanks for answering to most of my points! >> From the reloadseeds' ticket, I followed to >> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very >> instructive, although a bit old. >> >> >> On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa wrote: >> >>> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet >>> wrote: >>> > >>> [...] >>> >>> > In essence, in my example that would be: >>> > >>> > - decide that #2 and #3 will be the new seed nodes >>> > - update all the configuration files of all the nodes to write the >>> IP addresses of #2 and #3 >>> > - DON'T restart any node - the new seed configuration will be picked >>> up only if the Cassandra process restarts >>> > >>> > * If I can manage to sort my Cassandra nodes by their age, could it be >>> a strategy to have the seeds set to the 2 oldest nodes in the cluster? >>> (This implies these nodes would change as the cluster's nodes get >>> upgraded/replaced). >>> >>> You could do this, seems like a lot of headache for little benefit. >>> Could be done with simple seed provider and config management >>> (puppet/chef/ansible) laying down new yaml or with your own seed provider >>> >> >> So, just to make it clear: sorting by age isn't a goal in itself, it was >> just an example on how I could get a stable list. >> >> Right now, we have a dedicated group of seed nodes + a dedicated group >> for non-seeds: doing rolling-upgrade of the nodes from the second list is >> relatively painless (although slow) whereas we are facing the issues >> discussed in CASSANDRA-3829 for the first group which are non-seeds nodes >> are not bootstrapping automatically and we need to operate them in a more >> careful way. >> > Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host > should need a new bootstrap. That should be a new host in your list, so it > seems like this should be fairly rare? 
Sorry, that's internal pidgin: by "rolling upgrade" I meant replacing all the nodes in a rolling fashion.

>> What I'm really looking for is a way to simplify adding and removing nodes
>> into our (small) cluster: I can easily provide a small list of nodes from
>> our cluster with our config management tool so that new nodes are
>> discovering the rest of the cluster, but the documentation seems to imply
>> that seed nodes also have other functions and I'm not sure what problems we
>> could face trying to simplify this approach.
>>
>> Ideally, what I would like to have would be:
>>
>> * Considering a stable cluster (no new nodes, no nodes leaving), the N
>> seeds should be always the same N nodes
>> * Adding new nodes should not change that list
>> * Stopping/removing one of these N nodes should "promote" another
>> (non-seed) node as a seed
>> - that would not restart the already running Cassandra nodes but would
>> update their configuration files.
>> - if a node restart for whatever reason it would pick up this new
>> configuration
>>
>> So: no node would start its life as a seed, only a few already existing
>> node would have this status. We would not have to deal with the "a seed
>> node doesn't bootstrap" problem and it would make our operation process
>> simpler.
>>
>>
>>> > I also have some more general questions about seed nodes and how they
>>> work:
>>> >
>>> > * I understand that seed nodes are used when a node starts and needs
>>> to discover the rest of the cluster's nodes. Once the node has joined and
>>> the cluster is stable, are seed nodes still playing a role in day to day
>>> operations?
>>>
>>> They’re used probabilistically in gossip to encourage convergence.
>>> Mostly useful in large clusters.
>>>
>>
>> How "large" are we speaking here? How many nodes would it start to be
>> considered "large"?
>>
>
> ~800-1000
>

Alllrriigght, we still have a long way :)

Jonathan
Re: How seed nodes are working and how to upgrade/replace them?
I've done some gossip simulations in the past and found virtually no difference in the time it takes for messages to propagate in almost any sized cluster. IIRC it always converges by 17 iterations. Thus, I completely agree with Jeff's comment here. If you aren't pushing 800-1000 nodes, it's not even worth bothering with. Just be sure you have seeds in each DC. Something to be aware of - there's only a chance to gossip with a seed. That chance goes down as cluster size increases, meaning seeds have less and less of an impact as the cluster grows. Once you get to 100+ nodes, a given node is very rarely talking to a seed. Just make sure when you start a node it's not in its own seed list and you're good. On Tue, Jan 8, 2019 at 9:39 AM Jeff Jirsa wrote: > > > On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet wrote: > >> Hi Jeff, >> >> thanks for answering to most of my points! >> From the reloadseeds' ticket, I followed to >> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very >> instructive, although a bit old. >> >> >> On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa wrote: >> >>> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet >>> wrote: >>> > >>> [...] >>> >>> > In essence, in my example that would be: >>> > >>> > - decide that #2 and #3 will be the new seed nodes >>> > - update all the configuration files of all the nodes to write the >>> IP addresses of #2 and #3 >>> > - DON'T restart any node - the new seed configuration will be picked >>> up only if the Cassandra process restarts >>> > >>> > * If I can manage to sort my Cassandra nodes by their age, could it be >>> a strategy to have the seeds set to the 2 oldest nodes in the cluster? >>> (This implies these nodes would change as the cluster's nodes get >>> upgraded/replaced). >>> >>> You could do this, seems like a lot of headache for little benefit. 
>>> Could be done with simple seed provider and config management >>> (puppet/chef/ansible) laying down new yaml or with your own seed provider >>> >> >> So, just to make it clear: sorting by age isn't a goal in itself, it was >> just an example on how I could get a stable list. >> >> Right now, we have a dedicated group of seed nodes + a dedicated group >> for non-seeds: doing rolling-upgrade of the nodes from the second list is >> relatively painless (although slow) whereas we are facing the issues >> discussed in CASSANDRA-3829 for the first group which are non-seeds nodes >> are not bootstrapping automatically and we need to operate them in a more >> careful way. >> >> > Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host > should need a new bootstrap. That should be a new host in your list, so it > seems like this should be fairly rare? > > >> What I'm really looking for is a way to simplify adding and removing >> nodes into our (small) cluster: I can easily provide a small list of nodes >> from our cluster with our config management tool so that new nodes are >> discovering the rest of the cluster, but the documentation seems to imply >> that seed nodes also have other functions and I'm not sure what problems we >> could face trying to simplify this approach. >> >> Ideally, what I would like to have would be: >> >> * Considering a stable cluster (no new nodes, no nodes leaving), the N >> seeds should be always the same N nodes >> * Adding new nodes should not change that list >> * Stopping/removing one of these N nodes should "promote" another >> (non-seed) node as a seed >> - that would not restart the already running Cassandra nodes but would >> update their configuration files. >> - if a node restart for whatever reason it would pick up this new >> configuration >> >> So: no node would start its life as a seed, only a few already existing >> node would have this status. 
We would not have to deal with the "a seed >> node doesn't bootstrap" problem and it would make our operation process >> simpler. >> >> >>> > I also have some more general questions about seed nodes and how they >>> work: >>> > >>> > * I understand that seed nodes are used when a node starts and needs >>> to discover the rest of the cluster's nodes. Once the node has joined and >>> the cluster is stable, are seed nodes still playing a role in day to day >>> operations? >>> >>> They’re used probabilistically in gossip to encourage convergence. >>> Mostly useful in large clusters. >>> >> >> How "large" are we speaking here? How many nodes would it start to be >> considered "large"? >> > > ~800-1000 > > >> Also, about the convergence: is this related to how fast/often the >> cluster topology is changing? (new nodes, leaving nodes, underlying IP >> addresses changing, etc.) >> >> > New nodes, nodes going up/down, and schema propagation. > > >> Thanks for your answers! >> >> Jonathan >> > -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
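Jon's two observations above (convergence takes a small number of rounds at almost any cluster size, and seeds matter less as the cluster grows) can be explored with a toy simulation. The sketch below is not Cassandra's Gossiper: it assumes a simplified push-pull model in which every node contacts one random peer per round, and it assumes, purely for illustration, that the chance of an extra exchange with a seed is roughly the number of seeds divided by the cluster size. The names `rounds_to_converge` and `seed_contact_chance` are hypothetical.

```python
import random


def rounds_to_converge(num_nodes, rng, max_rounds=1000):
    """Count gossip rounds until every node has heard a single update.

    Simplified model (an assumption, not Cassandra's implementation): in each
    round every node contacts one uniformly random other node, and if either
    side already knows the update, both know it afterwards.
    """
    if num_nodes < 2:
        return 0
    informed = [False] * num_nodes
    informed[0] = True  # the update originates on node 0
    for round_number in range(1, max_rounds + 1):
        for node in range(num_nodes):
            peer = rng.randrange(num_nodes - 1)
            if peer >= node:
                peer += 1  # shift past self so the peer is always another node
            if informed[node] or informed[peer]:
                informed[node] = informed[peer] = True
        if all(informed):
            return round_number
    raise RuntimeError("gossip did not converge within max_rounds")


def seed_contact_chance(num_seeds, cluster_size):
    """Assumed model: per-round chance of a seed exchange is seeds / cluster size."""
    return min(1.0, num_seeds / cluster_size)


if __name__ == "__main__":
    rng = random.Random(42)
    for size in (10, 100, 1000):
        print(size, rounds_to_converge(size, rng), seed_contact_chance(2, size))
```

Under these assumptions the round count grows only slowly with cluster size, while the per-round chance of touching a seed shrinks in proportion to it, which is the qualitative behaviour described in the message.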
Re: How seed nodes are working and how to upgrade/replace them?
On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet wrote: > Hi Jeff, > > thanks for answering to most of my points! > From the reloadseeds' ticket, I followed to > https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very > instructive, although a bit old. > > > On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa wrote: > >> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet wrote: >> > >> [...] >> >> > In essence, in my example that would be: >> > >> > - decide that #2 and #3 will be the new seed nodes >> > - update all the configuration files of all the nodes to write the IP >> addresses of #2 and #3 >> > - DON'T restart any node - the new seed configuration will be picked >> up only if the Cassandra process restarts >> > >> > * If I can manage to sort my Cassandra nodes by their age, could it be >> a strategy to have the seeds set to the 2 oldest nodes in the cluster? >> (This implies these nodes would change as the cluster's nodes get >> upgraded/replaced). >> >> You could do this, seems like a lot of headache for little benefit. Could >> be done with simple seed provider and config management >> (puppet/chef/ansible) laying down new yaml or with your own seed provider >> > > So, just to make it clear: sorting by age isn't a goal in itself, it was > just an example on how I could get a stable list. > > Right now, we have a dedicated group of seed nodes + a dedicated group for > non-seeds: doing rolling-upgrade of the nodes from the second list is > relatively painless (although slow) whereas we are facing the issues > discussed in CASSANDRA-3829 for the first group which are non-seeds nodes > are not bootstrapping automatically and we need to operate them in a more > careful way. > > Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host should need a new bootstrap. That should be a new host in your list, so it seems like this should be fairly rare? 
> What I'm really looking for is a way to simplify adding and removing nodes > into our (small) cluster: I can easily provide a small list of nodes from > our cluster with our config management tool so that new nodes are > discovering the rest of the cluster, but the documentation seems to imply > that seed nodes also have other functions and I'm not sure what problems we > could face trying to simplify this approach. > > Ideally, what I would like to have would be: > > * Considering a stable cluster (no new nodes, no nodes leaving), the N > seeds should be always the same N nodes > * Adding new nodes should not change that list > * Stopping/removing one of these N nodes should "promote" another > (non-seed) node as a seed > - that would not restart the already running Cassandra nodes but would > update their configuration files. > - if a node restart for whatever reason it would pick up this new > configuration > > So: no node would start its life as a seed, only a few already existing > node would have this status. We would not have to deal with the "a seed > node doesn't bootstrap" problem and it would make our operation process > simpler. > > >> > I also have some more general questions about seed nodes and how they >> work: >> > >> > * I understand that seed nodes are used when a node starts and needs to >> discover the rest of the cluster's nodes. Once the node has joined and the >> cluster is stable, are seed nodes still playing a role in day to day >> operations? >> >> They’re used probabilistically in gossip to encourage convergence. Mostly >> useful in large clusters. >> > > How "large" are we speaking here? How many nodes would it start to be > considered "large"? > ~800-1000 > Also, about the convergence: is this related to how fast/often the cluster > topology is changing? (new nodes, leaving nodes, underlying IP addresses > changing, etc.) > > New nodes, nodes going up/down, and schema propagation. > Thanks for your answers! > > Jonathan >
Re: How seed nodes are working and how to upgrade/replace them?
Given Consul's popularity, seems like someone could make an argument that we should be shipping a consul-aware seed provider. On Tue, Jan 8, 2019 at 7:39 AM Jonathan Ballet wrote: > On Mon, 7 Jan 2019 at 16:51, Oleksandr Shulgin < > oleksandr.shul...@zalando.de> wrote: > >> On Mon, Jan 7, 2019 at 3:37 PM Jonathan Ballet >> wrote: >> >>> >>> I'm working on how we could improve the upgrades of our servers and how >>> to replace them completely (new instance with a new IP address). >>> What I would like to do is to replace the machines holding our current >>> seeds (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular >>> basis: >>> >>> * Is it possible to "promote" any non-seed node as a seed node? >>> >>> * Is it possible to "promote" a new seed node without having to restart >>> all the nodes? >>> In essence, in my example that would be: >>> >>> - decide that #2 and #3 will be the new seed nodes >>> - update all the configuration files of all the nodes to write the IP >>> addresses of #2 and #3 >>> - DON'T restart any node - the new seed configuration will be picked >>> up only if the Cassandra process restarts >>> >> >> You can provide a custom implementation of the seed provider protocol: >> org.apache.cassandra.locator.SeedProvider >> >> We were exploring that approach few years ago with etcd, which I think >> provides capabilities similar to that of Consul: >> https://github.com/a1exsh/cassandra-etcd-seed-provider/blob/master/src/main/java/org/zalando/cassandra/locator/EtcdSeedProvider.java >> > > Hi Alex, > > we were using also a dedicated Consul seed provider but we weren't > confident enough about maintaining our version so we removed it in favor of > something simpler. > Ultimately, we hope(d) that delegating the maintenance of that list to an > external process (like Consul Template), directly updating the > configuration file, is (should be?) 
mostly similar without having to > maintain our own copy, built with the right version of Cassandra, etc. > > Thanks for the info though! > > Jonathan > >
Re: How seed nodes are working and how to upgrade/replace them?
Hi Jeff,

thanks for answering most of my points!
From the reloadseeds ticket, I followed on to https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very instructive, although a bit old.

On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa wrote:

> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet wrote:
> >
> [...]
>
> > In essence, in my example that would be:
> >
> > - decide that #2 and #3 will be the new seed nodes
> > - update all the configuration files of all the nodes to write the IP
> addresses of #2 and #3
> > - DON'T restart any node - the new seed configuration will be picked
> up only if the Cassandra process restarts
> >
> > * If I can manage to sort my Cassandra nodes by their age, could it be a
> strategy to have the seeds set to the 2 oldest nodes in the cluster? (This
> implies these nodes would change as the cluster's nodes get
> upgraded/replaced).
>
> You could do this, seems like a lot of headache for little benefit. Could
> be done with simple seed provider and config management
> (puppet/chef/ansible) laying down new yaml or with your own seed provider
>

So, just to make it clear: sorting by age isn't a goal in itself, it was just an example of how I could get a stable list.

Right now, we have a dedicated group of seed nodes + a dedicated group of non-seeds: doing a rolling upgrade of the nodes from the second list is relatively painless (although slow), whereas for the first group we are facing the issues discussed in CASSANDRA-3829: those nodes do not bootstrap automatically, so we need to operate them more carefully.

What I'm really looking for is a way to simplify adding and removing nodes in our (small) cluster: I can easily provide a small list of nodes from our cluster with our config management tool so that new nodes discover the rest of the cluster, but the documentation seems to imply that seed nodes also have other functions, and I'm not sure what problems we could face trying to simplify this approach.

Ideally, what I would like to have would be:

* Considering a stable cluster (no new nodes, no nodes leaving), the N seeds should always be the same N nodes
* Adding new nodes should not change that list
* Stopping/removing one of these N nodes should "promote" another (non-seed) node as a seed
  - that would not restart the already running Cassandra nodes but would update their configuration files.
  - if a node restarts for whatever reason, it would pick up this new configuration

So: no node would start its life as a seed, only a few already existing nodes would have this status. We would not have to deal with the "a seed node doesn't bootstrap" problem and it would make our operation process simpler.

> > I also have some more general questions about seed nodes and how they
> work:
> >
> > * I understand that seed nodes are used when a node starts and needs to
> discover the rest of the cluster's nodes. Once the node has joined and the
> cluster is stable, are seed nodes still playing a role in day to day
> operations?
>
> They’re used probabilistically in gossip to encourage convergence. Mostly
> useful in large clusters.
>

How "large" are we speaking here? How many nodes would it start to be considered "large"?

Also, about the convergence: is this related to how fast/often the cluster topology is changing? (new nodes, leaving nodes, underlying IP addresses changing, etc.)

Thanks for your answers!

Jonathan
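The wish list in this message (a stable list of N seeds, no change when nodes are added, promotion only when a seed leaves) can be expressed as a small pure function that a config management or templating step could call. This is only a sketch: the name `next_seed_list` and its parameters are hypothetical, and it assumes the caller supplies the live nodes in a stable order (for example oldest first) and leaves out nodes that have not finished bootstrapping.

```python
def next_seed_list(current_seeds, live_nodes, seed_count):
    """Return the seed list after applying the promotion policy.

    current_seeds: ordered list of the seed addresses configured so far.
    live_nodes: ordered list of addresses still in the cluster; the order
        decides which non-seed node gets promoted first.
    seed_count: desired number of seeds (N).
    """
    live = set(live_nodes)
    # Keep every existing seed that is still part of the cluster.
    seeds = [node for node in current_seeds if node in live][:seed_count]
    # Promote non-seed nodes, in the given order, only to fill vacancies.
    for node in live_nodes:
        if len(seeds) >= seed_count:
            break
        if node not in seeds:
            seeds.append(node)
    return seeds


if __name__ == "__main__":
    # Seed "n1" has left the cluster, so "n3" is promoted next to "n2".
    print(next_seed_list(["n1", "n2"], ["n2", "n3", "n4"], 2))
```

Because existing seeds are always kept first, the output only changes when a seed disappears, which keeps configuration churn on the other nodes to a minimum.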
Re: How seed nodes are working and how to upgrade/replace them?
On Mon, 7 Jan 2019 at 16:51, Oleksandr Shulgin wrote:

> On Mon, Jan 7, 2019 at 3:37 PM Jonathan Ballet wrote:
>
>>
>> I'm working on how we could improve the upgrades of our servers and how
>> to replace them completely (new instance with a new IP address).
>> What I would like to do is to replace the machines holding our current
>> seeds (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular
>> basis:
>>
>> * Is it possible to "promote" any non-seed node as a seed node?
>>
>> * Is it possible to "promote" a new seed node without having to restart
>> all the nodes?
>> In essence, in my example that would be:
>>
>> - decide that #2 and #3 will be the new seed nodes
>> - update all the configuration files of all the nodes to write the IP
>> addresses of #2 and #3
>> - DON'T restart any node - the new seed configuration will be picked up
>> only if the Cassandra process restarts
>>
>
> You can provide a custom implementation of the seed provider protocol:
> org.apache.cassandra.locator.SeedProvider
>
> We were exploring that approach few years ago with etcd, which I think
> provides capabilities similar to that of Consul:
> https://github.com/a1exsh/cassandra-etcd-seed-provider/blob/master/src/main/java/org/zalando/cassandra/locator/EtcdSeedProvider.java
>

Hi Alex,

we were also using a dedicated Consul seed provider, but we weren't confident enough about maintaining our version, so we removed it in favor of something simpler.
Ultimately, we hope(d) that delegating the maintenance of that list to an external process (like Consul Template) that directly updates the configuration file is (should be?) mostly equivalent, without having to maintain our own copy, built with the right version of Cassandra, etc.

Thanks for the info though!

Jonathan
Re: How seed nodes are working and how to upgrade/replace them?
> On Jan 7, 2019, at 8:23 AM, Jeff Jirsa wrote: > > > > >> On Jan 7, 2019, at 6:37 AM, Jonathan Ballet wrote: >> >> Hi, >> >> I'm trying to understand how seed nodes are working, when and how do they >> play a part in a Cassandra cluster, and how they should be managed and >> propagated to other nodes. >> >> I have a cluster of 6 Cassandra nodes (let's call them #1 to #6), on which >> node #1 and #2 are seeds. All the configuration files of all the Cassandra >> nodes are currently configured with: >> >> ``` >> seed_provider: >> - class_name: org.apache.cassandra.locator.SimpleSeedProvider >> parameters: >> - seeds: 'IP #1,IP #2' >> ``` >> >> We are using a service discovery tool (Consul) which automatically registers >> new Cassandra nodes with its dedicated health-check and are able to generate >> new configuration based on the content of the service discovery status (with >> Consul-Template). >> >> >> I'm working on how we could improve the upgrades of our servers and how to >> replace them completely (new instance with a new IP address). >> What I would like to do is to replace the machines holding our current seeds >> (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular basis: >> >> * Is it possible to "promote" any non-seed node as a seed node? >> > > Yes - generally you can make any node a seed if you want > >> * Is it possible to "promote" a new seed node without having to restart all >> the nodes? 
> > nodetool reloadseeds This is apparently in 4.0+ https://issues.apache.org/jira/browse/CASSANDRA-14190 > > There are a few weird edge cases where seeds are reloaded automatically and > we don’t document how or why (it’s a side effect of an error condition in > hosts going up/down, but it’s generally pretty minor unless your seed > provider is broken) > > > (Also true that you could write a seed provider that did this automatically) > > >> In essence, in my example that would be: >> >> - decide that #2 and #3 will be the new seed nodes >> - update all the configuration files of all the nodes to write the IP >> addresses of #2 and #3 >> - DON'T restart any node - the new seed configuration will be picked up >> only if the Cassandra process restarts >> >> * If I can manage to sort my Cassandra nodes by their age, could it be a >> strategy to have the seeds set to the 2 oldest nodes in the cluster? (This >> implies these nodes would change as the cluster's nodes get >> upgraded/replaced). > > You could do this, seems like a lot of headache for little benefit. Could be > done with simple seed provider and config management (puppet/chef/ansible) > laying down new yaml or with your own seed provider > >> >> >> I also have some more general questions about seed nodes and how they work: >> >> * I understand that seed nodes are used when a node starts and needs to >> discover the rest of the cluster's nodes. Once the node has joined and the >> cluster is stable, are seed nodes still playing a role in day to day >> operations? > > They’re used probabilistically in gossip to encourage convergence. Mostly > useful in large clusters. > >> >> * The documentation says multiple times that not all nodes should be seed >> nodes, but I didn't really find any place about the consequences it has to >> have "too many" seed nodes. 
> > Decreases effectiveness of probabilistic gossiping with seed for convergence > >> Also, relatively to the questions I asked above, is there any downsides of >> having changing seed nodes in a cluster? (with the exact same, at some point >> I define #1 and #2 to be seeds, then later #4 and #5, etc.) >> > > No > >> >> Thanks for helping me to understand better how seeds are working! >> >> Jonathan >>
Re: How seed nodes are working and how to upgrade/replace them?
> On Jan 7, 2019, at 6:37 AM, Jonathan Ballet wrote: > > Hi, > > I'm trying to understand how seed nodes are working, when and how do they > play a part in a Cassandra cluster, and how they should be managed and > propagated to other nodes. > > I have a cluster of 6 Cassandra nodes (let's call them #1 to #6), on which > node #1 and #2 are seeds. All the configuration files of all the Cassandra > nodes are currently configured with: > > ``` > seed_provider: > - class_name: org.apache.cassandra.locator.SimpleSeedProvider > parameters: > - seeds: 'IP #1,IP #2' > ``` > > We are using a service discovery tool (Consul) which automatically registers > new Cassandra nodes with its dedicated health-check and are able to generate > new configuration based on the content of the service discovery status (with > Consul-Template). > > > I'm working on how we could improve the upgrades of our servers and how to > replace them completely (new instance with a new IP address). > What I would like to do is to replace the machines holding our current seeds > (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular basis: > > * Is it possible to "promote" any non-seed node as a seed node? > Yes - generally you can make any node a seed if you want > * Is it possible to "promote" a new seed node without having to restart all > the nodes? 
nodetool reloadseeds There are a few weird edge cases where seeds are reloaded automatically and we don’t document how or why (it’s a side effect of an error condition in hosts going up/down, but it’s generally pretty minor unless your seed provider is broken) (Also true that you could write a seed provider that did this automatically) > In essence, in my example that would be: > > - decide that #2 and #3 will be the new seed nodes > - update all the configuration files of all the nodes to write the IP > addresses of #2 and #3 > - DON'T restart any node - the new seed configuration will be picked up > only if the Cassandra process restarts > > * If I can manage to sort my Cassandra nodes by their age, could it be a > strategy to have the seeds set to the 2 oldest nodes in the cluster? (This > implies these nodes would change as the cluster's nodes get > upgraded/replaced). You could do this, seems like a lot of headache for little benefit. Could be done with simple seed provider and config management (puppet/chef/ansible) laying down new yaml or with your own seed provider > > > I also have some more general questions about seed nodes and how they work: > > * I understand that seed nodes are used when a node starts and needs to > discover the rest of the cluster's nodes. Once the node has joined and the > cluster is stable, are seed nodes still playing a role in day to day > operations? They’re used probabilistically in gossip to encourage convergence. Mostly useful in large clusters. > > * The documentation says multiple times that not all nodes should be seed > nodes, but I didn't really find any place about the consequences it has to > have "too many" seed nodes. Decreases effectiveness of probabilistic gossiping with seed for convergence > Also, relatively to the questions I asked above, is there any downsides of > having changing seed nodes in a cluster? (with the exact same, at some point > I define #1 and #2 to be seeds, then later #4 and #5, etc.) 
> No > > Thanks for helping me to understand better how seeds are working! > > Jonathan >
Re: How seed nodes are working and how to upgrade/replace them?
On Mon, Jan 7, 2019 at 3:37 PM Jonathan Ballet wrote:

>
> I'm working on how we could improve the upgrades of our servers and how to
> replace them completely (new instance with a new IP address).
> What I would like to do is to replace the machines holding our current
> seeds (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular
> basis:
>
> * Is it possible to "promote" any non-seed node as a seed node?
>
> * Is it possible to "promote" a new seed node without having to restart
> all the nodes?
> In essence, in my example that would be:
>
> - decide that #2 and #3 will be the new seed nodes
> - update all the configuration files of all the nodes to write the IP
> addresses of #2 and #3
> - DON'T restart any node - the new seed configuration will be picked up
> only if the Cassandra process restarts
>

You can provide a custom implementation of the seed provider protocol: org.apache.cassandra.locator.SeedProvider

We were exploring that approach a few years ago with etcd, which I think provides capabilities similar to those of Consul:
https://github.com/a1exsh/cassandra-etcd-seed-provider/blob/master/src/main/java/org/zalando/cassandra/locator/EtcdSeedProvider.java

We are not using this anymore, but for other reasons (namely, being too optimistic about putting a Cassandra cluster into an AWS AutoScaling Group). The SeedProvider itself seems to have worked as we expected.

Hope this helps,
--
Alex
How seed nodes are working and how to upgrade/replace them?
Hi,

I'm trying to understand how seed nodes work, when and how they play a part in a Cassandra cluster, and how they should be managed and propagated to other nodes.

I have a cluster of 6 Cassandra nodes (let's call them #1 to #6), of which nodes #1 and #2 are seeds. The configuration files of all the Cassandra nodes are currently configured with:

```
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: 'IP #1,IP #2'
```

We are using a service discovery tool (Consul) which automatically registers new Cassandra nodes with its dedicated health-check, and we are able to generate new configuration based on the content of the service discovery status (with Consul-Template).

I'm working on how we could improve the upgrades of our servers and how to replace them completely (new instance with a new IP address).
What I would like to do is to replace the machines holding our current seeds (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular basis:

* Is it possible to "promote" any non-seed node as a seed node?
* Is it possible to "promote" a new seed node without having to restart all the nodes?
  In essence, in my example that would be:
  - decide that #2 and #3 will be the new seed nodes
  - update the configuration files of all the nodes to write the IP addresses of #2 and #3
  - DON'T restart any node - the new seed configuration will be picked up only if the Cassandra process restarts
* If I can manage to sort my Cassandra nodes by their age, could it be a strategy to have the seeds set to the 2 oldest nodes in the cluster? (This implies these nodes would change as the cluster's nodes get upgraded/replaced).

I also have some more general questions about seed nodes and how they work:

* I understand that seed nodes are used when a node starts and needs to discover the rest of the cluster's nodes. Once the node has joined and the cluster is stable, are seed nodes still playing a role in day to day operations?
* The documentation says multiple times that not all nodes should be seed nodes, but I didn't really find anything about the consequences of having "too many" seed nodes.

Also, relative to the questions I asked above, are there any downsides to having changing seed nodes in a cluster? (With the exact same cluster, at some point I define #1 and #2 to be seeds, then later #4 and #5, etc.)

Thanks for helping me to understand better how seeds work!

Jonathan