[jira] [Comment Edited] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention
[ https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400082#comment-16400082 ] Oleksandr Shulgin edited comment on CASSANDRA-5836 at 3/15/18 8:35 AM:
---

{quote}system.available_ranges works off keyspaces, so rebuild will still work fine as long as you didn't add RF before provisioning the DC (e.g. you didn't bootstrap the NTS keyspaces){quote}

You are correct, I had a false assumption here. But then I don't see at all where the recommendation to set {{auto_bootstrap=false}} for a new DC comes from. I believed the reason was that {{nodetool rebuild}} won't work otherwise, but apparently that's not the case. If we can simply drop this recommendation from the docs, that would be a great thing, IMO.

By following the doc in its current form, it is not unlikely that one accidentally adds some nodes with {{auto_bootstrap=false}} to an *existing* DC, simply by messing up the DC suffix parameter. With the default setting of {{auto_bootstrap}}, such a configuration error is mostly harmless and easy to roll back.

{quote}> At the same time, new cluster startup process can be arbitrarily complex

No, it can't.
{quote}

Unfortunately, it already is. For example, look at our home-grown automation code to create new Cassandra clusters on AWS: https://github.com/zalando-stups/planb-cassandra/blob/master/planb/create_cluster.py That's already close to 1,000 lines of Python.

{quote}Cassandra is hard enough to use as it is, and we really shouldn't be making operations more complex.{quote}

Creating a new cluster is the operation with the least potential impact of all, and you do it only once in the lifetime of a cluster. I would go as far as saying it doesn't even belong to "ops". Restarts, upgrades, bootstrapping new nodes and DCs: these are the operations, and we shouldn't make the "introduction to Cassandra" easier at the cost of making *these* more complex or risky.
{quote}Far more logical to be able to say that "All nodes will respect the auto_bootstrap setting regardless of their configuration". The only caveat is that the first node won't bootstrap...{quote}

That's already a contradiction, don't you think? More precisely, it should be phrased as "if a node *believes* it is the very first one". The big open question for me is still: can this be done reliably in code?

{quote}... but to users this is irrelevant and they don't need to know about it.{quote}

In my experience, this attitude is exactly what makes Cassandra hard to use. :( I cannot even count the number of times I had to dive deep into the source code trying to figure out some detail that was not properly documented, because the devs thought the same: users don't need to know about it...
[jira] [Comment Edited] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention
[ https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385559#comment-16385559 ] Kurt Greaves edited comment on CASSANDRA-5836 at 3/5/18 4:42 AM:
-

Glad some more discussion is happening here again. This has always been a pain point for operators. I doubt many people can actually list every different startup case in Cassandra; there are a hell of a lot.

On the seed issue: in the past I went as far as writing a patch that makes only the first seed a special case, as already mentioned by [~oshulgin], to at least fix the issue of maintenance on seed nodes/adding new nodes as seeds. There's a patch [here|https://github.com/apache/cassandra/compare/trunk...kgreav:13851-extension-3.11] if anyone cares; I think it's more or less working and only missing tests at the moment (but I'm probably wrong). Note that it's built on my patch for CASSANDRA-13851, so not all of the code there is relevant.

{quote}My understanding is that the 'seed node' role has a significant initial-topology-discovery responsibility which I have not seen mentioned in recent discussions.{quote}

Not my understanding. A seed simply defines a node to connect to in order to join the cluster. No topology information is transmitted between the seed and the node contacting it. If it were, that would bring its own complexities and likely make Gossip really expensive (especially on large clusters).

{quote}Also, as of my last knowledge of this code, a given node will gossip with a Seed node more frequently than its other peers, which I believe is "just an optimization of gossip" but seems notable.{quote}

Yep, it's just an optimisation, but an important one. For the most part, however, it shouldn't have any effect on the bootstrap case.

{quote}I also recall past dev discussion (with driftx?) suggesting that the "correct" solution in their view is an external seed provider.
{quote}

Yeah, the correct solution is an external seed provider or not breaking your config management, but we can still do better here, especially in the replace case, and probably the new DC case.

{quote}Another case when you add seed nodes is when adding a new DC. In this case they are not the first ones to start so they could bootstrap, but most of the time this is not what you want, so you set auto_bootstrap=false for every node in the new DC, including the new seeds.{quote}

It's worth noting that there is the case of {{SimpleStrategy}}, for which you wouldn't want auto_bootstrap=false (this affects auth, traces and system_distributed). This is specifically why you would want every node in a new DC to bootstrap (including seeds). The alternative is to get rid of {{SimpleStrategy}} (or at least stop using it as a default).

{quote}In the case where seed nodes cannot be contacted, how do you determine whether this is the first node in a cluster (so we should special-case it and skip bootstrap) vs. a misconfiguration or the other seeds being down (and therefore the bootstrap should fail)?{quote}

If the listed seed isn't itself, then you fail. This is how it currently works as well. That is, the first node in the cluster has itself as a seed and also can't contact any other seeds in its seed list. I'm pretty sure my patch above works this way, because if there are seeds, they should be present in the {{endpointShadowStateMap}} after the shadow round (SR). There may be some edge cases to think about here, though, like starting multiple seeds at the same time.

Also related is CASSANDRA-14073, which will fix the case where you replace a seed node and it doesn't bootstrap. That one is more important IMO, as config management is more likely not to handle this case.
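To make the {{SimpleStrategy}} point concrete, here is a minimal Python sketch under simplifying assumptions: one token per node, and hypothetical node names and tokens (none of this is Cassandra code). It models SimpleStrategy-style placement, which walks the ring clockwise and ignores data centers, and shows that a node added in a new DC can immediately become a replica for existing ranges, even if it joined with auto_bootstrap=false and therefore holds no data for them.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str   # hypothetical node name
    dc: str     # data center; SimpleStrategy-style placement ignores it
    token: int  # single token per node (simplifying assumption)


def simple_strategy_replicas(ring: List[Node], key_token: int, rf: int) -> List[Node]:
    """Return up to `rf` nodes walking clockwise from the key's token, ignoring DCs."""
    nodes = sorted(ring, key=lambda n: n.token)
    tokens = [n.token for n in nodes]
    # The first node whose token is >= key_token owns the key; wrap around the ring.
    i = bisect_left(tokens, key_token) % len(nodes)
    replicas = []
    while len(replicas) < min(rf, len(nodes)):
        replicas.append(nodes[i])
        i = (i + 1) % len(nodes)
    return replicas


dc1 = [Node("a1", "dc1", 0), Node("a2", "dc1", 100), Node("a3", "dc1", 200)]
dc2 = [Node("b1", "dc2", 50), Node("b2", "dc2", 150)]  # new DC, joined without streaming

before = simple_strategy_replicas(dc1, key_token=40, rf=2)
after = simple_strategy_replicas(dc1 + dc2, key_token=40, rf=2)
print([n.name for n in before])  # ['a2', 'a3']
print([n.name for n in after])   # ['b1', 'a2']
```

In this toy ring, b1 takes over a replica of the key from a3 without having streamed anything, so reads it serves can miss data until it is repaired or rebuilt. A keyspace using NetworkTopologyStrategy with no replicas configured for dc2 would not be affected this way, which is the distinction Kurt draws between SimpleStrategy and NTS keyspaces.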
[jira] [Comment Edited] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention
[ https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383984#comment-16383984 ] Robert Coli edited comment on CASSANDRA-5836 at 3/2/18 7:02 PM:

I should probably join #cassandra-dev IRC and chat about this there, but I'd like to refer people to this comment up-ticket: https://issues.apache.org/jira/browse/CASSANDRA-5836?focusedCommentId=13727032&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13727032

Nobody really seems to understand why it's not safe for a seed node to bootstrap, given that the workaround is to temporarily pretend the node isn't a seed and bootstrap it. Usually one doesn't even inform the other nodes that it temporarily isn't a seed, and nothing unsafe seems to happen.

I feel like clarity here starts with explaining in which cases a Seed node is actually "Seeding" and what "Seeding" does and does not mean. My understanding is that the 'seed node' role has a significant initial-topology-discovery responsibility which I have not seen mentioned in recent discussions. This dovetails with needing to understand seed node behavior in the "restoring from snapshot" case, where the topology is known from existing cluster information and therefore may or may not "need" to be discovered from a seed.

Also, as of my last knowledge of this code, a given node will gossip with a Seed node more frequently than with its other peers, which I believe is "just an optimization of gossip" but seems notable. I also recall past dev discussion (with driftx?) suggesting that the "correct" solution in their view is an external seed provider.
So, in summary, my understanding of the complete responsibilities of a seed, independent of whether it's serving as a bootstrap source or bootstrapping itself:
1) provide other nodes which consider it a seed with initial topology
2) provide "faster" topology updates to nodes which have it listed in their seed provider

The minimum requirement for a new node joining the cluster seems to be a single seed node that can inform it of topology in a timely manner. If that's correct, and we imagine that all nodes use a seed provider that always returns at least one available node that can fulfill that role, the problem (?) of not being able to bootstrap seed nodes seems to disappear?
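Robert's summary boils down to a simple condition: a joining node needs at least one other reachable seed. The Python sketch below models that condition under stated assumptions: `usable_seeds` and `is_reachable` are hypothetical names (this is not Cassandra's seed provider API), and reachability is supplied by the caller rather than probed over the network.

```python
from typing import Callable, Iterable, List


def usable_seeds(configured: Iterable[str], self_addr: str,
                 is_reachable: Callable[[str], bool]) -> List[str]:
    """Hypothetical policy: the seeds a node could actually learn topology from.

    Drops the node's own address (a node never uses itself as a seed) and
    duplicates, and keeps only peers that the caller reports as reachable.
    """
    seen = set()
    usable = []
    for addr in configured:
        if addr == self_addr or addr in seen:
            continue
        seen.add(addr)
        if is_reachable(addr):
            usable.append(addr)
    return usable


# Example: 10.0.0.3 is listed as a seed on every node and is being replaced.
configured = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
up = {"10.0.0.1"}  # assume only 10.0.0.1 answers right now
seeds = usable_seeds(configured, self_addr="10.0.0.3", is_reachable=lambda a: a in up)
print(seeds)  # ['10.0.0.1']
```

Under this model, the replacement node at 10.0.0.3 still has a peer to learn topology from even though its own address is in the seed list, which is the situation the ticket is about. An empty result is the ambiguous case: the node is either the first one in the cluster or cannot reach it.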
[jira] [Comment Edited] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention
[ https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381561#comment-16381561 ] Robert Coli edited comment on CASSANDRA-5836 at 3/1/18 6:07 AM:

[~oshulgin]: as I understand it, a bootstrapping node also receives "extra" copies of writes via the storage protocol, which is not technically "streaming in" the data. These "extra" copies do not count towards the CL.

While I'm commenting on this ticket, it seems appropriate to share my enthusiasm for resolving the question of bootstrapping seed nodes. This has been a longstanding point of pain and confusion for operators and those who support them.

> Seed nodes should be able to bootstrap without manual intervention
> --
>
> Key: CASSANDRA-5836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5836
> Project: Cassandra
> Issue Type: Bug
> Reporter: Bill Hathaway
> Priority: Minor
>
> The current logic doesn't allow a seed node to be bootstrapped. If a user
> wants to bootstrap a node configured as a seed (for example to replace a seed
> node via replace_token), they first need to remove the node's own IP from the
> seed list, and then start the bootstrap process. This seems like an
> unnecessary step since a node never uses itself as a seed.
> I think it would be a better experience if the logic was changed to allow a
> seed node to bootstrap without manual intervention when there are other seed
> nodes up in a ring.
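The difference between the behaviour described in the ticket and the rule Kurt Greaves sketches earlier in the thread (a node that lists itself as a seed and hears from no other seed assumes it is the first node; any other node that hears from no seed fails) can be modelled in a few lines. This is a hypothetical Python model, not Cassandra's startup code: `NodeState`, `current_rule` and `proposed_rule` are illustrative names, and `responding_seeds` stands in for whatever the node learned from other seeds during its gossip shadow round.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Decision(Enum):
    BOOTSTRAP = "bootstrap"  # stream data for the node's ranges before joining
    SKIP = "skip"            # join without streaming
    FAIL = "fail"            # refuse to start


@dataclass(frozen=True)
class NodeState:
    address: str
    seeds: FrozenSet[str]                           # seed list from the node's config
    auto_bootstrap: bool = True
    bootstrap_complete: bool = False
    responding_seeds: FrozenSet[str] = frozenset()  # other seeds heard from at startup


def current_rule(n: NodeState) -> Decision:
    """Seed check as described in the ticket; seed reachability is not modelled here."""
    if n.auto_bootstrap and not n.bootstrap_complete and n.address not in n.seeds:
        return Decision.BOOTSTRAP
    return Decision.SKIP  # a node listed in its own seed list never bootstraps


def proposed_rule(n: NodeState) -> Decision:
    """Seeds bootstrap like any other node; only a node that believes it is first skips."""
    if not n.auto_bootstrap or n.bootstrap_complete:
        return Decision.SKIP
    other_seeds = n.seeds - {n.address}
    if n.responding_seeds & other_seeds:
        return Decision.BOOTSTRAP  # there is a cluster to stream from
    if n.address in n.seeds:
        return Decision.SKIP       # lists itself and hears from no other seed: assume first node
    return Decision.FAIL           # seeds down or misconfigured


# Replacing a seed that is still listed in its own seed list.
replacement = NodeState("10.0.0.3", frozenset({"10.0.0.1", "10.0.0.3"}),
                        responding_seeds=frozenset({"10.0.0.1"}))
# The manual workaround from the ticket: drop the node's own IP from its seed list.
workaround = NodeState("10.0.0.3", frozenset({"10.0.0.1"}),
                       responding_seeds=frozenset({"10.0.0.1"}))
print(current_rule(replacement).value, proposed_rule(replacement).value)  # skip bootstrap
print(current_rule(workaround).value)  # bootstrap
```

The `SKIP` branch of `proposed_rule` is exactly where the "if a node *believes* it is the very first one" caveat bites: a replacement seed whose peers are all down at startup would take it too, and starting several seeds at the same time is an edge case this sketch does not handle.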
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org