Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?
Hello,

Why don't we introduce an additional node state such as 're-deploying'? If a deployment is stopped, we would not erase nodes in this state, but would change their status to 'error' or 'ready', for example.

Alternatively, we can add a warning message saying that the 'stop' button will destroy each and every node.

On Fri, Jan 22, 2016 at 8:15 PM, Vladimir Sharshov wrote:

> Hi!
>
> I also vote for the solution "mark a cluster 'operational' after successful
> deployment". It is simple and guarantees that we do not erase main
> components.
> It will also free resources to support the stop/rerun (resume) feature on
> task-based deployment, which will work much better (without node
> destruction as a side effect).

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-b
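To make the 're-deploying' suggestion above concrete, here is a minimal sketch of how node statuses might be resolved when a deployment is stopped. It is only an illustration: the function name `statuses_after_stop`, the status strings other than the ones quoted in the thread, and the `redeploy_fallback` parameter are assumptions and are not taken from Fuel or Nailgun code.

```python
from typing import Dict


def statuses_after_stop(statuses: Dict[str, str],
                        redeploy_fallback: str = "error") -> Dict[str, str]:
    """Return the status each node would get after 'stop deployment'.

    Hypothetical rules, following the proposal in the thread:
    - 'ready' nodes are left alone;
    - 're-deploying' nodes (already deployed once) are NOT erased and
      get `redeploy_fallback` ('error' or 'ready') instead;
    - every other node is sent back to 'bootstrap', as today.
    """
    if redeploy_fallback not in ("error", "ready"):
        raise ValueError("fallback must be 'error' or 'ready'")

    result = {}
    for name, status in statuses.items():
        if status == "ready":
            result[name] = "ready"
        elif status == "re-deploying":
            result[name] = redeploy_fallback
        else:
            result[name] = "bootstrap"
    return result


if __name__ == "__main__":
    before = {
        "controller-1": "re-deploying",  # puppet rerun on an existing controller
        "compute-1": "ready",
        "compute-9": "deploying",        # the node being added
    }
    print(statuses_after_stop(before))
```

With this rule only the node that was actually being added goes back to bootstrap; the controller keeps its disks and is merely flagged for attention.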
Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?
Hi!

I also vote for the solution "mark a cluster 'operational' after successful deployment". It is simple and guarantees that we do not erase main components.
It will also free resources to support the stop/rerun (resume) feature on task-based deployment, which will work much better (without node destruction as a side effect).

On Fri, Jan 22, 2016 at 8:09 PM, Igor Kalnitsky wrote:

> Dmitry,
>
> > We can mark a cluster 'operational' after successful deployment. And we
> > can disable 'stop' button on this kind of clusters.
>
> I think this is the best solution so far. Moreover, I don't know how to
> fix it properly, since there could be a lot of questions about how this
> button should behave at all.
>
> Taking all this into account, I propose either to solve this issue as a
> blueprint (so we can think through and cover all edge cases in the spec)
> or to drop the stop button functionality altogether.
>
> The latter, perhaps, may be a good solution. I don't know how often
> anyone uses Stop deployment.
>
>
> Bogdan,
>
> > This is the critical issue. The *worst* of possible situations for
> > cluster operations. I believe this should be covered by a dedicated
> > bulletin issued, the stop action shall be disabled for all releases as
> > emergency fix, and fixed by next maintenance updates.
>
> It wasn't always the case. Some time ago we didn't execute any tasks
> on controllers when adding new nodes. It has been the case, I assume,
> since Fuel 8.0, when we started executing netconfig and other Puppet
> tasks on each deployment run.
>
> So we need to investigate in which release we introduced re-execution
> of some tasks on controllers, and only then think about bulletins.
>
>
> Thanks,
> Igor
Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?
Dmitry,

> We can mark a cluster 'operational' after successful deployment. And we
> can disable 'stop' button on this kind of clusters.

I think this is the best solution so far. Moreover, I don't know how to fix it properly, since there could be a lot of questions about how this button should behave at all.

Taking all this into account, I propose either to solve this issue as a blueprint (so we can think through and cover all edge cases in the spec) or to drop the stop button functionality altogether.

The latter, perhaps, may be a good solution. I don't know how often anyone uses Stop deployment.


Bogdan,

> This is the critical issue. The *worst* of possible situations for
> cluster operations. I believe this should be covered by a dedicated
> bulletin issued, the stop action shall be disabled for all releases as
> emergency fix, and fixed by next maintenance updates.

It wasn't always the case. Some time ago we didn't execute any tasks on controllers when adding new nodes. It has been the case, I assume, since Fuel 8.0, when we started executing netconfig and other Puppet tasks on each deployment run.

So we need to investigate in which release we introduced re-execution of some tasks on controllers, and only then think about bulletins.


Thanks,
Igor

On Fri, Jan 22, 2016 at 1:06 PM, Bogdan Dobrelya wrote:

> This is a critical issue, the *worst* possible situation for cluster
> operations. I believe it should be covered by a dedicated bulletin; the
> stop action should be disabled for all releases as an emergency fix, and
> fixed properly in the next maintenance updates.
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?
On 22.01.2016 11:45, Dmitry Pyzhov wrote:

> Guys,
>
> There is a tricky bug with our 'stop deployment'
> feature: https://bugs.launchpad.net/fuel/+bug/1529691
>
> It cannot be fixed easily because it is a design flaw. By design we
> cannot leave a node in an unpredictable state. So we move all nodes that
> are not in the ready state back to bootstrap.
>
> But when a user adds a node and deploys the cluster, the system reruns
> Puppet on the controllers. If the user presses the 'stop' button, the
> controllers will be erased and the cluster will be destroyed. This is
> definitely not the expected behaviour.

This is a critical issue, the *worst* possible situation for cluster operations. I believe it should be covered by a dedicated bulletin; the stop action should be disabled for all releases as an emergency fix, and fixed properly in the next maintenance updates.

> Taking into account that we are going to rewrite this feature in 9.0 and
> we are close to HCF, there is no value in major changes to this feature
> in 8.0. Let's do a simple workaround.
>
> We can mark a cluster 'operational' after successful deployment, and we
> can disable the 'stop' button on clusters of this kind.
>
> Any concerns or other proposals?

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando
[openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?
Guys,

There is a tricky bug with our 'stop deployment' feature: https://bugs.launchpad.net/fuel/+bug/1529691

It cannot be fixed easily because it is a design flaw. By design we cannot leave a node in an unpredictable state. So we move all nodes that are not in the ready state back to bootstrap.

But when a user adds a node and deploys the cluster, the system reruns Puppet on the controllers. If the user presses the 'stop' button, the controllers will be erased and the cluster will be destroyed. This is definitely not the expected behaviour.

Taking into account that we are going to rewrite this feature in 9.0 and we are close to HCF, there is no value in major changes to this feature in 8.0. Let's do a simple workaround.

We can mark a cluster 'operational' after successful deployment, and we can disable the 'stop' button on clusters of this kind.

Any concerns or other proposals?
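For readers skimming the thread, the workaround proposed above can be sketched roughly as follows. This is a simplified model and not Fuel or Nailgun code: the `Node` and `Cluster` classes, the `operational` flag as a plain attribute, and the status strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    status: str = "discover"  # hypothetical status names


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)
    operational: bool = False  # hypothetical flag from the proposal

    def finish_deployment(self) -> None:
        """A deployment succeeded: all nodes are ready, cluster is operational."""
        for node in self.nodes:
            node.status = "ready"
        self.operational = True

    def can_stop(self) -> bool:
        """The 'stop' button is only offered on clusters never deployed before."""
        return not self.operational

    def stop_deployment(self) -> List[str]:
        """Reset every non-ready node to bootstrap and return their names.

        On an operational cluster this is refused, because a rerun of
        puppet takes controllers out of 'ready' and they would be erased.
        """
        if not self.can_stop():
            raise RuntimeError("stop is disabled for operational clusters")
        reset = []
        for node in self.nodes:
            if node.status != "ready":
                node.status = "bootstrap"
                reset.append(node.name)
        return reset


if __name__ == "__main__":
    cluster = Cluster(nodes=[Node("controller-1"), Node("compute-1")])
    cluster.finish_deployment()

    # Adding a node later reruns tasks on the existing controller.
    cluster.nodes.append(Node("compute-2", status="deploying"))
    cluster.nodes[0].status = "deploying"

    print(cluster.can_stop())  # the UI would hide or disable the button
```

The sketch trades functionality for safety: once a cluster has been deployed successfully, a stuck redeployment can no longer be stopped at all, which is the price of the simple 8.0 workaround discussed in this thread.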