Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?

2016-01-23 Thread Kyrylo Galanov
Hello,

Why don't we introduce an additional node state such as 're-deploying'? If
deployment is stopped, we would not erase nodes in this state, but change
their status to 'error' or 'ready', for example.
Or we could add a warning message saying that the 'stop' button will destroy
each and every node.
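
As a rough illustration of the first idea, here is a minimal Python sketch.
It is not Fuel code: the Node class, the NodeStatus enum, the 'deploying'
status and the stop_deployment() function are hypothetical names made up for
this example. It only shows how a separate 're-deploying' state would let the
stop action tell new nodes apart from nodes that were already working.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    # Hypothetical status set; only 're-deploying' is the new state proposed above.
    BOOTSTRAP = "bootstrap"
    DEPLOYING = "deploying"        # first deployment of a new node
    REDEPLOYING = "re-deploying"   # tasks rerun on an already deployed node
    READY = "ready"
    ERROR = "error"


@dataclass
class Node:
    name: str
    status: NodeStatus


def stop_deployment(nodes):
    """Stop a deployment and return the names of the nodes that were erased."""
    erased = []
    for node in nodes:
        if node.status is NodeStatus.DEPLOYING:
            # Never deployed before: erase it and send it back to bootstrap.
            node.status = NodeStatus.BOOTSTRAP
            erased.append(node.name)
        elif node.status is NodeStatus.REDEPLOYING:
            # Previously deployed node: keep it and only flag it as 'error'.
            node.status = NodeStatus.ERROR
    return erased
```

With this assumption, stopping a run that adds one compute node to a working
cluster would erase only the new compute node, while the controllers would
end up in 'error' and keep their data.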

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-b

Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?

2016-01-22 Thread Vladimir Sharshov
Hi!

I also vote for the solution "mark a cluster 'operational' after successful
deployment". It is simple and guarantees that we do not erase the main
components.
It will also free up resources to support a stop/rerun (resume) feature in
task-based deployment, which will work much better (without node destruction
as a side effect).



Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?

2016-01-22 Thread Igor Kalnitsky
Dmitry,

> We can mark a cluster 'operational' after successful deployment. And we
> can disable the 'stop' button on this kind of cluster.

I think this is the best solution so far. Moreover, I don't know how to
fix it properly, since there are a lot of open questions about how this
button should behave at all.

Taking all this into account, I propose either solving this issue as a
blueprint (so we can think through and cover all the edge cases in the spec)
or dropping the stop button functionality altogether.

The latter, perhaps, may be a good solution. I don't know how often
anyone uses Stop deployment.


Bogdan,

> This is a critical issue. The *worst* possible situation for
> cluster operations. I believe this should be covered by a dedicated
> bulletin, the stop action shall be disabled for all releases as an
> emergency fix, and it shall be fixed in the next maintenance updates.

It wasn't always the case. Some time ago we didn't execute any tasks
on controllers when adding new nodes. It has become the case, I assume,
since Fuel 8.0, when we started executing netconfig and other puppet
tasks on each deployment run.

So we need to investigate in which release we introduced the
re-execution of some tasks on controllers, and only then think about
bulletins.


Thanks,
Igor




Re: [openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?

2016-01-22 Thread Bogdan Dobrelya
On 22.01.2016 11:45, Dmitry Pyzhov wrote:
> Guys,
> 
> There is a tricky bug with our 'stop deployment'
> feature: https://bugs.launchpad.net/fuel/+bug/1529691
> 
> It cannot be fixed easily because it is a design flaw. By design we
> cannot leave a node in an unpredictable state. So we move all nodes that
> are not in the ready state back to bootstrap.
> 
> But when a user adds a node and deploys the cluster, the system reruns
> puppet on controllers. If the user presses the 'stop' button, the
> controllers will be erased. The cluster will be destroyed. This is
> definitely not the expected behaviour.

This is a critical issue. The *worst* possible situation for
cluster operations. I believe this should be covered by a dedicated
bulletin, the stop action shall be disabled for all releases as an
emergency fix, and it shall be fixed in the next maintenance updates.



-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



[openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?

2016-01-22 Thread Dmitry Pyzhov
Guys,

There is a tricky bug with our 'stop deployment' feature:
https://bugs.launchpad.net/fuel/+bug/1529691

It cannot be fixed easily because it is a design flaw. By design we cannot
leave a node in an unpredictable state. So we move all nodes that are not in
the ready state back to bootstrap.

But when a user adds a node and deploys the cluster, the system reruns puppet
on controllers. If the user presses the 'stop' button, the controllers will be
erased. The cluster will be destroyed. This is definitely not the expected
behaviour.

Taking into account that we are going to rewrite this feature in 9.0 and we
are close to HCF, there is no value in making major changes to this feature in
8.0. Let's do a simple workaround.

We can mark a cluster 'operational' after successful deployment. And we can
disable the 'stop' button on this kind of cluster.
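
To make the failure mode and the workaround concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions, not the actual
Fuel implementation: the Cluster class, its methods, the 'deploying' status
string and the 'operational' attribute are hypothetical names. The stop logic
follows the behaviour described above (every node that is not 'ready' goes
back to bootstrap), and the guard shows how an 'operational' flag would block
the stop action on an already deployed cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Maps node name to a status string (hypothetical, simplified model).
    nodes: dict = field(default_factory=dict)
    # Hypothetical flag: set once after the first successful deployment.
    operational: bool = False

    def start_deployment(self, new_nodes):
        # Puppet is rerun on existing nodes too, so they leave the 'ready' state.
        for name in list(self.nodes):
            self.nodes[name] = "deploying"
        for name in new_nodes:
            self.nodes[name] = "deploying"

    def finish_deployment(self):
        for name in list(self.nodes):
            self.nodes[name] = "ready"
        self.operational = True

    def can_stop(self):
        # Workaround: the 'stop' button is disabled on operational clusters.
        return not self.operational

    def stop_deployment(self):
        if not self.can_stop():
            raise RuntimeError("stop is disabled on an operational cluster")
        # Current design: every node that is not 'ready' goes back to bootstrap.
        for name in list(self.nodes):
            if self.nodes[name] != "ready":
                self.nodes[name] = "bootstrap"
```

Without the can_stop() check, calling stop_deployment() while a new node is
being added would also reset the controllers, because they are temporarily
not in the 'ready' state during the puppet rerun.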

Any concerns or other proposals?