On Tue, Dec 29, 2015 at 5:35 AM Sergii Golovatiuk <sgolovat...@mirantis.com> wrote:
> Hi,
>
> Let me comment inline.
>
> On Mon, Dec 28, 2015 at 7:06 PM, Andrew Woodward <xar...@gmail.com> wrote:
>
>> In order to ensure that LVM can be configured as desired, it's necessary
>> to purge the existing LVM metadata and then reboot the node; otherwise
>> the partitioning commands will most likely fail on the next attempt, as
>> the volumes will be initialized before we can start partitioning the
>> node. Hence, when a node is removed from the environment, it is supposed
>> to have this data destroyed. Since it's a running system, the most
>> effective way (without many more reboots) was to blast the first 1 MB
>> of each partition.
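For illustration only, here is a minimal Python sketch of what that kind of
wipe amounts to (the actual erase_node agent is Ruby, and the partition list
here is hypothetical):

    import os

    MB = 1024 * 1024

    def wipe_first_mb(device):
        """Overwrite the first 1 MB of the device with zeros so that stale
        LVM and partition metadata cannot survive into the next run."""
        with open(device, "r+b") as dev:   # file position starts at offset 0
            dev.write(b"\x00" * MB)
            dev.flush()
            os.fsync(dev.fileno())         # make sure it reaches the disk

    if __name__ == "__main__":
        for part in ("/dev/sda1", "/dev/sda2"):  # hypothetical partition list
            wipe_first_mb(part)

LVM's PV label, for instance, sits in the first few sectors of a partition,
which is why zeroing the start is enough to keep old volume groups from
being rediscovered.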
>> As to the fallback to SSH, there are two times we use this process: with
>> the node reboot (after cobbler/IBP finishes), and with the wipe as we are
>> discussing here. These are for the odd occurrences of nodes failing to
>> restart after the MCO command. I don't think anyone has had much success
>> trying to figure out why this occurs, but I've seen nodes get stuck in
>> provisioning and removal in multiple environments using 6.1 where they
>> managed to break the SSH fallback. It would occur on roughly 1 in 20
>> nodes, seemingly at random. So with the SSH fallback I nearly never see
>> the failure in node reboot.
>
> If we are talking about the 6.1-7.0 releases, there shouldn't be any
> problems with mco reboot. The SSH fallback should be deprecated entirely.

As I noted, I've seen several 6.1 deployments where it was needed, so I'd
consider it still very much in use. In other cases it may be necessary in
order to deal with a node whose MCO agent is dead; IMO the fallback should
be kept.

>> On Thu, Dec 24, 2015 at 6:28 AM Alex Schultz <aschu...@mirantis.com> wrote:
>>
>>> On Thu, Dec 24, 2015 at 1:29 AM, Artur Svechnikov
>>> <asvechni...@mirantis.com> wrote:
>>> > Hi,
>>> > We have faced an issue where nodes' disks are wiped after a stopped
>>> > deployment. It occurs due to the node-removal logic (this is old
>>> > logic and, as I understand it, no longer relevant). This logic
>>> > contains a step which calls erase_node [0], and there is also another
>>> > method that wipes the disks [1]. AFAIK this was needed for smooth
>>> > cobbler provisioning and to ensure that nodes would not boot from
>>> > disk when they shouldn't. Instead of cobbler we now use IBP from
>>> > fuel-agent, where the current partition table is wiped before the
>>> > provisioning stage. Using disk wiping as insurance that nodes will
>>> > not boot from disk doesn't seem like a good solution. I want to
>>> > propose not wiping the disks and simply unsetting the bootable flag
>>> > on the nodes' disks.
>
> Disks must be wiped, as the boot flag doesn't guarantee anything. If the
> boot flag is not set, the BIOS will ignore the device in the boot order.
> Moreover, two partitions may have the boot flag set, or the operator may
> have configured the BIOS to skip the boot order.
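For concreteness: on a classic MBR-partitioned disk, the bootable flag is
just the status byte of a 16-byte partition-table entry (0x80 = active,
0x00 = inactive). A minimal sketch of the proposed unset, assuming MBR
rather than GPT and a hypothetical device name:

    def clear_mbr_boot_flags(device):
        """Clear the bootable (active) flag on all four primary MBR
        partition entries. The table starts at byte offset 446; each
        entry is 16 bytes, and its first byte is the status byte."""
        with open(device, "r+b") as disk:
            disk.seek(446)
            table = bytearray(disk.read(64))   # 4 entries x 16 bytes
            for i in range(4):
                table[i * 16] = 0x00           # 0x80 would mark it bootable
            disk.seek(446)
            disk.write(bytes(table))

    # clear_mbr_boot_flags("/dev/sda")  # hypothetical device

This also shows why the flag alone is weak insurance: it is a single
advisory byte per entry, and clearing it does not remove the bootloader or
metadata the way a wipe does.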
>>> > Please share your thoughts. Perhaps some other components rely on
>>> > the fact that disks are wiped after node removal or a stopped
>>> > deployment. If so, please tell us about it.
>>> >
>>> > [0] https://github.com/openstack/fuel-astute/blob/master/lib/astute/nodes_remover.rb#L132-L137
>>> > [1] https://github.com/openstack/fuel-astute/blob/master/lib/astute/ssh_actions/ssh_erase_nodes.rb
>>>
>>> I thought the erase_node [0] mcollective action was the process that
>>> cleared a node's disks after their removal from an environment. When do
>>> we use ssh_erase_nodes? Is it a fallback mechanism for when mcollective
>>> fails? My understanding of the history is that the partitions and data
>>> needed to be wiped so that the LVM groups and other partition
>>> information would not interfere with the installation process the next
>>> time the node was provisioned. That may have been a side effect of
>>> cobbler, and we should test whether it's still an issue for IBP.
>
> Since we no longer use classical provisioning, we have an mco connection
> all the time. During IBP it is present as part of the bootstrap; after
> the reboot, mco is still there, so all actions should be done via mco.

>>> Thanks,
>>> -Alex
>>>
>>> [0] https://github.com/openstack/fuel-astute/blob/master/mcagents/erase_node.rb
>>>
>>> > Best regards,
>>> > Svechnikov Artur

--
Andrew Woodward
Mirantis
Fuel Community Ambassador
Ceph Community

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev