Re: Controlled shutdown post-partial-switch failure...

David Merrill Thu, 20 Jun 2019 14:06:23 -0700

Checking in on this: I pulled the MGMT NICs from the failed switch and the Xen 
hosts' MGMT bond "switched" over to the other (working) network switch. I now 
have control of the hosts in CloudStack & can proceed with the controlled 
shutdown so I can then get my switch repair on.


David Merrill
Senior Systems Engineer,
Managed and Private/Hybrid Cloud Services
OTELCO
92 Oak Street, Portland ME 04101
office 207.772.5678
www.otelco.com/business/managed-services



Confidentiality Message
The information contained in this e-mail transmission may be confidential and 
legally privileged. If you are not the intended recipient, you are notified 
that any dissemination, distribution, copying or other use of this information, 
including attachments, is prohibited. If you received this message in error, 
please call me at 207.772.5678 <callto:207.772.5678> so this error can be 
corrected.
 

On 6/17/19, 9:20 AM, "Andrija Panic" <[email protected]> wrote:

    Been there, done that, a year ago (whole DC down == everything shutting
    down (storage things, S3 things, CloudStack things (all VMs and
    everything), etc, etc, etc) - very successfully but not funny...)
    
    On Mon, 17 Jun 2019 at 14:33, David Merrill <[email protected]>
    wrote:
    
    > Hi All,
    >
    > I’m fielding a worrisome emergency situation where a set of (2) stacked
    > switches (serving the MGMT & SAN networks) has partially failed (one of
    > the stack members was ejected).
    >
    > Here’s what I’ve got:
    >
    >
    >   *   The Xen hosts’ 2 MGMT NICs are bonded (active-passive) and connected
    > to both switches
    >   *   The Xen hosts’ 2 SAN NICs are NOT bonded and connected to both
    > switches
    >   *   PUB/GUEST NICs are connected to a different set of stacked switches
    > (and are fine)
    >
    > Amazingly, CloudStack has survived (guest VMs are still running, there
    > have been no disk issues).
    >
    > However I’ve got one compute-cluster (of 6 Xen hosts) in an alert state
    > (as the pool master is affected) in the CloudStack UI (and cannot manage
    > guests there) and I cannot get to their MGMT interfaces (going to hop on
    > the hosts today and get more intel).
    >
    > A replacement switch is arriving in 24 hours & I’m preparing the
    > switch-swap process.
    >
    > I’d REALLY like to shut everything down (guest VMs) before mucking about
    > with the switch-stack serving the SAN network, but I think I have to get
    > the host MGMT NICs sorted first (my suspicion is that the bonded MGMT NICs
    > haven’t failed over – due to the nature of the switch failure maybe? – I’m
    > considering pulling the MGMT NIC connections on the failed switch to see
    > if I can get a path back).
    >
    > Anyway, not really much of an ask here, talking myself through it as I
    > ride the tiger.
    >
    > Thanks.
    > David
    
    -- 
    
    Andrija Panić
