Re: [Cloud] [Cloud-announce] Cloud VPS single hypervisor failure and (some) down instances (possibly resolved)

2018-02-14 Thread Andrew Bogott
The host in question has been repaired and restarted; all hosted VMs should now be up and running. We're not 100% certain that we've addressed the root cause of the problem, so we will see if it dies again. In the meantime, though, everything should be back to normal. Sorry for the

Re: [Cloud] [Cloud-announce] Cloud VPS single hypervisor failure and (some) down instances

2018-02-14 Thread Andrew Bogott
On 2/14/18 6:58 AM, Chase Pettet wrote: We lost a KVM host at around 7:20 UTC. Because we use local storage for instances there are a number of them that are down. Toolforge suffered a few losses but it seems to have been few enough that GridEngine and Kubernetes users are unaffected at

[Cloud] [Cloud-announce] Cloud VPS single hypervisor failure and (some) down instances

2018-02-14 Thread Chase Pettet
We lost a KVM host at around 7:20 UTC. Because we use local storage for instances there are a number of them that are down. Toolforge suffered a few losses but it seems to have been few enough that GridEngine and Kubernetes users are unaffected at this time. The initial task is T187292 (with a