There are PDUs that let you monitor power draw per port, and that would more or 
less tell you if a PSU had failed, as the load on that port would drop to 0.
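
A minimal sketch of that check, assuming the per-outlet readings have already been pulled from the PDU (usually over SNMP, but that depends on the model) into a plain mapping of outlet name to watts. The function name, outlet names, host-to-outlet mapping and 1 W threshold are all hypothetical, made up for illustration:

```python
def suspect_psus(host_outlets, watts, threshold=1.0):
    """Return {host: [outlets]} where one feed reads ~0 W but another still carries load.

    host_outlets: hypothetical mapping of host name -> list of PDU outlet names.
    watts: mapping of outlet name -> measured power draw in watts.
    """
    suspects = {}
    for host, outlets in host_outlets.items():
        # Ignore outlets we have no reading for rather than guessing.
        known = [o for o in outlets if o in watts]
        dead = [o for o in known if watts[o] < threshold]
        # If every feed reads zero the host is simply off, so only flag
        # the case where at least one other feed is still drawing power.
        if dead and len(dead) < len(known):
            suspects[host] = dead
    return suspects
```

A zero reading is only a hint: some redundant PSUs can run in an active/standby mode where the standby unit draws next to nothing while perfectly healthy, so the result would need to be cross-checked before acting on it.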

 

From: [email protected] [mailto:[email protected]] On Behalf Of 
Alex Crow
Sent: Thursday, September 17, 2015 12:31 PM
To: Yaniv Kaul <[email protected]>
Cc: [email protected]
Subject: Re: [ovirt-users] Automatically migrate VM between hosts in the same 
cluster

 

I don't really think this is practical:




- If the PSU failed, your UPS could alert you. If you have one...


If you have only one PSU in a host, a UPS is not going to stop you losing all 
the VMs on that host. OK, if you had N+1 PSUs, you may be able to monitor for 
this (IPMI/LOM/DRAC etc.) and use the API to put a host into maintenance. Also, a 
lot of people rely on low-cost white-box servers and decide that it's OK if a 
single PSU in a host dies because, well, we have HA to restart the VMs on other 
hosts. If they have N+1 PSUs in the hosts, do they really have to migrate everything off? 
Swings and roundabouts really.
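
The trade-off above can be sketched as a small policy function. This is only an illustration under stated assumptions: the PSU states are assumed to have been read already (via IPMI/LOM/DRAC or similar), `deactivate` is a hypothetical callable supplied by the caller that asks the engine to put the host into maintenance, and `required` is the number of PSUs the host needs to stay up (the N in N+1).

```python
def handle_psu_report(host, psu_ok, deactivate, required=1):
    """Decide what to do with a host given one boolean per PSU.

    host: host name (any identifier the caller's `deactivate` understands).
    psu_ok: list of booleans, True for each PSU reporting healthy.
    deactivate: hypothetical callable that moves the host to maintenance.
    required: number of PSUs needed to carry the load.
    """
    healthy = sum(1 for ok in psu_ok if ok)
    if healthy == len(psu_ok):
        return "ok"
    if healthy < required:
        # Not enough power to run: the host is already gone, HA takes over.
        return "down"
    if healthy > required:
        # Still redundant, so alerting may be enough.
        return "degraded"
    # Exactly `required` PSUs left: no redundancy, evacuate the host.
    deactivate(host)
    return "maintenance"
```

With `required=1`, a dual-PSU host that loses one supply gets moved to maintenance, while a single-PSU host that loses its supply just reports "down", which is the case where only HA restart on other hosts can help.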

I'm also not sure I've seen any practical DC setups where a UPS can monitor the 
load for every single attached physical machine and figure out that one of the 
redundant PSUs in it has failed - I'd love to know if there are, as that would 
be really cool.




- If the machine is going down in an ordinary flow, surely it can be done. 


Isn't that what "Maintenance mode" is for?




 


Even if it was a network failure and the host was still up, how would you live 
migrate a VM from a host you can't even talk to?

 

It could be suspended to disk (local) - if the disk is available.

The decision whether to resume it from local disk or not (as it might have been 
HA'ed and already be running elsewhere) needs to be taken later, of course.


Yes, but that's not even remotely possible with oVirt right now. I was trying 
to be practical as the OP has only just started using oVirt and I think it 
might be a bit much to ask him to start coding up what he'd like.




 

 


The only way you could do it would be if you somehow magically knew far enough 
in advance that the host was about to fail (!) and that gave you enough time to 
migrate the machines off. But how would you ever know that "machine 
quux.bar.net is going to fail in 7 minutes"?

 

I completely agree there are situations in which you can't foresee the 
failure. 

But in many, you can. In those cases, it makes sense for the host to 
self-initiate 'move to maintenance' mode. The policy of what to do when 
'self-moving-to-maintenance-mode' could be pre-fetched from the engine.

Y.


Hmm, I would love that to be true. But I've seen so many so-called 
"corner-cases" that I now think the failure area in a datacenter is a fractal 
with infinite corners. Yes, you could monitor SMART on local drives, pick up 
uncorrected ECC errors, use "sensors" to check for sagging voltages or high 
temps, but I don't think you can ever hope to catch everything, and you could 
end up doing a migration "storm" for . I've had more than enough of "Enterprise 
Spec" switches suddenly going nuts and spamming corrupt MACs all over the LAN 
to know you can't ever account for everything.
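
One way to limit the risk of a migration "storm" from noisy health checks is to cap how many hosts may evacuate themselves within a time window. The following is a minimal sketch of such a limiter, not anything oVirt provides; the class name, defaults and the idea of feeding it a timestamp are all assumptions made for illustration:

```python
from collections import deque


class StormGuard:
    """Allow at most `max_moves` self-initiated evacuations per `window` seconds."""

    def __init__(self, max_moves=1, window=600.0):
        self.max_moves = max_moves
        self.window = window
        self._recent = deque()  # timestamps of recently allowed moves

    def allow(self, now):
        """Return True and record the move if the budget permits it."""
        # Forget moves that have aged out of the window.
        while self._recent and now - self._recent[0] >= self.window:
            self._recent.popleft()
        if len(self._recent) >= self.max_moves:
            return False
        self._recent.append(now)
        return True
```

A monitor that sees a bad SMART, ECC or sensor reading would call `allow(time.time())` before asking for maintenance mode and only raise an alert when it returns False. That bounds the damage from a false alarm, but it does nothing for the failures nobody thought to monitor.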

I think it's better to adopt the model of redundancy in software and services, 
so no-one even notices if a VM host goes away, there's always something else to 
take up the slack. Just like the origins of the Internet - the network should 
be dumb and the applications should cope with it! Any infrastructure that can't 
cope with the loss of a few VMs for a few minutes probably needs a refresh.

Cheers

Alex






_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
