I've been using SmartOS for about 5 years and occasionally (probably 1-2
times per year, per server) I encounter an issue where the SmartOS host
becomes unresponsive and the machine needs to be hard reset.

When this happened last week, I was able to SSH into one of the KVM VMs
running on the machine and issue commands. However, the VM was clearly
not running properly: its load average was 800+, whereas it is normally
about 0.2.

The SmartOS host would echo my keystrokes but never displayed a login
prompt or responded in any other way. It would not accept an SSH
connection either.

The disk activity (HDD) light was flashing a couple of times per second.
I've seen this both on a machine with only SSDs and on one with only HDDs.

Does anyone have any insight into what situations could cause this kind
of issue, where the host machine is unresponsive but a KVM VM can still
respond well enough to allow SSH logins and commands?

Also, is there a better course of action I can take next time to debug
this before I hard reset the machine? Should I send an NMI over IPMI to
force a crash dump, like:

$ ipmitool -I lan -H <hostname> chassis power diag

and then see what is in the dump? Or is there some way, similar to Linux
SysRq commands, to kill the VMs but leave the host running for live
debugging? Thank you,

Nick



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now