Adding Nir, who knows it far better than me.

On Mon, Nov 23, 2015 at 8:37 PM, Duckworth, Douglas C <[email protected]> wrote:
> Hello --
>
> Not sure if y'all can help with this issue we've been seeing with RHEV...
>
> On 11/13/2015, during a code upgrade of the Compellent SAN at our
> disaster recovery site, we failed over to the secondary SAN controller.
> Most virtual machines in our DR cluster resumed automatically after
> pausing, except the VM "BADVM" on host "BADHOST."
>
> In engine.log you can see that BADVM was put into the "VM_PAUSED_EIO"
> state at 10:47:57:
>
> "VM BADVM has paused due to storage I/O problem."
>
> On this Red Hat Enterprise Virtualization Hypervisor 6.6
> (20150512.0.el6ev) host, two other VMs paused but then resumed
> automatically without system administrator intervention...
>
> In our DR cluster, 22 VMs also resumed automatically...
>
> None of these guest VMs are engaged in high I/O, as they are DR-site
> VMs not currently doing anything.
>
> We sent this information to Dell. Their response:
>
> "The root cause may reside within your virtualization solution, not the
> parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"
>
> We are doing this failover again on Sunday, November 29th, so we would
> like to know how to mitigate this issue, given that we have to manually
> resume paused VMs that don't resume automatically.
>
> Before we initiated the SAN controller failover, all iSCSI paths to
> targets were present on host tulhv2p03.
>
> The VM log on the host, /var/log/libvirt/qemu/badhost.log, shows that a
> storage error was reported:
>
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>
> All disks used by this guest VM are provided by the single storage
> domain COM_3TB4_DR, with serial "270."
> In syslog we do see that all paths for that storage domain failed:
>
> Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining
> active paths: 0
>
> Though these recovered later:
>
> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg -
> tur checker reports path is up
> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining
> active paths: 8
>
> Does anyone have an idea of why the VM would fail to resume
> automatically if the iSCSI paths used by its storage domain recovered?
>
> Thanks
> Doug
>
> --
> Thanks
>
> Douglas Charles Duckworth
> Unix Administrator
> Tulane University
> Technology Services
> 1555 Poydras Ave
> NOLA -- 70112
>
> E: [email protected]
> O: 504-988-9341
> F: 504-988-8505
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/users
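As a stopgap until the root cause is understood, the stuck VMs can at least be found and resumed in bulk from the host with virsh instead of clicking through the Admin Portal one VM at a time. This is only a sketch under assumptions: it assumes virsh is usable on the hypervisor (on RHEV-H, libvirt is managed by vdsm, so read-write virsh access may require the vdsm SASL credentials), and the `pick_paused` helper is just an illustrative name.

```shell
#!/bin/sh
# pick_paused: read `virsh list --all` output on stdin and print the
# names of domains whose state column is "paused".
pick_paused() {
    awk '$3 == "paused" { print $2 }'
}

# Resume every paused domain on this host. Guarded so the script is a
# no-op on machines without virsh. On RHEV-H, libvirt is vdsm-managed,
# so this direct approach is a sketch, not a supported procedure --
# resuming through the engine is the normal path.
if command -v virsh >/dev/null 2>&1; then
    virsh list --all | pick_paused | while read -r dom; do
        virsh resume "$dom"
    done
fi
```

Note that resuming by hand only clears the symptom; whether the guest survives depends on how long I/O was failing inside it.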

