On Wed, Jan 11, 2017 at 7:35 PM, Mark Greenall <m.green...@iontrading.com> wrote:
> Hi oVirt Champions,
>
> I am pulling my hair out and in need of advice / help.
>
> Host server: Dell PowerEdge R815 (40 cores and 768GB memory)
> Storage: Dell EqualLogic (Firmware V8.1.4)
> OS: CentOS 7.3 (although the same thing happens on 7.2)
> oVirt: 4.0.6.3-1 (although also happens on 4.0.5)
>
> I can’t exactly pinpoint when this started happening but it’s certainly
> been happening with oVirt 4.0.5 and CentOS 7.2. Today I updated Hosted
> Engine and one host to 4.0.6 and CentOS 7.3 but we still see the same
> problem. Our hosts are connected to Dell EqualLogic iSCSI storage. We
> have one storage domain defined per VM guest, so we do have quite a few
> LUNs presented to the cluster (around 45 in total).
>
> Problem Description:
>
> 1) Reboot a host.
> 2) Activate a host in oVirt Admin GUI.
> 3) A few minutes later host is shown as activated.
> 4) Approx 10-15 mins later host goes offline complaining that it can’t
>    connect to storage.
> 5) Constantly then loops around (activating, non operational,
>    connecting, initialising) and the host ends up with a high CPU load
>    and large number of lvm commands in the process tree.
> 6) Multipath and iscsi show all storage is available and logged in.
> 7) EqualLogic shows host connected and no errors.
> 8) Admin GUI ends up saying the host can’t connect to storage
>    ‘UNKNOWN’.
>
> The strange thing is that every now and again step 5 doesn’t happen and
> the host will actually activate again and then stay up. However, the
> host still goes offline at step 4 first.
>
> Expected Behaviour:
>
> 1) Reboot a host.
> 2) Activate a host in oVirt Admin GUI.
> 3) A few minutes later host is shown as activated.
> 4) Begin using host with confidence.
>
> I’ve attached the engine.log from Hosted Engine and vdsm.log from the
> host. The following is a timeline of the latest event.
>
> Host Activation: 15:07
> Host Up: 15:10
> Non-Operational: 15:17
>
> Seriously hoping someone can spot something obvious as this is making
> the clusters somewhat unstable and unreliable.
Can you share /var/log/messages and /var/log/sanlock.log?

Nir
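
If the full logs are too large to attach, something like the following could trim them to the window around the failure (activation at 15:07, non-operational at 15:17). This is only a minimal sketch: it assumes each log line carries an HH:MM:SS timestamp, ignores dates entirely, and pads the window to 15:05-15:20 by choice. The function names are illustrative and not part of oVirt, vdsm, or sanlock.

```python
import re
from datetime import time

# Assumption: every relevant log line contains an HH:MM:SS timestamp.
# The first such token on the line is taken as the line's time of day.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def in_window(line, start, end):
    """Return True if the first HH:MM:SS token in line falls in [start, end]."""
    match = TIME_RE.search(line)
    if match is None:
        return False
    hour, minute, second = (int(part) for part in match.groups())
    if hour > 23 or minute > 59 or second > 59:
        return False  # looked like a time but is not a valid one
    return start <= time(hour, minute, second) <= end


def filter_lines(lines, start=time(15, 5), end=time(15, 20)):
    """Keep only lines whose timestamp lies inside the padded failure window."""
    return [line for line in lines if in_window(line, start, end)]


def filter_file(path, start=time(15, 5), end=time(15, 20)):
    """Read a log file and return the lines inside the window."""
    with open(path, errors="replace") as handle:
        return filter_lines(handle, start, end)
```

For example, `filter_file("/var/log/sanlock.log")` would return the lines to paste into a reply. Because dates are ignored, a log spanning several days would need an extra filter on the date.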