Hi guys, sorry to repost, but I'm rather desperate here.
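In case it helps with diagnosis, this is roughly how I've been watching
the SPM negotiation from the host side. The vdsClient syntax is from
memory, and <pool-uuid> is just a placeholder, so treat this as a sketch
rather than exactly what I ran:

[root@node01 ~]# vdsClient -s 0 getConnectedStoragePoolsList
[root@node01 ~]# vdsClient -s 0 getSpmStatus <pool-uuid>  # placeholder: UUID from the previous command

If I understand it correctly, getSpmStatus should report the host as SPM,
Contending or Free, so running it on each node should show where the
negotiation is getting stuck.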
Thanks.

Regards,
Neil Wilson.

On 31 Jul 2017 16:51, "Neil" <nwilson...@gmail.com> wrote:

> Hi guys,
>
> Please could someone assist me: my DC seems to be trying to re-negotiate
> SPM and apparently it's failing. I tried to delete an old autogenerated
> snapshot, and shortly after that the issue seemed to start. After about
> an hour the snapshot reported as successfully deleted, and SPM
> negotiated again, albeit for a short period, before it started trying
> to re-negotiate again.
>
> Last week I upgraded from oVirt 3.5 to 3.6. I also upgraded one of my 4
> hosts to the latest available from the 3.6 repo and did a yum update too.
>
> I have 4 nodes, and my oVirt engine is a KVM guest on another physical
> machine on the network. I'm using an FC SAN with ATTO HBAs, and recently
> we've started seeing some degraded IO. The SAN appears to be alright and
> the disks all seem to check out, but we are getting rather slow IOPS at
> the moment, which we're trying to track down.
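> While we chase the slow IOPS I've been sampling the disks on each host
> with iostat from the sysstat package; high await and %util on the SAN
> LUNs is what I'm watching for. Just a sketch of what I'm running:
>
> [root@node01 ~]# iostat -xm 5  # extended stats in MB, sampled every 5 seconds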
>
> ovirt engine: CentOS release 6.9 (Final)
>
> ebay-cors-filter-1.0.1-0.1.ovirt.el6.noarch
> ovirt-engine-3.6.7.5-1.el6.noarch
> ovirt-engine-backend-3.6.7.5-1.el6.noarch
> ovirt-engine-cli-3.6.2.0-1.el6.noarch
> ovirt-engine-dbscripts-3.6.7.5-1.el6.noarch
> ovirt-engine-extension-aaa-jdbc-1.0.7-1.el6.noarch
> ovirt-engine-extensions-api-impl-3.6.7.5-1.el6.noarch
> ovirt-engine-jboss-as-7.1.1-1.el6.x86_64
> ovirt-engine-lib-3.6.7.5-1.el6.noarch
> ovirt-engine-restapi-3.6.7.5-1.el6.noarch
> ovirt-engine-sdk-python-3.6.7.0-1.el6.noarch
> ovirt-engine-setup-3.6.7.5-1.el6.noarch
> ovirt-engine-setup-base-3.6.7.5-1.el6.noarch
> ovirt-engine-setup-plugin-ovirt-engine-3.6.7.5-1.el6.noarch
> ovirt-engine-setup-plugin-ovirt-engine-common-3.6.7.5-1.el6.noarch
> ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.7.5-1.el6.noarch
> ovirt-engine-setup-plugin-websocket-proxy-3.6.7.5-1.el6.noarch
> ovirt-engine-tools-3.6.7.5-1.el6.noarch
> ovirt-engine-tools-backup-3.6.7.5-1.el6.noarch
> ovirt-engine-userportal-3.6.7.5-1.el6.noarch
> ovirt-engine-vmconsole-proxy-helper-3.6.7.5-1.el6.noarch
> ovirt-engine-webadmin-portal-3.6.7.5-1.el6.noarch
> ovirt-engine-websocket-proxy-3.6.7.5-1.el6.noarch
> ovirt-engine-wildfly-8.2.1-1.el6.x86_64
> ovirt-engine-wildfly-overlay-8.0.5-1.el6.noarch
> ovirt-host-deploy-1.4.1-1.el6.noarch
> ovirt-host-deploy-java-1.4.1-1.el6.noarch
> ovirt-image-uploader-3.6.0-1.el6.noarch
> ovirt-iso-uploader-3.6.0-1.el6.noarch
> ovirt-release34-1.0.3-1.noarch
> ovirt-release35-006-1.noarch
> ovirt-release36-3.6.7-1.noarch
> ovirt-setup-lib-1.0.1-1.el6.noarch
> ovirt-vmconsole-1.0.2-1.el6.noarch
> ovirt-vmconsole-proxy-1.0.2-1.el6.noarch
>
> node01 (CentOS 6.9)
>
> vdsm-4.16.30-0.el6.x86_64
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-python-4.16.30-0.el6.noarch
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> gpxe-roms-qemu-0.9.7-6.16.el6.noarch
> qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64
> qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64
> libvirt-0.10.2-62.el6.x86_64
> libvirt-client-0.10.2-62.el6.x86_64
> libvirt-lock-sanlock-0.10.2-62.el6.x86_64
> libvirt-python-0.10.2-62.el6.x86_64
>
> node01 was upgraded out of desperation after I tried changing my DC and
> cluster version to 3.6, but then found that none of my hosts could be
> activated out of maintenance due to an incompatibility with 3.6 (I'm
> still not sure why, as searching seemed to indicate CentOS 6.x was
> compatible). I then had to remove all 4 hosts, change the cluster
> version back to 3.5, and re-add them. When I tried changing the cluster
> version to 3.6 I did get a complaint about using the "legacy protocol",
> so on each host, under Advanced, I changed them to use the JSON
> protocol, and this seemed to resolve it. However, once the DC/cluster
> was changed back to 3.5, the option to change the protocol back to
> Legacy is no longer shown.
>
> node02 (CentOS 6.7)
>
> vdsm-4.16.30-0.el6.x86_64
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-python-4.16.30-0.el6.noarch
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> gpxe-roms-qemu-0.9.7-6.14.el6.noarch
> qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64
> qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64
> libvirt-0.10.2-54.el6_7.6.x86_64
> libvirt-client-0.10.2-54.el6_7.6.x86_64
> libvirt-lock-sanlock-0.10.2-54.el6_7.6.x86_64
> libvirt-python-0.10.2-54.el6_7.6.x86_64
>
> node03 (CentOS 6.7)
>
> vdsm-4.16.30-0.el6.x86_64
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-python-4.16.30-0.el6.noarch
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> gpxe-roms-qemu-0.9.7-6.14.el6.noarch
> qemu-img-rhev-0.12.1.2-2.479.el6_7.2.x86_64
> qemu-kvm-rhev-0.12.1.2-2.479.el6_7.2.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.479.el6_7.2.x86_64
> libvirt-0.10.2-54.el6_7.6.x86_64
> libvirt-client-0.10.2-54.el6_7.6.x86_64
> libvirt-lock-sanlock-0.10.2-54.el6_7.6.x86_64
> libvirt-python-0.10.2-54.el6_7.6.x86_64
>
> node04 (CentOS 6.7)
>
> vdsm-4.16.20-1.git3a90f62.el6.x86_64
> vdsm-cli-4.16.20-1.git3a90f62.el6.noarch
> vdsm-jsonrpc-4.16.20-1.git3a90f62.el6.noarch
> vdsm-python-4.16.20-1.git3a90f62.el6.noarch
> vdsm-python-zombiereaper-4.16.20-1.git3a90f62.el6.noarch
> vdsm-xmlrpc-4.16.20-1.git3a90f62.el6.noarch
> vdsm-yajsonrpc-4.16.20-1.git3a90f62.el6.noarch
> gpxe-roms-qemu-0.9.7-6.15.el6.noarch
> qemu-img-0.12.1.2-2.491.el6_8.1.x86_64
> qemu-kvm-0.12.1.2-2.491.el6_8.1.x86_64
> qemu-kvm-tools-0.12.1.2-2.503.el6_9.3.x86_64
> libvirt-0.10.2-60.el6.x86_64
> libvirt-client-0.10.2-60.el6.x86_64
> libvirt-lock-sanlock-0.10.2-60.el6.x86_64
> libvirt-python-0.10.2-60.el6.x86_64
>
> I'm seeing a rather confusing error in /var/log/messages on all 4 hosts,
> as follows:
>
> Jul 31 16:41:36 node01 multipathd: 36001b4d80001c80d0000000000000000: sdb
> - directio checker reports path is down
> Jul 31 16:41:41 node01 kernel: sd 7:0:0:0: [sdb] Result:
> hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Jul 31 16:41:41 node01 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 00
> 00 00 00 00 00 01 00
> Jul 31 16:41:41 node01 kernel: end_request: I/O error, dev sdb, sector 0
>
> I say confusing, because I don't have a 3000GB LUN:
>
> [root@node01 ~]# fdisk -l | grep 3000
> Disk /dev/sdb: 3000.0 GB, 2999999528960 bytes
>
> I did have one on Friday last week, but I trashed it and replaced it
> with a 1500GB LUN instead, so I'm not sure if this error is perhaps from
> something still trying to connect to the old LUN?
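> If it is the old LUN lingering, I'm guessing the cleanup on each host
> would be to flush the stale multipath map and then drop the underlying
> SCSI device, something like the following (untested on my side, so
> please shout if this is wrong or dangerous):
>
> [root@node01 ~]# multipath -ll 36001b4d80001c80d0000000000000000  # WWID from the multipathd error above
> [root@node01 ~]# multipath -f 36001b4d80001c80d0000000000000000   # flush the stale multipath map
> [root@node01 ~]# echo 1 > /sys/block/sdb/device/delete            # remove the dead SCSI device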
>
> My LUNs are as follows:
>
> Disk /dev/sdb: 3000.0 GB, 2999999528960 bytes (this one doesn't actually
> exist anymore)
> Disk /dev/sdc: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdd: 1000.0 GB, 999999668224 bytes
> Disk /dev/sde: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdf: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdg: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdh: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdi: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdj: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdk: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdm: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdl: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdn: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdo: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdp: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdq: 1000.0 GB, 999999668224 bytes
> Disk /dev/sdr: 1000.0 GB, 999988133888 bytes
> Disk /dev/sds: 1500.0 GB, 1499999764480 bytes
> Disk /dev/sdt: 1500.0 GB, 1499999502336 bytes
>
> I'm quite low on SAN disk space currently, so I'm a little hesitant to
> migrate VMs around for fear of the migrations creating too many
> snapshots and filling up my SAN. We are in the process of expanding the
> SAN array too, but we're trying to get to the bottom of the bad IOPS
> before adding any additional overhead.
>
> Ping tests between the hosts and the engine all look alright, so I don't
> suspect network issues.
>
> I know this is very vague; everything is currently operational, but as
> you can see in the attached logs, I'm getting lots of ERROR messages.
>
> Any help or guidance is greatly appreciated.
>
> Thanks.
>
> Regards,
>
> Neil Wilson.
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users