On Thu, Mar 28, 2019 at 5:48 PM <[email protected]> wrote:
> Dear All,
>
> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous upgrades, from 4.1 to 4.2 etc., went rather smoothly, this one was a different experience. After first trying a test upgrade on a 3-node setup, which went fine, I headed to upgrade the 9-node production platform, unaware of the backward-compatibility issues between gluster 3.12.15 and 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata was missing or couldn't be accessed.
>
> I restored this file by taking a good copy from the underlying bricks, removing the file from the underlying bricks where it was 0 bytes and marked with the sticky bit (along with the corresponding gfids), removing the file from the mount point, and copying the good copy back onto the mount point. After manually mounting the engine domain, manually recreating the corresponding symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (it had changed to root.root), I was able to start the HA engine again.
>
> Since the engine was up again but things seemed rather unstable, I decided to continue the upgrade on the other nodes: suspecting an incompatibility between gluster versions, I thought it would be best to have them all on the same version rather soon. However, things went from bad to worse; the engine stopped again, and all VMs stopped working as well. So, on a machine outside the setup, I restored a backup of the engine taken from version 4.2.8 just before the upgrade. With this engine I was at least able to start some VMs again and finalize the upgrade. Even once upgraded, things didn't stabilize, and I also lost 2 VMs during the process due to image corruption. After figuring out that gluster 5.3 had quite some issues, I was lucky to see that gluster 5.5 was about to be released; the moment the RPMs were available, I installed them.
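[Editor's note] The gfid cleanup described above relies on finding the file's second hard link under each brick's .glusterfs tree. A minimal sketch of that mapping, assuming the standard Gluster on-disk layout (the brick path below is hypothetical; the gfid is the example value from the stale-file log quoted later in this thread):

```shell
# Map a gfid to its hard link under a brick's .glusterfs tree.
# Gluster links every file a second time as .glusterfs/<aa>/<bb>/<full-gfid>,
# where <aa> and <bb> are the first two character pairs of the gfid.
gfid_path() {
  local brick=$1 gfid=$2
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}

# Example, with a hypothetical brick path and the gfid from the
# stale-file log message quoted below:
gfid_path /gluster/brick1/engine 8a27b91a-ff02-42dc-bd4c-caa019424de8
# → /gluster/brick1/engine/.glusterfs/8a/27/8a27b91a-ff02-42dc-bd4c-caa019424de8
```

Removing the bad 0-byte copy without also removing this hard link would leave the brick inconsistent, which is why both were cleaned up together.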
> This helped a lot in terms of stability, for which I'm very grateful! However, the performance is unfortunately terrible: it's about 15% of what it was running gluster 3.12.15. It's strange, since a simple dd shows OK performance, but our actual workload doesn't, while I would expect the performance to be better, given all the improvements made since gluster 3.12. Does anybody share the same experience?
>
> I really hope gluster 6 will soon be tested with oVirt and released, and things will start to perform and stabilize again... like the good old days. Of course, if I can do anything, I'm happy to help.

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track the rebase on Gluster 6.

> I think the following is a short list of the issues we have after the migration.
>
> Gluster 5.5:
> - Poor performance for our workload (mostly write-dependent).
> - VMs randomly pause on unknown storage errors, which are "stale file" errors. Corresponding log: Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
> - Some files are listed twice in a directory (probably related to the stale-file issue?). Example:
>
> ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
> total 3081
> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 .
> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 ..
> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>
> - Brick processes sometimes start multiple times; sometimes I have 5 brick processes for a single volume.
> Killing all glusterfsd's for the volume on the machine and running gluster v start <vol> force usually starts just one afterwards, and from then on things look all right.

May I kindly ask to open bugs on Gluster for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? Sahina?

> oVirt 4.3.2.1-1.el7:
> - All VM image ownerships are changed to root.root after the VM is shut down, probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1666795, but not scoped only to the HA engine. I'm still in compatibility mode 4.2 for the cluster and for the VMs, but upgraded to oVirt 4.3.2.

Ryan?

> - The network provider is set to OVN, which is fine... actually cool; only "ovs-vswitchd" is a CPU hog and utilizes 100%.

Miguel? Dominik?

> - It seems that on all nodes vdsm tries to get the stats for the HA engine, which is filling the logs with (not sure if this is new):
> [api.virt] FINISH getStats return={'status': {'message': "Virtual machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)

Simone?

> - It seems the os_brick message "[root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:149)" fills the vdsm.log, but for this I also saw another message, so I suspect it will already be resolved shortly.
> - The machine I used to run the backup HA engine doesn't want to be removed from hosted-engine --vm-status, not even after running hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine --clean-metadata --force-clean from the machine itself.

Simone?

> Think that's about it.
>
> Don't get me wrong, I don't want to rant; I just wanted to share my experience and see where things can be made better.
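[Editor's note] For the duplicated brick processes described above, a quick way to spot affected volumes is to group the glusterfsd processes by their --volfile-id. A rough sketch, assuming the usual `<volume>.<host>.<brick>` volfile-id format (the live pgrep/pkill lines are commented out and the helper below is hypothetical, not part of Gluster):

```shell
# Count glusterfsd processes per volume. Brick volfile-ids normally look
# like "<volume>.<host>.<brick-path>", so stripping everything after the
# first dot leaves just the volume name.
count_brick_procs() {
  # stdin: one --volfile-id value per line; stdout: "<count> <volume>"
  sed 's/\..*//' | sort | uniq -c
}

# On a live node (commented out; needs a running Gluster host):
# pgrep -af glusterfsd | grep -o -- '--volfile-id [^ ]*' | awk '{print $2}' \
#   | count_brick_procs
#
# If a volume shows more than one process per brick, the workaround from
# this thread is to kill them and force-start the volume:
# pkill -f 'glusterfsd.*--volfile-id <vol>\.' && gluster v start <vol> force
```

This only detects the symptom; the root cause of the multiple starts still belongs in a Gluster bug report.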
If not already done, can you please open bugs for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?

> Best,
> Olaf
>
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/

--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
[email protected] <https://red.ht/sig>
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/ATPDSQ2X4ZHYEOCLKODZUQA64TGGQBG6/

