Hi Olaf,

The attached "multi-glusterfsd-vol3.txt | multi-glusterfsd-vol4.txt" show multiple processes running for the "ovirt-core" and "ovirt-engine" bricks, but there are no logs in bricklogs.zip specific to these bricks; bricklogs.zip contains only a dump of the ovirt-kube logs.
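For reference, on each node the per-brick logs normally live under /var/log/glusterfs/bricks/, named after the brick path, so collecting them should look roughly like the sketch below (the ovirt-core log name is derived from the brick path in your earlier ps output; the glusterd log file name may differ slightly between builds):

  ls -l /var/log/glusterfs/bricks/
  # e.g. the ovirt-core brick log on that node:
  # /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log
  cat /var/log/glusterfs/glusterd.log   # on some builds: etc-glusterfs-glusterd.vol.log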
Kindly share the brick logs specific to the "ovirt-core" and "ovirt-engine" bricks, and please share the glusterd logs as well.

Regards,
Mohit Agrawal

On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar <[email protected]> wrote:

> Dear Krutika,
>
> 1.
> I've changed the volume settings; write performance seems to have increased somewhat, although the profile doesn't really support that, since latencies increased. Read performance, however, has diminished, which does seem to be supported by the profile runs (attached).
> Also, the IO does seem to behave more consistently than before.
> I don't really understand the idea behind these settings; maybe you can explain why these suggestions are good?
> They seem to avoid as much local caching and local access as possible and push everything to the gluster processes, while I would expect local access and local caches to be a good thing, since they would lead to less network and disk access.
> I tried to investigate the settings a bit more, and this is what I understood of them:
>
> - network.remote-dio: when on, it seems to ignore the O_DIRECT flag in the client, causing files to be cached and buffered in the page cache on the client. I would expect this to be a good thing, especially if the server process would access the same page cache?
> At least that is what I grasp from this commit (line 867):
> https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c
> I also found this commit, which suggests remote-dio actually improves performance (I'm not sure whether it's a write or a read benchmark):
> https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c
> When a file is opened with O_DIRECT it will also disable the write-behind functionality.
>
> - performance.strict-o-direct: when on, the AFR will not ignore the O_DIRECT flag, and will invoke fop_writev_stub with the wb_writev_helper, which seems to stack the operation; no idea why that is. But generally I suppose not ignoring the O_DIRECT flag in the AFR is a good thing when a process requests O_DIRECT, so this makes sense to me.
>
> - cluster.choose-local: when off, it doesn't prefer the local node, but will always choose a brick. Since it's a 9-node cluster with 3 subvolumes, only 1/3 could end up local and the other 2/3 would be pushed to external nodes anyway. Or am I making a totally wrong assumption here?
>
> It seems this config is moving towards the gluster-block config side of things, which does make sense.
> Since we're running quite a few mysql instances, which I believe open their files with O_DIRECT, it would mean the only layer of cache is within mysql itself. You could argue that's a good thing, but I would expect a little write-behind buffering, and maybe some data cached within gluster, to alleviate things a bit on gluster's side. I wouldn't know if that's the correct mindset though, so I might be totally off here.
> Also, I would expect these gluster v set <VOL> commands to be online operations, but somehow the bricks went down after applying these changes. What appears to have happened is that after the update the brick process was restarted, but due to the multiple-brick-process start issue, multiple processes were started and the brick didn't come online again.
> However, I'll try to reproduce this, since I would like to test with cluster.choose-local: on and see how performance compares, and hopefully collect some useful info when it occurs.
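> For completeness, a rough sketch of the commands behind the option checks and the profile runs (the volume name ovirt-kube and the 10-minute interval are just examples here, adjust as needed):
>
>   # confirm what the volume is actually running with
>   gluster volume get ovirt-kube network.remote-dio
>   gluster volume get ovirt-kube performance.strict-o-direct
>   gluster volume get ovirt-kube cluster.choose-local
>
>   # ~10 minute profile window around a representative workload
>   gluster volume profile ovirt-kube start
>   sleep 600
>   gluster volume profile ovirt-kube info > profile_data.txt
>   gluster volume profile ovirt-kube stop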
> Question: are network.remote-dio and performance.strict-o-direct mutually exclusive settings, or can they both be on?
>
> 2. I've attached all brick logs; the only thing relevant I found was:
>
> [2019-03-28 20:20:07.170452] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
> [2019-03-28 20:20:07.170491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
> [2019-03-28 20:20:07.248480] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
> [2019-03-28 20:20:07.248491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>
> Thanks Olaf
>
> ps. sorry, I needed to resend since it exceeded the file limit
>
> Op ma 1 apr. 2019 om 07:56 schreef Krutika Dhananjay <[email protected]>:
>
>> Adding back gluster-users
>> Comments inline ...
>>
>> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar <[email protected]> wrote:
>>
>>> Dear Krutika,
>>>
>>> 1. I've made 2 profile runs of around 10 minutes (see files profile_data.txt and profile_data2.txt). Looking at them, most time seems to be spent in the fsync and readdirp fops.
>>>
>>> Unfortunately I don't have the profile info for the 3.12.15 version, so it's a bit hard to compare.
>>>
>>> One additional thing I do notice: on 1 machine (10.32.9.5) the iowait time increased a lot, from an average below 1% to around 12% after the upgrade.
>>>
>>> So my first suspicion was that lightning strikes twice and I now also have a bad disk, but that doesn't appear to be the case, since all SMART statuses report ok.
>>>
>>> Also, dd shows the performance I would more or less expect:
>>>
>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync
>>> 1+0 records in
>>> 1+0 records out
>>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
>>>
>>> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync
>>> 1+0 records in
>>> 1+0 records out
>>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
>>>
>>> dd if=/dev/urandom of=/data/test_file bs=1024 count=1000000
>>> 1000000+0 records in
>>> 1000000+0 records out
>>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
>>>
>>> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000
>>> 1000000+0 records in
>>> 1000000+0 records out
>>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
>>>
>>> When I disable this brick (service glusterd stop; pkill glusterfsd) performance in gluster is better, but not on par with what it was. Also, the cpu usage on the "neighbor" nodes, which host the other bricks in the same subvolume, increases quite a lot in this case, which I wouldn't expect since they shouldn't handle much more work except flagging shards to heal. Iowait also goes to idle once gluster is stopped, so it's for sure gluster which waits for io.
>>>
>>
>> So I see that FSYNC %-latency is on the higher side. And I also noticed you don't have direct-io options enabled on the volume.
>> Could you set the following options on the volume -
>> # gluster volume set <VOLNAME> network.remote-dio off
>> # gluster volume set <VOLNAME> performance.strict-o-direct on
>> and also disable choose-local
>> # gluster volume set <VOLNAME> cluster.choose-local off
>>
>> Let me know if this helps.
>>
>>> 2. I've attached the mnt log and volume info, but I couldn't find anything relevant in those logs. I think this is because we run the VM's with libgfapi:
>>>
>>> [root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported
>>> LibgfApiSupported: true version: 4.2
>>> LibgfApiSupported: true version: 4.1
>>> LibgfApiSupported: true version: 4.3
>>>
>>> And I can confirm the qemu process is invoked with the gluster:// address for the images.
>>>
>>> The message is logged in the /var/lib/libvirt/qemu/<machine> file, which I've also included. For a sample case, see around 2019-03-28 20:20:07, which has the error:
>>> E [MSGID: 133010] [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]
>>
>> Could you also attach the brick logs for this volume?
>>
>>> 3. Yes, I see multiple instances for the same brick directory, like:
>>>
>>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid -S /var/run/gluster/452591c9165945d9.socket --brick-name /data/gfs/bricks/brick1/ovirt-core -l /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 --process-name brick --brick-port 49154 --xlator-option ovirt-core-server.listen-port=49154
>>>
>>> I've made an export of the output of ps from the time I observed these multiple processes.
>>>
>>> In addition to the brick_mux bug noted by Atin, I might also have another possible cause: when ovirt moves nodes from non-operational or maintenance state to active/activating, it also seems to restart gluster. However, I don't have direct proof for this theory.
>>>
>>
>> +Atin Mukherjee <[email protected]> ^^
>> +Mohit Agrawal <[email protected]> ^^
>>
>> -Krutika
>>
>>> Thanks Olaf
>>>
>>> Op vr 29 mrt. 2019 om 10:03 schreef Sandro Bonazzola <[email protected]>:
>>>
>>>> Il giorno gio 28 mar 2019 alle ore 17:48 <[email protected]> ha scritto:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was a different experience. After first trying a test upgrade on a 3-node setup, which went fine, I headed to upgrade the 9-node production platform, unaware of the backward compatibility issues between gluster 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata was missing or couldn't be accessed. I restored this file by getting a good copy from the underlying bricks, removing the file from the underlying bricks where it was 0 bytes and marked with the sticky bit, together with the corresponding gfid's, then removing the file from the mount point and copying the good file back onto the mount point.
>>>>> After manually mounting the engine domain and manually creating the corresponding symbolic links in /rhev/data-center and /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (which was root.root), I was able to start the HA engine again. Since the engine was up again and things seemed rather unstable, I decided to continue the upgrade on the other nodes; suspecting an incompatibility in gluster versions, I thought it would be best to have them all on the same version rather soonish. However, things went from bad to worse: the engine stopped again, and all vm's stopped working as well. So on a machine outside the setup I restored a backup of the engine taken from version 4.2.8 just before the upgrade. With this engine I was at least able to start some vm's again and finalize the upgrade. Once upgraded, things didn't stabilize, and we also lost 2 vm's during the process due to image corruption. After figuring out gluster 5.3 had quite some issues, I was lucky to see gluster 5.5 was about to be released, and the moment the RPM's were available I installed those. This helped a lot in terms of stability, for which I'm very grateful! However, the performance is unfortunately terrible; it's about 15% of what the performance was running gluster 3.12.15. It's strange, since a simple dd shows ok performance, but our actual workload doesn't, while I would expect the performance to be better due to all the improvements made since gluster version 3.12. Does anybody share the same experience?
>>>>> I really hope gluster 6 will soon be tested with ovirt and released, and things start to perform and stabilize again... like the good old days. Of course if I can do anything, I'm happy to help.
>>>>
>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track the rebase on Gluster 6.
>>>>
>>>>> I think this is the short list of issues we have after the migration:
>>>>>
>>>>> Gluster 5.5:
>>>>> - Poor performance for our workload (mostly write dependent)
>>>>> - VM's randomly pause on unknown storage errors, which are "stale file handles". Corresponding log: Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
>>>>> - Some files are listed twice in a directory (probably related to the stale file issue?)
>>>>> Example:
>>>>> ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
>>>>> total 3081
>>>>> drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
>>>>> drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
>>>>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>> -rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
>>>>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>> - Brick processes sometimes start multiple times; sometimes I have 5 brick processes for a single volume.
>>>>> Killing all the glusterfsd's for the volume on the machine and running gluster v start <vol> force usually just starts one; from then on things look all right.
>>>>>
>>>> May I kindly ask you to open bugs on Gluster for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ?
>>>> Sahina?
>>>>
>>>>> Ovirt 4.3.2.1-1.el7:
>>>>> - All vm image ownerships are changed to root.root after the vm is shut down, probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only scoped to the HA engine. I'm still in compatibility mode 4.2 for the cluster and for the vm's, but upgraded to ovirt version 4.3.2.
>>>>
>>>> Ryan?
>>>>
>>>>> - The network provider is set to ovn, which is fine... actually cool, only "ovs-vswitchd" is a CPU hog and utilizes 100%.
>>>>
>>>> Miguel? Dominik?
>>>>
>>>>> - It seems on all nodes vdsm tries to get the stats for the HA engine, which is filling the logs with (not sure if this is new):
>>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)
>>>>
>>>> Simone?
>>>>
>>>>> - It seems the os-brick package is missing; "[root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick',) (caps:149)" fills the vdsm.log, but for this I also saw another message, so I suspect this will already be resolved shortly.
>>>>> - The machine I used to run the backup HA engine doesn't want to get removed from the hosted-engine --vm-status, not even after running hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine --clean-metadata --force-clean from the machine itself.
>>>>
>>>> Simone?
>>>>
>>>>> Think that's about it.
>>>>>
>>>>> Don't get me wrong, I don't want to rant, I just wanted to share my experience and see where things can be made better.
>>>>
>>>> If not already done, can you please open bugs for the above issues at https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?
>>>>
>>>>> Best Olaf
>>>>> _______________________________________________
>>>>> Users mailing list -- [email protected]
>>>>> To unsubscribe send an email to [email protected]
>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/
>>>>
>>>> --
>>>> SANDRO BONAZZOLA
>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>> Red Hat EMEA <https://www.redhat.com/>
>>>> [email protected]
>>>> <https://red.ht/sig>
>>>
>>> _______________________________________________
>>> Users mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/
>>>
>>
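PS. Regarding the duplicate brick processes mentioned in the quoted thread: when they show up again, output along these lines from the affected node would also be helpful (the volume name ovirt-core is taken from your ps export, adjust as needed):

  gluster volume status ovirt-core      # the PID/port glusterd believes is serving each brick
  pgrep -af 'glusterfsd.*ovirt-core'    # the glusterfsd processes actually running for that brick
  gluster --version

  # workaround noted earlier in the thread: kill the stray glusterfsd's for that brick, then
  gluster volume start ovirt-core force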
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/ULEOYIYKPBYF5YIHMPJ26EQUQHZ2VLBT/

