Usually I would blacklist the Gluster devices by creating the necessary stanzas in /etc/multipath/conf.d/blacklist.conf. This way you keep the situation simple.
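For reference, such a stanza could look like the fragment below. The WWID here is only a placeholder; take the real one from multipath -ll or /etc/multipath/wwids on your own host:

```
# /etc/multipath/conf.d/blacklist.conf
# Placeholder WWID -- replace with the WWID of your Gluster-backing device
blacklist {
    wwid "3600508b400105e210000900000490000"
}
```

After changing it, reload the configuration with "multipathd reconfigure" (or restart multipathd) so the device is released by multipath.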
For your problem, it's hard to identify the problem based on the e-mails. What are your symptoms? To debug GlusterFS, it is good to start from the brick logs (/var/log/glusterfs/bricks) and the current heal status. On a 3-way replica volume, heals should be resolved by GlusterFS - if not, there is a bug.

Best Regards,
Strahil Nikolov

On Tue, May 31, 2022 at 16:32, jb <jonba...@gmail.com> wrote:

I still have the same problems, but it looks like the errors come a bit less often. I'm starting now to migrate the disk images to NFS storage. If there is no other way, I will recreate the GlusterFS cluster. The problem I have is that I don't know where the root of this problem lies, and whether recreating would fix the issue in the longer term.

On 29.05.22 at 20:26, Nir Soffer wrote:
> On Sun, May 29, 2022 at 9:03 PM Jonathan Baecker <jonba...@gmail.com> wrote:
>> On 29.05.22 at 19:24, Nir Soffer wrote:
>>
>> On Sun, May 29, 2022 at 7:50 PM Jonathan Baecker <jonba...@gmail.com> wrote:
>>
>> Hello everybody,
>>
>> we run a 3-node self-hosted cluster with GlusterFS. I had a lot of problems upgrading oVirt from 4.4.10 to 4.5.0.2, and now we have cluster instability.
>>
>> First I will write down the problems I had with upgrading, so you get a bigger picture:
>>
>> The engine update went fine.
>> But I could not update the nodes because of a wrong version of imgbase, so I did a manual update to 4.5.0.1 and later to 4.5.0.2. The first time after updating it was still booting into 4.4.10, so I did a reinstall.
>> Then after the second reboot I ended up in emergency mode. After a long search I figured out that lvm.conf now uses use_devicesfile, but it applied the wrong filters. So I commented this out and added the old filters back.
>> I did this procedure on all 3 nodes.
>>
>> When use_devicesfile (default in 4.5) is enabled, the lvm filter is not used. During installation the old lvm filter is removed.
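As background for the filter discussion in this thread: old-style LVM filters are evaluated in order, the first matching pattern decides, and a device matching no pattern is accepted. A simplified illustrative sketch of that matching logic (not LVM's actual code; it only handles the common "a|...|" / "r|...|" entry form):

```python
import re

def lvm_filter_accepts(device_path, filter_entries):
    """Mimic old-style LVM filter semantics: entries are tried in
    order; 'a|regex|' accepts on first match, 'r|regex|' rejects on
    first match. A device matching no entry is accepted by default."""
    for entry in filter_entries:
        action, regex = entry[0], entry[2:-1]  # strip a|...| / r|...|
        if re.search(regex, device_path):
            return action == "a"
    return True  # LVM's default when nothing matches

# Example modeled on the filter discussed in this thread (UUID shortened):
flt = ["a|^/dev/disk/by-id/lvm-pv-uuid-Nn7tZl.*|", "r|.*|"]
print(lvm_filter_accepts("/dev/disk/by-id/lvm-pv-uuid-Nn7tZl-xyz", flt))
print(lvm_filter_accepts("/dev/sda", flt))
```

This is why the trailing "r|.*|" entry matters: without it, any device not matched by the accept patterns would still be scanned.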
>>
>> Can you share more info on why it does not work for you?
>>
>> The problem was that the node could not mount the gluster volumes anymore and ended up in emergency mode.
>>
>> - output of lsblk
>>
>> NAME                                                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
>> sda                                                          8:0    0   1.8T  0 disk
>> `-XA1920LE10063_HKS028AV                                   253:0    0   1.8T  0 mpath
>>   |-gluster_vg_sda-gluster_thinpool_gluster_vg_sda_tmeta   253:16   0     9G  0 lvm
>>   | `-gluster_vg_sda-gluster_thinpool_gluster_vg_sda-tpool 253:18   0   1.7T  0 lvm
>>   |   |-gluster_vg_sda-gluster_thinpool_gluster_vg_sda     253:19   0   1.7T  1 lvm
>>   |   |-gluster_vg_sda-gluster_lv_data                     253:20   0   100G  0 lvm   /gluster_bricks/data
>>   |   `-gluster_vg_sda-gluster_lv_vmstore                  253:21   0   1.6T  0 lvm   /gluster_bricks/vmstore
>>   `-gluster_vg_sda-gluster_thinpool_gluster_vg_sda_tdata   253:17   0   1.7T  0 lvm
>>     `-gluster_vg_sda-gluster_thinpool_gluster_vg_sda-tpool 253:18   0   1.7T  0 lvm
>>       |-gluster_vg_sda-gluster_thinpool_gluster_vg_sda     253:19   0   1.7T  1 lvm
>>       |-gluster_vg_sda-gluster_lv_data                     253:20   0   100G  0 lvm   /gluster_bricks/data
>>       `-gluster_vg_sda-gluster_lv_vmstore                  253:21   0   1.6T  0 lvm   /gluster_bricks/vmstore
>> sr0                                                         11:0    1  1024M  0 rom
>> nvme0n1                                                    259:0    0 238.5G  0 disk
>> |-nvme0n1p1                                                259:1    0     1G  0 part  /boot
>> |-nvme0n1p2                                                259:2    0   134G  0 part
>> | |-onn-pool00_tmeta                                       253:1    0     1G  0 lvm
>> | | `-onn-pool00-tpool                                     253:3    0    87G  0 lvm
>> | |   |-onn-ovirt--node--ng--4.5.0.2--0.20220513.0+1       253:4    0    50G  0 lvm   /
>> | |   |-onn-pool00                                         253:7    0    87G  1 lvm
>> | |   |-onn-home                                           253:8    0     1G  0 lvm   /home
>> | |   |-onn-tmp                                            253:9    0     1G  0 lvm   /tmp
>> | |   |-onn-var                                            253:10   0    15G  0 lvm   /var
>> | |   |-onn-var_crash                                      253:11   0    10G  0 lvm   /var/crash
>> | |   |-onn-var_log                                        253:12   0     8G  0 lvm   /var/log
>> | |   |-onn-var_log_audit                                  253:13   0     2G  0 lvm   /var/log/audit
>> | |   |-onn-ovirt--node--ng--4.5.0.1--0.20220511.0+1       253:14   0    50G  0 lvm
>> | |   `-onn-var_tmp                                        253:15   0    10G  0 lvm   /var/tmp
>> | |-onn-pool00_tdata                                       253:2    0    87G  0 lvm
>> | | `-onn-pool00-tpool                                     253:3    0    87G  0 lvm
>> | |   |-onn-ovirt--node--ng--4.5.0.2--0.20220513.0+1       253:4    0    50G  0 lvm   /
>> | |   |-onn-pool00                                         253:7    0    87G  1 lvm
>> | |   |-onn-home                                           253:8    0     1G  0 lvm   /home
>> | |   |-onn-tmp                                            253:9    0     1G  0 lvm   /tmp
>> | |   |-onn-var                                            253:10   0    15G  0 lvm   /var
>> | |   |-onn-var_crash                                      253:11   0    10G  0 lvm   /var/crash
>> | |   |-onn-var_log                                        253:12   0     8G  0 lvm   /var/log
>> | |   |-onn-var_log_audit                                  253:13   0     2G  0 lvm   /var/log/audit
>> | |   |-onn-ovirt--node--ng--4.5.0.1--0.20220511.0+1       253:14   0    50G  0 lvm
>> | |   `-onn-var_tmp                                        253:15   0    10G  0 lvm   /var/tmp
>> | `-onn-swap                                               253:5    0    20G  0 lvm   [SWAP]
>> `-nvme0n1p3                                                259:3    0    95G  0 part
>>   `-gluster_vg_nvme0n1p3-gluster_lv_engine                 253:6    0    94G  0 lvm   /gluster_bricks/engine
>
>> - The old lvm filter used, and why it was needed
>>
>> filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-Nn7tZl-TFdY-BujO-VZG5-EaGW-5YFd-Lo5pwa$|",
>>           "a|^/dev/disk/by-id/lvm-pv-uuid-Wcbxnx-2RhC-s1Re-s148-nLj9-Tr3f-jj4VvE$|",
>>           "a|^/dev/disk/by-id/lvm-pv-uuid-lX51wm-H7V4-3CTn-qYob-Rkpx-Tptd-t94jNL$|",
>>           "r|.*|"]
>>
>> I don't remember exactly any more why it was needed, but without it the node was not working correctly. I think I even used vdsm-tool config-lvm-filter.
>
> I think that if you list the devices in this filter:
>
>     ls -lh /dev/disk/by-id/lvm-pv-uuid-Nn7tZl-TFdY-BujO-VZG5-EaGW-5YFd-Lo5pwa \
>            /dev/disk/by-id/lvm-pv-uuid-Wcbxnx-2RhC-s1Re-s148-nLj9-Tr3f-jj4VvE \
>            /dev/disk/by-id/lvm-pv-uuid-lX51wm-H7V4-3CTn-qYob-Rkpx-Tptd-t94jNL
>
> you will see that these are the devices used by these vgs:
>
>     gluster_vg_sda, gluster_vg_nvme0n1p3, onn
>
>> - output of vdsm-tool config-lvm-filter
>>
>> Analyzing host...
>> Found these mounted logical volumes on this host:
>>
>> logical volume: /dev/mapper/gluster_vg_nvme0n1p3-gluster_lv_engine
>> mountpoint: /gluster_bricks/engine
>> devices: /dev/nvme0n1p3
>>
>> logical volume: /dev/mapper/gluster_vg_sda-gluster_lv_data
>> mountpoint: /gluster_bricks/data
>> devices: /dev/mapper/XA1920LE10063_HKS028AV
>>
>> logical volume: /dev/mapper/gluster_vg_sda-gluster_lv_vmstore
>> mountpoint: /gluster_bricks/vmstore
>> devices: /dev/mapper/XA1920LE10063_HKS028AV
>>
>> logical volume: /dev/mapper/onn-home
>> mountpoint: /home
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-ovirt--node--ng--4.5.0.2--0.20220513.0+1
>> mountpoint: /
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-swap
>> mountpoint: [SWAP]
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-tmp
>> mountpoint: /tmp
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-var
>> mountpoint: /var
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-var_crash
>> mountpoint: /var/crash
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-var_log
>> mountpoint: /var/log
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-var_log_audit
>> mountpoint: /var/log/audit
>> devices: /dev/nvme0n1p2
>>
>> logical volume: /dev/mapper/onn-var_tmp
>> mountpoint: /var/tmp
>> devices: /dev/nvme0n1p2
>>
>> Configuring LVM system.devices.
>> Devices for following VGs will be imported:
>>
>>     gluster_vg_sda, gluster_vg_nvme0n1p3, onn
>>
>> To properly configure the host, we need to add multipath
>> blacklist in /etc/multipath/conf.d/vdsm_blacklist.conf:
>>
>>     blacklist {
>>         wwid "eui.0025388901b1e26f"
>>     }
>>
>> Configure host? [yes,NO]
>
> If you run "vdsm-tool config-lvm-filter" and confirm with "yes", I think all the vgs will be imported properly into the lvm devices file.
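After confirming, the imported devices are recorded in /etc/lvm/devices/system.devices. As a rough illustration, here is a sketch that pulls the device entries out of such a file; the space-separated KEY=VALUE line format is an assumption based on common examples of that file, so verify it against your own system.devices before relying on it:

```python
def parse_devices_file(text):
    """Parse the rough shape of /etc/lvm/devices/system.devices:
    one device per line as space-separated KEY=VALUE fields; comment
    and header lines are skipped. Format assumption -- verify locally."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if "DEVNAME" in fields:  # only real device entries carry DEVNAME
            devices.append(fields)
    return devices

# Sample content modeled on this host (IDs partly hypothetical):
sample = """\
# LVM uses devices file to limit scanning to these devices
VERSION=1.1.2
IDTYPE=sys_wwid IDNAME=naa.5000c500example DEVNAME=/dev/sda PVID=Nn7tZlTFdYBujOVZG5EaGW5YFdLo5pwa
IDTYPE=sys_wwid IDNAME=eui.0025388901b1e26f DEVNAME=/dev/nvme0n1p3 PVID=lX51wmH7V43CTnqYobRkpxTptdt94jNL PART=3
"""
for dev in parse_devices_file(sample):
    print(dev["DEVNAME"], dev["IDNAME"])
```

This makes it easy to cross-check that every PV backing gluster_vg_sda, gluster_vg_nvme0n1p3 and onn actually has an entry after the import.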
>
> I don't think it will solve the storage issues you have had since Feb 2022, but at least you will have a standard configuration and the next upgrade will not revert your local settings.
>
>> If using lvm devices does not work for you, you can enable the lvm filter in vdsm configuration by adding a drop-in file:
>>
>> $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
>> [lvm]
>> config_method = filter
>>
>> And run:
>>
>> vdsm-tool config-lvm-filter
>>
>> to configure the lvm filter in the best way for vdsm. If this does not create the right filter we would like to know why, but in general you should use lvm devices, since it avoids the trouble of maintaining the filter and dealing with upgrades and user-edited lvm filters.
>>
>> If you disable use_devicesfile, the next vdsm upgrade will enable it back unless you change the configuration.
>>
>> I would be happy to just use the default, if there is a way to make use_devicesfile work.
>>
>> Also, even if you disable use_devicesfile in lvm.conf, vdsm still uses --devices instead of a filter when running lvm commands, and lvm commands run by vdsm ignore your lvm filter, since the --devices option overrides the system settings.
>>
>> ...
>>
>> I noticed some unsynced-volume warnings, but because I had these after upgrading in the past too, I thought they would disappear after some time. The next day they were still there, so I decided to put the nodes into maintenance mode again and restart the glusterd service. After some time the sync warnings were gone.
>>
>> Not clear what these warnings are, I guess Gluster warnings?
>>
>> Yes, Gluster warnings; under Storage -> Volumes it was saying that some entries are unsynced.
>>
>> So now the actual problem:
>>
>> Since this time the cluster is unstable.
I get different errors and warnings, like:

>> VM [name] is not responding
>> out of nowhere an HA VM gets migrated
>> VM migration can fail
>> VM backup with snapshotting and export takes very long
>>
>> How do you back up the VMs? Do you use a backup application? How is it configured?
>>
>> I use a self-made Python script, which uses the REST API. I create a snapshot of the VM, build a new VM from that snapshot, and move the new one to the export domain.
>
> This is not very efficient - this copies the entire vm at the point in time of the snapshot and then copies it again to the export domain.
>
> If you use a backup application supporting the incremental backup API, the first full backup will copy the entire vm once, but later incremental backups will copy only the changes since the last backup.
>
>> VMs are getting very slow sometimes
>> Storage domain vmstore experienced a high latency of 9.14251
>> ovs|00001|db_ctl_base|ERR|no key "dpdk-init" in Open_vSwitch record "." column other_config
>> 489279 [1064359]: s8 renewal error -202 delta_length 10 last_success 489249
>> 444853 [2243175]: s27 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
>> 471099 [2243175]: s27 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
>> many of: 424035 [2243175]: s27 delta_renew long write time XX sec
>>
>> All these issues tell us that your storage is not working correctly.
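To get a feeling for how often sanlock lease renewals fail over time (the same idea as the grep counts further down in this thread), here is a small sketch that tallies "renewal error" lines per month. The sample lines are shortened, illustrative versions of real sanlock.log entries; check the timestamp format against your actual log:

```python
import re
from collections import Counter

def renewal_errors_by_month(log_lines):
    """Count sanlock 'renewal error' lines per YYYY-MM.
    Assumes lines start with an ISO-like date, as in sanlock.log."""
    counts = Counter()
    for line in log_lines:
        if "renewal error" not in line:
            continue
        m = re.match(r"(\d{4}-\d{2})-\d{2}", line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Illustrative sample lines (shortened from the log excerpts above):
sample = [
    "2022-05-29 15:07:19 s8 renewal error -202 delta_length 10 last_success 489249",
    "2022-05-29 16:33:59 s27 delta_renew read timeout 10 sec offset 0",
    "2022-04-02 09:00:00 s3 renewal error -202 delta_length 10 last_success 444800",
]
print(renewal_errors_by_month(sample))
```

A steadily rising count per month, as in your logs, points at the underlying storage rather than at a one-off incident.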
>>
>> sanlock.log is full of renewal errors from May:
>>
>> $ grep 2022-05- sanlock.log | wc -l
>> 4844
>>
>> $ grep 2022-05- sanlock.log | grep 'renewal error' | wc -l
>> 631
>>
>> But there is a lot of trouble from earlier months:
>>
>> $ grep 2022-04- sanlock.log | wc -l
>> 844
>> $ grep 2022-04- sanlock.log | grep 'renewal error' | wc -l
>> 29
>>
>> $ grep 2022-03- sanlock.log | wc -l
>> 1609
>> $ grep 2022-03- sanlock.log | grep 'renewal error' | wc -l
>> 483
>>
>> $ grep 2022-02- sanlock.log | wc -l
>> 826
>> $ grep 2022-02- sanlock.log | grep 'renewal error' | wc -l
>> 242
>>
>> Here the sanlock log looks healthy:
>>
>> $ grep 2022-01- sanlock.log | wc -l
>> 3
>> $ grep 2022-01- sanlock.log | grep 'renewal error' | wc -l
>> 0
>>
>> $ grep 2021-12- sanlock.log | wc -l
>> 48
>> $ grep 2021-12- sanlock.log | grep 'renewal error' | wc -l
>> 0
>>
>> The vdsm log shows that 2 domains are not accessible:
>>
>> $ grep ERROR vdsm.log
>> 2022-05-29 15:07:19,048+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata (monitor:511)
>> 2022-05-29 16:33:59,049+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata (monitor:511)
>> 2022-05-29 16:34:39,049+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata (monitor:511)
>> 2022-05-29 17:21:39,050+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata (monitor:511)
>> 2022-05-29 17:55:59,712+0200 ERROR (check/loop) [storage.monitor] Error checking path
>> /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/metadata (monitor:511)
>> 2022-05-29 17:56:19,711+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/metadata (monitor:511)
>> 2022-05-29 17:56:39,050+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_data/de5f4123-0fac-4238-abcf-a329c142bd47/dom_md/metadata (monitor:511)
>> 2022-05-29 17:56:39,711+0200 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/metadata (monitor:511)
>>
>> You need to find out what the issue with your Gluster storage is.
>>
>> I hope that Ritesh can help debug the issue with Gluster.
>>
>> Nir
>>
>> I'm worried that I'll do something that makes it even worse, and I have no idea what the problem is. To me it doesn't look exactly like a problem with data inconsistencies.
>
> The problem is that your Gluster storage is not healthy, and reading from and writing to it time out.
>
> Please keep users@ovirt.org in CC when you reply. Gluster storage is very popular on this mailing list and you may get useful help from other users.
>
> Nir

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/VBYRVRQPXXDZTDFG46LEECHLRDWDWZ37/