Re: [Gluster-users] gluster and qcow2 images
Hi,

On Fri, Jun 28, 2019, 17:54 Marcus Schopen wrote:
> Hi,
>
> does anyone have experience with gluster in KVM environments? I would
> like to keep the qcow2 images of a KVM host in sync with a second KVM
> host. Unfortunately, shared storage is not available to me, only the
> two KVM hosts. In principle, it would be sufficient for me - in case of
> a failure of the first KVM host - to start the guests on the second
> host by hand, without first restoring the images from the nightly
> backup. The question is: is glusterfs a sensible solution here, or
> would another approach such as DRBD be better? I have read
> contradictory statements about this; many advise against using gluster
> for qcow2 images, while some report no problems at all.

Red Hat uses gluster in its RHEV product (oVirt is the open-source
equivalent), so gluster can be used with good results. For higher
performance you will need a 10G network for the gluster storage, and you
should enable sharding on the shared volume.

Two-node setups are prone to split-brain issues, which can cause
headaches. I have been running such setups for years and have encountered
a few splits, all of which I was able to recover from. You need some
fencing solution in place to minimize such issues.

I would expect higher performance from DRBD, though I am not aware of any
GUI solution that simplifies its management.

> Cheers
> Marcus
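For reference, a minimal sketch of the tuning this reply describes -
applying the virt profile and enabling sharding - assuming a volume named
"vmstore" (the name and the shard block size are placeholders, not taken
from this thread); note that sharding should be enabled before any VM
images are written to the volume:

--- cut here ---
# Apply the virt tuning profile shipped with gluster (cache, eager-lock,
# and related options for VM image workloads), then enable sharding.
gluster volume set vmstore group virt
gluster volume set vmstore features.shard on
gluster volume set vmstore features.shard-block-size 64MB
--- cut here ---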
[Gluster-users] Geo-Replication Changelog Error - is a directory
Hello,

I'm having some issues with successfully establishing a geo-replication
session between a 7-server distribute cluster as the primary volume and a
2-server distribute cluster as the secondary volume. Both are running the
same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64.

I was able to set up the replication keys, user, groups, etc. and
establish the session, but it goes faulty quickly after starting. The
error from the gsyncd.log is:

    Changelog register failed    error=[Errno 21] Is a directory

We made an attempt about 2 years ago to configure geo-replication but
abandoned it. Now, with a new cluster, I wanted to get it set up, but it
looks like changelogs have been accumulating since then:

[root@gluster07 .glusterfs]# ls -lh changelogs > /var/tmp/changelogs.txt
[root@gluster07 ~]# head /var/tmp/changelogs.txt
total 11G
-rw-r--r--. 1 root root  130 Jun 27 13:48 CHANGELOG
-rw-r--r--. 1 root root 2.6K Jun 19  2017 CHANGELOG.1497891971
-rw-r--r--. 1 root root  470 Jun 19  2017 CHANGELOG.1497892055
-rw-r--r--. 1 root root  186 Jun 19  2017 CHANGELOG.1497892195
-rw-r--r--. 1 root root  458 Jun 19  2017 CHANGELOG.1497892308
-rw-r--r--. 1 root root  188 Jun 19  2017 CHANGELOG.1497892491
-rw-r--r--. 1 root root  862 Jun 19  2017 CHANGELOG.1497892828
-rw-r--r--. 1 root root  11K Jun 19  2017 CHANGELOG.1497892927
-rw-r--r--. 1 root root 4.4K Jun 19  2017 CHANGELOG.1497892941
[root@gluster07 ~]# tail /var/tmp/changelogs.txt
-rw-r--r--. 1 root root 130 Jun 27 13:47 CHANGELOG.1561668463
-rw-r--r--. 1 root root 130 Jun 27 13:47 CHANGELOG.1561668477
-rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668491
-rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668506
-rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668521
-rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668536
-rw-r--r--. 1 root root 130 Jun 27 13:49 CHANGELOG.1561668550
-rw-r--r--. 1 root root 130 Jun 27 13:49 CHANGELOG.1561668565
drw-------. 2 root root  10 Jun 19  2017 csnap
drw-------. 2 root root  37 Jun 19  2017 htime

Could this be related?

When deleting the replication session I made sure to try the 'delete
reset-sync-time' option, but it failed with:

    gsyncd failed to delete session info for storage and
    10.0.231.81::pcic-backup peers
    geo-replication command failed

Here is the volume info:

[root@gluster07 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
features.quota-deem-statfs: on
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
transport.address-family: inet

Any ideas?

Thanks,
-Matthew
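One hedged way to test whether the stale changelogs are the culprit is to
tear the session down and re-register the changelog translator before
recreating it. The sketch below uses the primary volume and secondary
endpoint named in this message; it assumes a root geo-rep session (prefix
the secondary with user@ if a mountbroker user is in use) and that a full
resync via reset-sync-time is acceptable:

--- cut here ---
# Stop and delete the faulty session.
gluster volume geo-replication storage 10.0.231.81::pcic-backup stop force
gluster volume geo-replication storage 10.0.231.81::pcic-backup delete reset-sync-time

# Cycling the changelog translator re-registers it; the old CHANGELOG.*
# files under <brick>/.glusterfs/changelogs are plain files and can be
# archived out of the way while changelog is off.
gluster volume set storage changelog.changelog off
gluster volume set storage changelog.changelog on

# Recreate and restart the session, then watch its status.
gluster volume geo-replication storage 10.0.231.81::pcic-backup create push-pem force
gluster volume geo-replication storage 10.0.231.81::pcic-backup start
gluster volume geo-replication storage 10.0.231.81::pcic-backup status
--- cut here ---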
[Gluster-users] gluster and qcow2 images
Hi,

does anyone have experience with gluster in KVM environments? I would
like to keep the qcow2 images of a KVM host in sync with a second KVM
host. Unfortunately, shared storage is not available to me, only the two
KVM hosts. In principle, it would be sufficient for me - in case of a
failure of the first KVM host - to start the guests on the second host by
hand, without first restoring the images from the nightly backup. The
question is: is glusterfs a sensible solution here, or would another
approach such as DRBD be better? I have read contradictory statements
about this; many advise against using gluster for qcow2 images, while
some report no problems at all.

Cheers
Marcus
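For context, a minimal sketch of what a two-host volume for this use case
usually looks like. The hostnames ("kvm1", "kvm2") and brick paths are
placeholders, and a plain replica-2 volume is shown for simplicity, even
though replica 3 or replica 2 plus arbiter is generally recommended to
avoid the split-brain risk mentioned in the reply elsewhere in this
thread:

--- cut here ---
# On either host, after installing glusterfs-server on both:
gluster peer probe kvm2
gluster volume create vmstore replica 2 \
    kvm1:/bricks/vmstore/brick kvm2:/bricks/vmstore/brick
gluster volume start vmstore

# Mount the volume on both hosts and point libvirt at it, e.g. a FUSE
# mount holding the qcow2 images:
mount -t glusterfs localhost:/vmstore /var/lib/libvirt/images/gluster
--- cut here ---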
Re: [Gluster-users] Removing subvolume from dist/rep volume
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
>    1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
>    2. Even if there are no failures, check the removed bricks to see if
>    any files have not been migrated. If there are any, please check that
>    they are valid files on the brick and copy them to the volume from
>    the brick to the mount point.

Well, looks like I hit one of those edge cases. Probably because of some
issues around a reboot last September, which left a handful of files in a
state where self-heal identified them as needing to be healed but was
incapable of actually healing them. (Check the list archives for "Kicking
a stuck heal", posted on Sept 4, if you want more details.)

So I'm getting 9 failures on the arbiter (merlin), 8 on one data brick
(gandalf), and 3 on the other (saruman). Looking in
/var/log/gluster/palantir-rebalance.log, I see those numbers of "migrate
file failed" errors:

    /.shard/291e9749-2d1b-47af-ad53-3a09ad4e64c6.229: failed to lock file
    on palantir-replicate-1 [Stale file handle]

Also, merlin has four errors, and gandalf has one, of the form:

    Gfid mismatch detected for /0f500288-ff62-4f0b-9574-53f510b4159f.2898>,
    9f00c0fe-58c3-457e-a2e6-f6a006d1cfc6 on palantir-client-7 and
    08bb7cdc-172b-4c21-916a-2a244c095a3e on palantir-client-1.

There are no gfid mismatches recorded on saruman. All of the gfid
mismatches (on saruman) appear to correspond to 0-byte files (e.g.,
.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898, in the case of the gfid
mismatch quoted above).

For both types of errors, all affected files are in .shard/ and have
UUID-style names, so I have no idea which actual files they belong to.
File sizes are generally either 0 bytes or 4M (exactly), although one of
them has a size slightly larger than 3M. So I'm assuming they're chunks
of larger files (which would be almost all the files on the volume - it's
primarily holding disk image files for kvm servers).

Web searches generally seem to consider gfid mismatches to be a form of
split-brain, but `gluster volume heal palantir info split-brain` shows
"Number of entries in split-brain: 0" for all bricks, including those
bricks which are reporting gfid mismatches.

Given all that, how do I proceed with cleaning up the stale file handle
issues? I would guess that this will involve somehow converting the shard
filename to a "real" filename, then shutting down the corresponding VM
and maybe doing some additional cleanup.

And then there's the gfid mismatches. Since they're for 0-byte files, is
it safe to just ignore them on the assumption that they only hold
metadata? Or do I need to do some kind of split-brain resolution on them
(even though gluster says no files are in split-brain)?
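To see such a mismatch directly, one hedged diagnostic is to compare the
trusted.gfid xattr of the suspect shard on each brick of the affected
replica pair; the path below reuses the brick path and the shard named in
the log excerpt above:

--- cut here ---
# Run on each brick host backing the affected subvolume; a gfid mismatch
# shows up as different hex values for the same shard path.
getfattr -n trusted.gfid -e hex \
    /var/local/brick0/data/.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898
--- cut here ---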
Finally, a listing of /var/local/brick0/data/.shard on saruman, in case
any of the information it contains (like file sizes/permissions) might
provide clues to resolving the errors:

--- cut here ---
root@saruman:/var/local/brick0/data/.shard# ls -l
total 63996
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2864
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2868
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2879
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2898
-rw------- 2 root libvirt-qemu 4194304 May 17 14:42 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229
-rw------- 2 root libvirt-qemu 4194304 Jun 24 09:10 291e9749-2d1b-47af-ad53-3a09ad4e64c6.925
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 12:54 2df12cb0-6cf4-44ae-8b0a-4a554791187e.266
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 16:30 2df12cb0-6cf4-44ae-8b0a-4a554791187e.820
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 17 20:22 323186b1-6296-4cbe-8275-b940cc9d65cf.27466
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 27 05:01 323186b1-6296-4cbe-8275-b940cc9d65cf.32575
-rw-r--r-- 2 root libvirt-qemu 3145728 Jun 11 13:23 323186b1-6296-4cbe-8275-b940cc9d65cf.3448
---------T 2 root libvirt-qemu       0 Jun 28 14:26 4cd094f4-0344-4660-98b0-83249d5bd659.22998
-rw------- 2 root libvirt-qemu 4194304 Mar 13  2018 6cdd2e5c-f49e-492b-8039-239e71577836.1302
---------T 2 root libvirt-qemu       0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.47131
---------T 2 root libvirt-qemu       0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.52615
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 08:56 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.100
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 11:29 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.106
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 28 02:35 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.137
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  4  2018
--- cut here ---
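On the question of mapping a shard back to a "real" filename: the base
name of a shard (the UUID before the dot) is the GFID of the original
file, and on a brick that GFID is hard-linked under .glusterfs/, so a
sketch like the one below - using the brick path and one shard name from
this thread - should reveal which VM image a problem shard belongs to:

--- cut here ---
# Shard 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229 belongs to the file whose
# GFID is 291e9749-2d1b-47af-ad53-3a09ad4e64c6; its brick-side data is
# hard-linked at .glusterfs/29/1e/<gfid>, so -samefile finds the real path.
BRICK=/var/local/brick0/data
GFID=291e9749-2d1b-47af-ad53-3a09ad4e64c6
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" \
    ! -path '*/.glusterfs/*'
--- cut here ---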
Re: [Gluster-users] Removing subvolume from dist/rep volume
OK, I'm just careless. Forgot to include "start" after the list of
bricks...

On Fri, Jun 28, 2019 at 04:03:40AM -0500, Dave Sherohman wrote:
> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote:
> > > My objective is to remove nodes B and C entirely.
> > >
> > > First up is to pull their bricks from the volume:
> > >
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > > (wait for data to be migrated)
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> > >
> > There are some edge cases that may prevent a file from being migrated
> > during a remove-brick. Please do the following after this:
> >
> >    1. Check the remove-brick status for any failures. If there are
> >    any, check the rebalance log file for errors.
> >    2. Even if there are no failures, check the removed bricks to see
> >    if any files have not been migrated. If there are any, please check
> >    that they are valid files on the brick and copy them to the volume
> >    from the brick to the mount point.
> >
> > The rest of the steps look good.
>
> Apparently, they weren't quite right. I tried it and it just gives me
> the usage notes in return. Transcript of the commands and output is
> below.
>
> Any insight on how I got the syntax wrong?
>
> --- cut here ---
> root@merlin:/# gluster volume status
> Status of volume: palantir
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick saruman:/var/local/brick0/data        49153     0          Y       17995
> Brick gandalf:/var/local/brick0/data        49153     0          Y       9415
> Brick merlin:/var/local/arbiter1/data       49170     0          Y       35034
> Brick azathoth:/var/local/brick0/data       49153     0          Y       25312
> Brick yog-sothoth:/var/local/brick0/data    49152     0          Y       10671
> Brick merlin:/var/local/arbiter2/data       49171     0          Y       35043
> Brick cthulhu:/var/local/brick0/data        49153     0          Y       21925
> Brick mordiggian:/var/local/brick0/data     49152     0          Y       12368
> Brick merlin:/var/local/arbiter3/data       49172     0          Y       35050
> Self-heal Daemon on localhost               N/A       N/A        Y       1209
> Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       23253
> Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       9542
> Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       11016
> Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       8126
> Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       30998
> Self-heal Daemon on azathoth.lub.lu.se      N/A       N/A        Y       34399
>
> Task Status of Volume palantir
> ------------------------------------------------------------------------------
> Task                 : Rebalance
> ID                   : e58bc091-5809-4364-af83-2b89bc5c7106
> Status               : completed
>
> root@merlin:/# gluster volume remove-brick palantir
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> root@merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> root@merlin:/# gluster volume remove-brick palantir replica 3
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> --- cut here ---
>
> --
> Dave Sherohman

--
Dave Sherohman
Re: [Gluster-users] Removing subvolume from dist/rep volume
On Fri, 28 Jun 2019 at 14:34, Dave Sherohman wrote:

> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote:
> > > My objective is to remove nodes B and C entirely.
> > >
> > > First up is to pull their bricks from the volume:
> > >
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > > (wait for data to be migrated)
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> > >
> > There are some edge cases that may prevent a file from being migrated
> > during a remove-brick. Please do the following after this:
> >
> >    1. Check the remove-brick status for any failures. If there are
> >    any, check the rebalance log file for errors.
> >    2. Even if there are no failures, check the removed bricks to see
> >    if any files have not been migrated. If there are any, please check
> >    that they are valid files on the brick and copy them to the volume
> >    from the brick to the mount point.
> >
> > The rest of the steps look good.
>
> Apparently, they weren't quite right. I tried it and it just gives me
> the usage notes in return. Transcript of the commands and output is
> below.
>
> Any insight on how I got the syntax wrong?
>
> --- cut here ---
> root@merlin:/# gluster volume status
> Status of volume: palantir
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick saruman:/var/local/brick0/data        49153     0          Y       17995
> Brick gandalf:/var/local/brick0/data        49153     0          Y       9415
> Brick merlin:/var/local/arbiter1/data       49170     0          Y       35034
> Brick azathoth:/var/local/brick0/data       49153     0          Y       25312
> Brick yog-sothoth:/var/local/brick0/data    49152     0          Y       10671
> Brick merlin:/var/local/arbiter2/data       49171     0          Y       35043
> Brick cthulhu:/var/local/brick0/data        49153     0          Y       21925
> Brick mordiggian:/var/local/brick0/data     49152     0          Y       12368
> Brick merlin:/var/local/arbiter3/data       49172     0          Y       35050
> Self-heal Daemon on localhost               N/A       N/A        Y       1209
> Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       23253
> Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       9542
> Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       11016
> Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       8126
> Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       30998
> Self-heal Daemon on azathoth.lub.lu.se      N/A       N/A        Y       34399
>
> Task Status of Volume palantir
> ------------------------------------------------------------------------------
> Task                 : Rebalance
> ID                   : e58bc091-5809-4364-af83-2b89bc5c7106
> Status               : completed
>
> root@merlin:/# gluster volume remove-brick palantir
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>

You had it right in the first email:

gluster volume remove-brick palantir replica 3 arbiter 1
saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
merlin:/var/local/arbiter1/data *start*

> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> root@merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> root@merlin:/# gluster volume remove-brick palantir replica 3
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ...
> <start|stop|status|commit|force>
>
> --- cut here ---
>
> --
> Dave Sherohman
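For anyone following along, a hedged sketch of the full cycle implied
here - start the migration, watch its status, and only commit once the
rebalance reports no failures - using the brick list from this thread.
The sketch omits the replica/arbiter keywords shown above and simply
names the three bricks of the subvolume being removed; adjust to
whichever form your gluster version accepts:

--- cut here ---
# Kick off data migration away from the subvolume being removed.
gluster volume remove-brick palantir \
    saruman:/var/local/brick0/data gandalf:/var/local/brick0/data \
    merlin:/var/local/arbiter1/data start

# Poll until the rebalance completes and check the failure count.
gluster volume remove-brick palantir \
    saruman:/var/local/brick0/data gandalf:/var/local/brick0/data \
    merlin:/var/local/arbiter1/data status

# Only after the status shows "completed" with no failures:
gluster volume remove-brick palantir \
    saruman:/var/local/brick0/data gandalf:/var/local/brick0/data \
    merlin:/var/local/arbiter1/data commit
--- cut here ---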
Re: [Gluster-users] Removing subvolume from dist/rep volume
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote:
> > My objective is to remove nodes B and C entirely.
> >
> > First up is to pull their bricks from the volume:
> >
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > (wait for data to be migrated)
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> >
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
>    1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
>    2. Even if there are no failures, check the removed bricks to see if
>    any files have not been migrated. If there are any, please check that
>    they are valid files on the brick and copy them to the volume from
>    the brick to the mount point.
>
> The rest of the steps look good.

Apparently, they weren't quite right. I tried it and it just gives me the
usage notes in return. Transcript of the commands and output is below.

Any insight on how I got the syntax wrong?

--- cut here ---
root@merlin:/# gluster volume status
Status of volume: palantir
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick saruman:/var/local/brick0/data        49153     0          Y       17995
Brick gandalf:/var/local/brick0/data        49153     0          Y       9415
Brick merlin:/var/local/arbiter1/data       49170     0          Y       35034
Brick azathoth:/var/local/brick0/data       49153     0          Y       25312
Brick yog-sothoth:/var/local/brick0/data    49152     0          Y       10671
Brick merlin:/var/local/arbiter2/data       49171     0          Y       35043
Brick cthulhu:/var/local/brick0/data        49153     0          Y       21925
Brick mordiggian:/var/local/brick0/data     49152     0          Y       12368
Brick merlin:/var/local/arbiter3/data       49172     0          Y       35050
Self-heal Daemon on localhost               N/A       N/A        Y       1209
Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       23253
Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       9542
Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       11016
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       8126
Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       30998
Self-heal Daemon on azathoth.lub.lu.se      N/A       N/A        Y       34399

Task Status of Volume palantir
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : e58bc091-5809-4364-af83-2b89bc5c7106
Status               : completed

root@merlin:/# gluster volume remove-brick palantir saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

root@merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

root@merlin:/# gluster volume remove-brick palantir replica 3 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

--- cut here ---

--
Dave Sherohman