What you are missing is the fact that Gluster requires more than one set of bricks to recover from a dead host. I.e., in your setup, you'd need 6 hosts: 4x replicas and 2x arbiters, with at least one full set (2x replicas and 1x arbiter) operational as the bare minimum. Automated commands to fix the volume do not exist otherwise. (It's a Gluster limitation.) It can be fixed manually, however.
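As a rough sketch of the manual dead-brick fix described below, something like the following could work. The volume name, hostnames, and brick path are the ones from this thread; `replace_dead_brick` is just an illustrative helper name, not a gluster command, and this assumes you run it from a machine with ssh access to both hosts. Treat it as an outline, not a drop-in script:

```shell
# Sketch only: vol/good/new/brick values below are assumptions taken from
# the thread. Verify each step against your own setup before running it.
replace_dead_brick() {
    vol=$1      # gluster volume name, e.g. GV2Data
    good=$2     # host holding a known-good replica brick
    new=$3      # host with the fresh, empty replacement storage attached
    brick=$4    # brick path on both hosts, e.g. /mnt/LogGFSData/brick

    # 1. Take the gluster volume offline.
    gluster volume stop "$vol"

    # 2+3. Copy the brick contents from the good replica to the new storage.
    #      -a preserves times/ownership, -H hard links, -A ACLs, -X xattrs;
    #      gluster keeps metadata in xattrs and the hidden .glusterfs dir,
    #      so hidden files and extended attributes must come along.
    rsync -aHAX "$good:$brick/" "$new:$brick/"

    # 4. Restart the volume; gluster may still want to heal afterwards.
    gluster volume start "$vol"
    gluster volume heal "$vol" full
}

# Example invocation with the thread's names:
# replace_dead_brick GV2Data office-wx-hv2-lab-gfs office-wx-hv1-lab-gfs /mnt/LogGFSData/brick
```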
Standard Disclaimer: Back up your data first! Fixing this issue requires manual intervention. Reader assumes all responsibility for any action resulting from the instructions below. Etc.

If it's just a dead brick (i.e. the host is still functional), all you really need to do is replace the underlying storage:

1. Take the gluster volume offline.
2. Remove the bad storage device and attach the replacement.
3. rsync / scp / etc. the data from a known-good brick (be sure to include hidden files and preserve file times, ownership, SELinux labels, etc.).
4. Restart the gluster volume.

Gluster *might* still need to heal everything after all of that, but it should start the volume and get it running again.

If the host itself is dead (and the underlying storage is still functional), you can just move the underlying storage over to the new host:

1. Take the gluster volume offline.
2. Attach the old storage.
3. Fix up the IDs in the volume file. (https://serverfault.com/questions/631365/rename-a-glusterfs-peer)
4. Restart the gluster volume.

If both the host and the underlying storage are dead, you'll need to do both tasks:

1. Take the gluster volume offline.
2. Attach the new storage.
3. rsync / scp / etc. the data from a known-good brick (be sure to include hidden files and preserve file times, ownership, SELinux labels, etc.).
4. Fix up the IDs in the volume file.
5. Restart the gluster volume.

Keep in mind one thing, however: if the gluster host you are replacing is the one oVirt uses to connect to the volume (i.e. it's the host named in the volume config in the Admin portal), the new host will need to retain the old hostname / IP, or you'll need to update oVirt's config. Otherwise the VM hosts will wind up in Unassigned / Non-functional status.

- Patrick Hibbs

On Sun, 2022-07-17 at 22:15 +0300, Gilboa Davara wrote:
> Hello all,
>
> I'm attempting to replace a dead host in a replica 2 + arbiter
> gluster setup and replace it with a new host.
> I've already set up a new host (same hostname..localdomain) and got
> into the cluster.
>
> $ gluster peer status
> Number of Peers: 2
>
> Hostname: office-wx-hv3-lab-gfs
> Uuid: 4e13f796-b818-4e07-8523-d84eb0faa4f9
> State: Peer in Cluster (Connected)
>
> Hostname: office-wx-hv1-lab-gfs.localdomain  <------ This is a new host.
> Uuid: eee17c74-0d93-4f92-b81d-87f6b9c2204d
> State: Peer in Cluster (Connected)
>
> $ gluster volume info GV2Data
> Volume Name: GV2Data
> Type: Replicate
> Volume ID: c1946fc2-ed94-4b9f-9da3-f0f1ee90f303
> Status: Stopped
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: office-wx-hv1-lab-gfs:/mnt/LogGFSData/brick  <------ This is the dead host.
> Brick2: office-wx-hv2-lab-gfs:/mnt/LogGFSData/brick
> Brick3: office-wx-hv3-lab-gfs:/mnt/LogGFSData/brick (arbiter)
> ...
>
> Looking at the docs, it seems that I need to remove the dead brick.
>
> $ gluster volume remove-brick GV2Data office-wx-hv1-lab-gfs:/mnt/LogGFSData/brick start
> Running remove-brick with cluster.force-migration enabled can result
> in data corruption. It is safer to disable this option so that files
> that receive writes during migration are not migrated.
> Files that are not migrated can then be manually copied after the
> remove-brick commit operation.
> Do you want to continue with your current cluster.force-migration
> settings? (y/n) y
> volume remove-brick start: failed: Removing bricks from replicate
> configuration is not allowed without reducing replica count
> explicitly
>
> So I guess I need to drop from replica 2 + arbiter to replica 1 +
> arbiter (?).
>
> $ gluster volume remove-brick GV2Data replica 1 office-wx-hv1-lab-gfs:/mnt/LogGFSData/brick start
> Running remove-brick with cluster.force-migration enabled can result
> in data corruption. It is safer to disable this option so that files
> that receive writes during migration are not migrated.
> Files that are not migrated can then be manually copied after the
> remove-brick commit operation.
> Do you want to continue with your current cluster.force-migration
> settings? (y/n) y
> volume remove-brick start: failed: need 2(xN) bricks for reducing
> replica count of the volume from 3 to 1
>
> ... What am I missing?
>
> - Gilboa
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIXTFTJREUAHGP3WUW7DFL3VJNEMFJLF/