Strahil,

Looking at your suggestions, I think I need to provide a bit more info on my current setup.
1. I have 9 hosts in total.

2. I have 5 storage domains:
   - hosted_storage (Data Master)
   - vmstore1 (Data)
   - data1 (Data)
   - data2 (Data)
   - ISO (NFS) // had to create this one because oVirt 4.3.3.1 would not let me upload disk images to a data domain without an ISO domain (I think this is due to a bug)

3. Each volume is of the type "Distributed Replicate" and each one is composed of 9 bricks. I started with 3 bricks per volume due to the initial hyperconverged setup, then I expanded the cluster (and the gluster cluster) by 3 hosts at a time until I got to a total of 9 hosts.

   Disks, bricks, and sizes used per volume:

   /dev/sdb   engine     100GB
   /dev/sdb   vmstore1   2600GB
   /dev/sdc   data1      2600GB
   /dev/sdd   data2      2600GB
   /dev/sde   --------   400GB SSD, used for caching purposes

From the above layout a few questions came up:

1. Using the web UI, how can I create a 100GB brick and a 2600GB brick to replace the bad bricks for "engine" and "vmstore1" within the same block device (sdb)? And what about /dev/sde (the caching disk)? When I tried creating a new brick through the UI, I saw that I could use /dev/sde for caching, but only for 1 brick (i.e. vmstore1); so if I try to create another brick, how would I specify that the same /dev/sde device should be used for its caching as well? (A rough LVM sketch follows this list.)

2. If I want to remove a brick, it being a replica 3, I go to Storage > Volumes > select the volume > Bricks; once in there I can select the 3 servers that compose the replicated bricks and click "remove". This gives a pop-up window with the following info:

   Are you sure you want to remove the following Brick(s)?
   - vmm11:/gluster_bricks/vmstore1/vmstore1
   - vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
   - 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
   - Migrate Data from the bricks?

   If I proceed with this, it means I will have to do it for all 4 volumes, which is just not very efficient. And if that is the only way, then I am hesitant to put this into a real production environment, as there is no way I can take that kind of a hit for +500 VMs :) and I also won't have that much spare storage or extra volumes to play with in a real scenario.

3. After modifying /etc/vdsm/vdsm.id yesterday, following https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids, I was able to add the server back to the cluster using a new FQDN and a new IP, and I tested replacing one of the bricks. This is my mistake, as mentioned in #3 above: I used /dev/sdb entirely for 1 brick, because through the UI I could not split the block device between 2 bricks (one for engine and one for vmstore1). So in the "gluster vol info" you might see vmm102.mydomain.com, but in reality it is myhost1.mydomain.com.

4. I am also attaching gluster_peer_status.txt; in the last 2 entries of that file you will see an entry for vmm10.mydomain.com (old/bad entry) and one for vmm102.mydomain.com (new entry; same server vmm10, but renamed to vmm102). Please also find the gluster_vol_info.txt file.

5. I am ready to redeploy this environment if needed, but I am also ready to test any other suggestion. If I can get a good understanding of how to recover from this, I will be ready to move to production.

6. Wondering if you'd be willing to have a look at my setup through a shared screen?
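For question 1, in case it helps frame an answer: outside the UI, the only way I can think of is plain LVM, carving two LVs out of /dev/sdb and splitting the SSD into one dm-cache pool per brick (as far as I know, a dm-cache pool attaches to exactly one LV, which would explain why the UI only offered /dev/sde to a single brick). A minimal sketch; all VG/LV names and cache sizes below are made up, not from my actual setup:

    # Sketch only -- VG/LV names and sizes are illustrative, untested here.
    pvcreate /dev/sdb /dev/sde
    vgcreate gluster_vg_sdb /dev/sdb /dev/sde

    # Two bricks carved out of the same spinning disk:
    lvcreate -L 100G  -n lv_engine   gluster_vg_sdb /dev/sdb
    lvcreate -L 2600G -n lv_vmstore1 gluster_vg_sdb /dev/sdb

    # Split the 400GB SSD into one cache pool per brick
    # (dm-cache attaches one pool to exactly one LV):
    lvcreate --type cache-pool -L 100G -n cpool_engine   gluster_vg_sdb /dev/sde
    lvcreate --type cache-pool -L 250G -n cpool_vmstore1 gluster_vg_sdb /dev/sde
    lvconvert -y --type cache --cachepool gluster_vg_sdb/cpool_engine   gluster_vg_sdb/lv_engine
    lvconvert -y --type cache --cachepool gluster_vg_sdb/cpool_vmstore1 gluster_vg_sdb/lv_vmstore1

    # XFS with the inode size gluster recommends, then mount as bricks:
    mkfs.xfs -i size=512 /dev/gluster_vg_sdb/lv_engine
    mkfs.xfs -i size=512 /dev/gluster_vg_sdb/lv_vmstore1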
Thanks,
Adrian

On Mon, Jun 10, 2019 at 11:41 PM Strahil <hunter86...@yahoo.com> wrote:

> Hi Adrian,
>
> You have several options:
>
> A) If you have space on another gluster volume (or volumes) or on
> NFS-based storage, you can migrate all VMs live. Once you do that, the
> simple way will be to stop and remove the storage domain (from the UI) and
> the gluster volume that corresponds to the problematic brick. Once gone, you
> can remove the entry in oVirt for the old host and add the newly built
> one. Then you can recreate your volume and migrate the data back.
>
> B) If you don't have space, you have to use a riskier approach
> (usually it shouldn't be risky, but I had a bad experience in gluster v3):
>
> - The new server has the same IP and hostname:
> Use the command line and run 'gluster volume reset-brick VOLNAME
> HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit'.
> Replace VOLNAME with your volume name.
> A more practical example would be:
> 'gluster volume reset-brick data ovirt3:/gluster_bricks/data/brick
> ovirt3:/gluster_bricks/data/brick commit'
>
> If it refuses, then you have to clean up '/gluster_bricks/data' (which
> should be empty).
> Also check that the new peer has been probed via 'gluster peer status'. Check
> that the firewall is allowing gluster communication (you can compare it to the
> firewalls on another gluster host).
>
> The automatic healing will kick in within 10 minutes (if it succeeds) and will
> stress the other 2 replicas, so pick your time properly.
> Note: I'm not recommending you use the 'force' option in the previous
> command ... for now :)
>
> - The new server has a different IP/hostname:
> Instead of 'reset-brick' you can use 'replace-brick':
> It should be like this:
> gluster volume replace-brick data old-server:/path/to/brick
> new-server:/new/path/to/brick commit force
>
> In both cases check the status via:
> gluster volume info VOLNAME
>
> If your cluster is in production, I really recommend the first option,
> as it is less risky and the chance of unplanned downtime will be minimal.
>
> The 'reset-brick' in your previous e-mail shows that one of the servers
> is not connected. Check peer status on all servers; if there are fewer peers
> than there should be, check for network and/or firewall issues.
> On the new node, check that glusterd is enabled and running.
>
> In order to debug, you should provide more info, like 'gluster volume
> info' and the peer status from each node.
>
> Best Regards,
> Strahil Nikolov
>
> On Jun 10, 2019 20:10, Adrian Quintero <adrianquint...@gmail.com> wrote:
> >
> > Can you let me know how to fix the gluster and the missing brick?
> > I tried removing it by going to "Storage > Volumes > vmstore > Bricks >
> > selected the brick".
> > However, it is showing an unknown status (which is expected, because
> > the server was completely wiped), so if I try to "remove", "replace brick",
> > or "reset brick" it won't work.
> > If I do remove brick: Incorrect bricks selected for removal in
> > Distributed Replicate volume. Either all the selected bricks should be from
> > the same sub volume or one brick each for every sub volume!
> > If I try "replace brick" I can't, because I don't have another server with
> > extra bricks/disks.
> > And if I try "reset brick": Error while executing action Start Gluster
> > Volume Reset Brick: Volume reset brick commit force failed: rc=-1 out=()
> > err=['Host myhost1_mydomain_com not connected']
> >
> > Are you suggesting to try and fix the gluster using the command line?
> >
> > Note that I can't "peer detach" the server, so if I force the removal
> > of the bricks, would I need to force a downgrade to replica 2 instead of 3?
> > What would happen to oVirt, as it only supports replica 3?
> >
> > Thanks again.
> >
> > On Mon, Jun 10, 2019 at 12:52 PM Strahil <hunter86...@yahoo.com> wrote:
> >>
> >> Hi Adrian,
> >> Did you fix the issue with the gluster and the missing brick?
> >> If yes, try to set the 'old' host in maintenance an

--
Adrian Quintero
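For concreteness, applying the 'replace-brick' variant from Strahil's mail to this setup would presumably look like the following, repeated per volume. The old brick path (trailing dot included) is copied verbatim from the attached gluster_vol_info.txt; this sequence is untested here:

    # Sketch, per volume (data1 shown); old hostname copied verbatim from
    # 'gluster volume info', including its trailing dot. Untested.
    gluster volume replace-brick data1 \
        vmm10.mydomain.com.:/gluster_bricks/data1/data1 \
        vmm102.mydomain.com:/gluster_bricks/data1/data1 \
        commit force

    # Then verify and watch the self-heal:
    gluster volume info data1
    gluster volume heal data1 info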
Volume Name: data1
Type: Distributed-Replicate
Volume ID: a953be2a-a23f-4425-bf61-1a27fa029975
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: vmm10.mydomain.com.:/gluster_bricks/data1/data1
Brick2: vmm11.mydomain.com:/gluster_bricks/data1/data1
Brick3: vmm12.mydomain.com:/gluster_bricks/data1/data1
Brick4: vmm13.mydomain.com:/gluster_bricks/data1/data1
Brick5: vmm14.mydomain.com:/gluster_bricks/data1/data1
Brick6: vmm15.mydomain.com:/gluster_bricks/data1/data1
Brick7: vmm16.mydomain.com:/gluster_bricks/data1/data1
Brick8: vmm17.mydomain.com:/gluster_bricks/data1/data1
Brick9: vmm18.mydomain.com:/gluster_bricks/data1/data1
Options Reconfigured:
cluster.granular-entry-heal: enable
storage.owner-gid: 36
storage.owner-uid: 36
network.ping-timeout: 30
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.strict-o-direct: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: data2
Type: Distributed-Replicate
Volume ID: b5254bbb-a6a1-4f79-9513-d01f24331d03
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: vmm10.mydomain.com.:/gluster_bricks/data2/data2
Brick2: vmm11.mydomain.com:/gluster_bricks/data2/data2
Brick3: vmm12.mydomain.com:/gluster_bricks/data2/data2
Brick4: vmm13.mydomain.com:/gluster_bricks/data2/data2
Brick5: vmm14.mydomain.com:/gluster_bricks/data2/data2
Brick6: vmm15.mydomain.com:/gluster_bricks/data2/data2
Brick7: vmm16.mydomain.com:/gluster_bricks/data2/data2
Brick8: vmm17.mydomain.com:/gluster_bricks/data2/data2
Brick9: vmm18.mydomain.com:/gluster_bricks/data2/data2
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enable

Volume Name: engine
Type: Distributed-Replicate
Volume ID: e89321ed-bf10-4d24-a376-f86656b3d65c
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: vmm10.mydomain.com.:/gluster_bricks/engine/engine
Brick2: vmm11.mydomain.com:/gluster_bricks/engine/engine
Brick3: vmm12.mydomain.com:/gluster_bricks/engine/engine
Brick4: vmm13.mydomain.com:/gluster_bricks/engine/engine
Brick5: vmm14.mydomain.com:/gluster_bricks/engine/engine
Brick6: vmm15.mydomain.com:/gluster_bricks/engine/engine
Brick7: vmm16.mydomain.com:/gluster_bricks/engine/engine
Brick8: vmm17.mydomain.com:/gluster_bricks/engine/engine
Brick9: vmm18.mydomain.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enable

Volume Name: vmstore1
Type: Distributed-Replicate
Volume ID: 19c4d170-3b79-44c4-8dbd-20dc49beb8b2
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
Brick2: vmm11.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick3: vmm12.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick4: vmm13.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick5: vmm14.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick6: vmm15.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick7: vmm16.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick8: vmm17.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Brick9: vmm18.mydomain.com:/gluster_bricks/vmstore1/vmstore1
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.strict-o-direct: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
network.ping-timeout: 30
storage.owner-uid: 36
storage.owner-gid: 36
cluster.granular-entry-heal: enable
Number of Peers: 9

Hostname: vmm12.mydomain.com
Uuid: 2c86fa95-67a2-492d-abf0-54da625417f8
State: Peer in Cluster (Connected)
Other names:
192.168.0.4
172.26.0.26

Hostname: vmm13.mydomain.com
Uuid: ab099e72-0f56-4d33-a16b-ba67d67bdf9d
State: Peer in Cluster (Connected)
Other names:
172.26.0.27

Hostname: vmm14.mydomain.com
Uuid: c35ad74d-1f83-4032-a459-079a27175ee4
State: Peer in Cluster (Connected)
Other names:
172.26.0.28

Hostname: vmm17.mydomain.com
Uuid: aeb7712a-e74e-4492-b6af-9c266d69bfd3
State: Peer in Cluster (Connected)
Other names:
192.168.0.9
172.26.0.32

Hostname: vmm16.mydomain.com
Uuid: 4476d434-d6ff-480f-b3f1-d976f642df9c
State: Peer in Cluster (Connected)
Other names:
192.168.0.8
172.26.0.31

Hostname: vmm15.mydomain.com
Uuid: 22ec0c0a-a5fc-431c-9f32-8b17fcd80298
State: Peer in Cluster (Connected)
Other names:
172.26.0.29

Hostname: vmm18.mydomain.com
Uuid: caf84e9f-3e03-4e6f-b0f8-4c5ecec4bef6
State: Peer in Cluster (Connected)
Other names:
192.168.0.10
172.26.0.33

Hostname: vmm10.mydomain.com
Uuid: 18385970-aba6-4fd1-85a6-1b13f663e60b
State: Peer in Cluster (Disconnected)
Other names:
192.168.0.2
192.168.0.21
172.26.0.4

Hostname: vmm102.mydomain.com
Uuid: b152fd82-8213-451f-93c6-353e96aa3be9
State: Peer in Cluster (Connected)
Other names:
192.168.0.100
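One note on the stale "Disconnected" peer above: once no volume's brick list references vmm10.mydomain.com any more (i.e. after every bad brick has been replaced), detaching it should become possible. A sketch, untested here ('force' exists as a last resort, but is risky on a live cluster):

    # Only after all volumes have been re-pointed away from vmm10's bricks:
    gluster peer detach vmm10.mydomain.com
    gluster peer status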