adding gluster pool list:

UUID                                  Hostname             State
2c86fa95-67a2-492d-abf0-54da625417f8  vmm12.mydomain.com   Connected
ab099e72-0f56-4d33-a16b-ba67d67bdf9d  vmm13.mydomain.com   Connected
c35ad74d-1f83-4032-a459-079a27175ee4  vmm14.mydomain.com   Connected
aeb7712a-e74e-4492-b6af-9c266d69bfd3  vmm17.mydomain.com   Connected
4476d434-d6ff-480f-b3f1-d976f642df9c  vmm16.mydomain.com   Connected
22ec0c0a-a5fc-431c-9f32-8b17fcd80298  vmm15.mydomain.com   Connected
caf84e9f-3e03-4e6f-b0f8-4c5ecec4bef6  vmm18.mydomain.com   Connected
18385970-aba6-4fd1-85a6-1b13f663e60b  vmm10.mydomain.com   Disconnected  // server that went bad
b152fd82-8213-451f-93c6-353e96aa3be9  vmm102.mydomain.com  Connected     // vmm10 but with a different name
228a9282-c04e-4229-96a6-67cb47629892  localhost            Connected
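For the archive: the Disconnected peer in the pool list above can be cross-checked from any connected node before any brick surgery. A minimal dry-run sketch, assuming the hostname from the list above; the gluster commands are only echoed, not executed, so nothing touches the cluster until you paste them yourself:

```shell
# Sketch: confirm the stale (Disconnected) peer before replacing bricks.
# Hostname is taken from the pool list above; commands are echoed (dry run).
BAD_PEER="vmm10.mydomain.com"

# Check how the other nodes see the dead peer, then (only after all of its
# bricks are replaced or removed from every volume) detach it:
echo "gluster peer status | grep -A 2 ${BAD_PEER}"
echo "gluster peer detach ${BAD_PEER}"
```

Note that `peer detach` refuses while the dead host still owns bricks, which matches the behavior Adrian reports later in this thread.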
On Tue, Jun 11, 2019 at 11:24 AM Adrian Quintero <[email protected]> wrote:

> Strahil,
>
> Looking at your suggestions, I think I need to provide a bit more info on
> my current setup.
>
> 1. I have 9 hosts in total.
>
> 2. I have 5 storage domains:
>    - hosted_storage (Data Master)
>    - vmstore1 (Data)
>    - data1 (Data)
>    - data2 (Data)
>    - ISO (NFS) // had to create this one because oVirt 4.3.3.1 would not
>      let me upload disk images to a data domain without an ISO domain
>      (I think this is due to a bug)
>
> 3. Each volume is of the type "Distributed Replicate" and each one is
>    composed of 9 bricks. I started with 3 bricks per volume due to the
>    initial hyperconverged setup, then I expanded the cluster and the
>    gluster cluster by 3 hosts at a time until I got to a total of 9
>    hosts.
>
>    Disks, bricks and sizes used per volume:
>    /dev/sdb  engine    100GB
>    /dev/sdb  vmstore1  2600GB
>    /dev/sdc  data1     2600GB
>    /dev/sdd  data2     2600GB
>    /dev/sde  --------  400GB SSD, used for caching purposes
>
> From the above layout a few questions came up:
>
> 1. Using the web UI, how can I create a 100GB brick and a 2600GB brick
>    to replace the bad bricks for "engine" and "vmstore1" within the same
>    block device (sdb)? And what about /dev/sde (the caching disk)? When
>    I tried creating a new brick through the UI I saw that I could use
>    /dev/sde for caching, but only for 1 brick (i.e. vmstore1), so if I
>    create another brick, how would I specify that the same /dev/sde
>    device is to be used for its caching?
>
> 2. If I want to remove a brick, it being a replica 3, I go to Storage >
>    Volumes > select the volume > Bricks. Once in there I can select the
>    3 servers that compose the replicated bricks and click Remove. This
>    gives a pop-up window with the following info:
>
>    Are you sure you want to remove the following Brick(s)?
>    - vmm11:/gluster_bricks/vmstore1/vmstore1
>    - vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
>    - 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
>    - Migrate Data from the bricks?
>
>    If I proceed with this, it means I will have to do it for all 4
>    volumes. That is just not very efficient, but if it is the only way,
>    then I am hesitant to put this into a real production environment, as
>    there is no way I can take that kind of a hit for 500+ VMs :) and I
>    also won't have that much storage or extra volumes to play with in a
>    real scenario.
>
> 3. After modifying /etc/vdsm/vdsm.id yesterday by following
>    https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids,
>    I was able to add the server back to the cluster using a new FQDN and
>    a new IP, and tested replacing one of the bricks. This is my mistake:
>    as mentioned in point 3 of my setup above, I used /dev/sdb entirely
>    for 1 brick, because through the UI I could not split the block
>    device into 2 bricks (one for the engine and one for vmstore1). So in
>    "gluster vol info" you might see vmm102.mydomain.com, but in reality
>    it is myhost1.mydomain.com.
>
> 4. I am also attaching gluster_peer_status.txt; in the last 2 entries of
>    that file you will see an entry for vmm10.mydomain.com (old/bad
>    entry) and vmm102.mydomain.com (new entry, same server vmm10, but
>    renamed to vmm102). Also please find the gluster_vol_info.txt file.
>
> 5. I am ready to redeploy this environment if needed, but I am also
>    ready to test any other suggestion. If I can get a good understanding
>    of how to recover from this, I will be ready to move to production.
>
> 6. Wondering if you'd be willing to have a look at my setup through a
>    shared screen?
>
> Thanks,
>
> Adrian
>
> On Mon, Jun 10, 2019 at 11:41 PM Strahil <[email protected]> wrote:
>
>> Hi Adrian,
>>
>> You have several options:
>>
>> A) If you have space on another gluster volume (or volumes) or on
>> NFS-based storage, you can migrate all VMs live. Once you do, the
>> simple way will be to stop and remove the storage domain (from the UI)
>> and the gluster volume that corresponds to the problematic brick. Once
>> gone, you can remove the entry in oVirt for the old host and add the
>> newly built one. Then you can recreate your volume and migrate the
>> data back.
>>
>> B) If you don't have space, you have to use a riskier approach
>> (usually it shouldn't be risky, but I had a bad experience in gluster
>> v3):
>>
>> - The new server has the same IP and hostname:
>> Use the command line and run 'gluster volume reset-brick VOLNAME
>> HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit'.
>> Replace VOLNAME with your volume name.
>> A more practical example would be:
>> 'gluster volume reset-brick data ovirt3:/gluster_bricks/data/brick
>> ovirt3:/gluster_bricks/data/brick commit'
>>
>> If it refuses, then you have to clean up '/gluster_bricks/data' (which
>> should be empty).
>> Also check whether the new peer has been probed via 'gluster peer
>> status'. Check that the firewall is allowing gluster communication
>> (you can compare it to the firewalls on another gluster host).
>>
>> The automatic healing will kick in within 10 minutes (if it succeeds)
>> and will stress the other 2 replicas, so pick your time properly.
>> Note: I'm not recommending you use the 'force' option in the previous
>> command ... for now :)
>>
>> - The new server has a different IP/hostname:
>> Instead of 'reset-brick' you can use 'replace-brick'.
>> It should be like this:
>> gluster volume replace-brick data old-server:/path/to/brick
>> new-server:/new/path/to/brick commit force
>>
>> In both cases check the status via:
>> gluster volume info VOLNAME
>>
>> If your cluster is in production, I really recommend the first option,
>> as it is less risky and the chance of unplanned downtime will be
>> minimal.
>>
>> The 'reset-brick' in your previous e-mail shows that one of the
>> servers is not connected. Check the peer status on all servers; if
>> there are fewer peers than there should be, check for network and/or
>> firewall issues.
>> On the new node, check whether glusterd is enabled and running.
>>
>> In order to debug, you should provide more info like 'gluster volume
>> info' and the peer status from each node.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Jun 10, 2019 20:10, Adrian Quintero <[email protected]> wrote:
>>
>> >
>> > Can you let me know how to fix the gluster and missing brick?
>> > I tried removing it by going to Storage > Volumes > vmstore > Bricks
>> and selecting the brick.
>> > However it is showing an unknown status (which is expected because
>> the server was completely wiped), so if I try to "remove", "replace
>> brick" or "reset brick" it won't work.
>> > If I do remove brick: Incorrect bricks selected for removal in
>> Distributed Replicate volume. Either all the selected bricks should be
>> from the same sub volume or one brick each for every sub volume!
>> > If I try "replace brick" I can't, because I don't have another
>> server with extra bricks/disks.
>> > And if I try "reset brick": Error while executing action Start
>> Gluster Volume Reset Brick: Volume reset brick commit force failed:
>> rc=-1 out=() err=['Host myhost1_mydomain_com not connected']
>> >
>> > Are you suggesting to try and fix the gluster using the command
>> line?
>> >
>> > Note that I can't "peer detach" the server, so if I force the
>> removal of the bricks, would I need to force a downgrade to replica 2
>> instead of 3? What would happen to oVirt, as it only supports replica
>> 3?
>> >
>> > Thanks again.
>> >
>> > On Mon, Jun 10, 2019 at 12:52 PM Strahil <[email protected]> wrote:
>> >>
>> >> Hi Adrian,
>> >> Did you fix the issue with the gluster and the missing brick?
>> >> If yes, try to set the 'old' host in maintenance an
>
> --
> Adrian Quintero

--
Adrian Quintero
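For the archive: with the rebuilt host under a new name (vmm10 rebuilt as vmm102, as described above), Strahil's option B boils down to one replace-brick per volume. A dry-run sketch, assuming the volume names and the /gluster_bricks path layout from this thread; it only prints the commands, since a real replace-brick triggers the healing that stresses the two remaining replicas:

```shell
# Sketch of option B (different hostname): swap the dead host's brick in
# each volume for the brick on the rebuilt host, then watch healing.
# Hostnames and volume names come from this thread; commands are echoed,
# not executed. Adjust brick paths to your actual layout.
OLD_HOST="vmm10.mydomain.com"
NEW_HOST="vmm102.mydomain.com"

for VOL in engine vmstore1 data1 data2; do
  # replace-brick requires 'commit force'; if the rebuilt host had kept
  # its old hostname/IP, 'reset-brick ... commit' would be used instead.
  echo "gluster volume replace-brick ${VOL} \
${OLD_HOST}:/gluster_bricks/${VOL}/${VOL} \
${NEW_HOST}:/gluster_bricks/${VOL}/${VOL} commit force"
  echo "gluster volume heal ${VOL} info"
done
```

As Strahil notes, this keeps the volumes at replica 3 throughout, so oVirt's replica-3 requirement is never violated; only one subvolume is degraded at a time while healing runs.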
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/GWTF2PJ7FHPIKIFLRXCR35AC7HMCSTTJ/

