----- Original Message -----
> From: "Ted Miller" <tmil...@hcjb.org>
> To: "Federico Simoncelli" <fsimo...@redhat.com>, "Itamar Heim" <ih...@redhat.com>
> Cc: users@ovirt.org
> Sent: Monday, January 27, 2014 7:16:14 PM
> Subject: Re: [Users] Data Center stuck between "Non Responsive" and "Contending"
>
> On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
> > Maybe someone from gluster can identify easily what happened. Meanwhile, if
> > you just want to repair your data center, you could try:
> >
> >   $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
> >   $ touch ids
> >   $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
>
> Federico,
>
> I won't be able to do anything to the oVirt setup for another 5 hours or so
> (it is a trial system I am working on at home; I am at work), but I will try
> your repair script and report back.
>
> In bugzilla 862975 they suggested turning off write-behind caching and "eager
> locking" on the gluster volume to avoid or reduce the problems that come from
> many different computers all writing to the same file(s) very frequently. If I
> interpret the comment in the bug correctly, it did seem to help in that
> situation. My situation is a little different: my gluster setup is replicate
> only, replica 3 (though there are only two hosts). I was not stress-testing
> it; I was just using it, trying to figure out how to import some old VMware
> VMs without an ESXi server to run them on.
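[For reference, the write-behind and eager-locking tuning that Ted mentions from bug 862975 is applied with `gluster volume set`. A minimal sketch, assuming the volume name `VM2` taken from the mount path above; run on any host in the trusted pool:]

```shell
# Disable the write-behind translator, which batches and delays writes
# (a hazard when several hosts rewrite the same sanlock files constantly)
gluster volume set VM2 performance.write-behind off

# Disable eager locking, which holds a lock in anticipation of further
# writes from the same client
gluster volume set VM2 cluster.eager-lock off

# Verify the reconfigured options
gluster volume info VM2
```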
Have you done anything similar to what is described here in comment 21?
https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21

When did you realize that you weren't able to use the data center anymore?
Can you describe exactly what you did and what happened, for example:

1. I created the data center (up and running)
2. I tried to import some VMs from VMware
3. During the import (or after it) the data center went into the contending state
...

Did something special happen? I don't know, a power loss, a split-brain?
For example, an excessive load on one of the servers could also have triggered
a timeout somewhere (forcing the data center to go back into the contending
state). Could you check if any host was fenced (forcibly rebooted)?

> I am guessing that what makes cluster storage have the (Master) designation
> is that this is the one that actually contains the sanlocks? If so, would it
> make sense to set up a gluster volume to be (Master), but not use it for VM
> storage, just for storing the sanlock info? Separate gluster volume(s) could
> then have the VMs on them, and would not need the optimizations turned off.

Any domain must be able to become the master at any time. Without a master the
data center is unusable (at the present time), which is why we migrate (or
reconstruct) it on another domain when necessary.

-- 
Federico
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users