Hi Liron, I've reproduced the issue with a fresh deployment of oVirt 3.5.2rc. I've provided you with new screencasts and relevant logs for both cases (see inline comments):
screencast for case 1: https://www.dropbox.com/s/fdrcwmpy03v5xri/Screencast%20from%2004-03-2015%2010%3A53%3A32.webm?dl=1
screencast for case 2: https://www.dropbox.com/s/w72bf86n9v2pvdw/Screencast%20from%2004-03-2015%2015%3A18%3A45.webm?dl=1
logs for case 1: https://www.dropbox.com/sh/bl24umw0w1anclb/AAC0Oq7c6oXWetw-tp-55c37a?dl=0
logs for case 2: https://www.dropbox.com/sh/rp3pdda68nox099/AABtZGKDfFCH3sD6FZPvxRmEa?dl=0

Please note that I'm using different networks for Management (192.168.48.0/24) and GlusterFS replica (192.168.50.0/24):

        management FQDN     GlusterFS FQDN
node 1: s20.ovirt.prisma    s20gfs.ovirt.prisma
node 2: s21.ovirt.prisma    s21gfs.ovirt.prisma

On Sun, 2015-03-01 at 04:55 -0500, Liron Aravot wrote:
> Hi Stefano,
> thanks for the great input!
>
> I went over the logs (is the screencast uses the same domains? i don't have
> the logs from that run) - the master domain deactivation (and the master role
> migration to the new domain) fails with the error to copy the master fs
> content to the new domain on tar copy (see on [1] the error).
>
> 1. Is there a chance that there is any problem inconsistent storage access
> problem to any of the domains?

The storage domains rely on GlusterFS volumes created specifically for this purpose. VMs run correctly.

> 2. Does the issue reproduces always or only in some of the runs?

The issue always reproduces, but:
case 1) if DATA and DATA_NEW are both created pointing to s20gfs, the issue reproduces and the Master role changes (Screencast 1).
case 2) if DATA is pointing to s20 and DATA_NEW to s20gfs, the issue reproduces and the Master role flips but does not change (Screencast 2).

> 3. Have you tried to run a operation that creates a task? a creation of a
> disk for example.

Operations such as creating or moving a disk all work correctly.

>
> thanks,
> Liron.
>
> [1]:
> Thread-9875::DEBUG::2015-02-25
> 15:06:57,969::clusterlock::349::Storage.SANLock::(release) Cluster lock for
> domain 08298f60-4919-4f86-9233-827c1089779a successfully released
> Thread-9875::ERROR::2015-02-25
> 15:06:57,969::task::866::Storage.TaskManager.Task::(_setError)
> Task=`2a434209-3e96-4d1e-8d1b-8c7463889f6a`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 1246, in deactivateStorageDomain
>     pool.deactivateSD(sdUUID, msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 1097, in deactivateSD
>     self.masterMigrate(sdUUID, newMsdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 816, in masterMigrate
>     exclude=('./lost+found',))
>   File "/usr/share/vdsm/storage/fileUtils.py", line 68, in tarCopy
>     raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
> TarCopyFailed: (1, 0, '', '')
> Thread-9875::DEBUG::2015-02-25
> 15:06:57,969::task::885::Storage.TaskManager.Task::(_run)
> Task=`2a434209-3e96-4d1e-8d1b-8c7463889f6a`::Task._run:
> 2a434209-3e96-4d1e-8d1b-8c7463889f6a ('62a034ca-63df-44f2-9a87-735ddd257a6b',
> '00000002-0002-0002-0002-00000000022f',
> '08298f60-4919-4f86-9233-827c1089779a', 34) {} failed
> - stopping task
>
> ----- Original Message -----
> > From: "Stefano Stagnaro" <stefa...@prisma-eng.com>
> > To: "Vered Volansky" <ve...@redhat.com>
> > Cc: users@ovirt.org
> > Sent: Friday, February 27, 2015 4:54:31 PM
> > Subject: Re: [ovirt-users] Sync Error on Master Domain after adding a
> > second one
> >
> > I think I finally managed to replicate the problem:
> >
> > 1. deploy a datacenter with a virt only cluster and a gluster only cluster
> > 2. create a first GlusterFS Storage Domain (e.g. DATA) and activate it
> > (should become Master)
> > 3. create a second GlusterFS Storage Domain (e.g. DATA_NEW) and activate it
> > 4. put DATA in maintenance
> >
> > Both Storage Domains flows between the following states:
> > https://www.dropbox.com/s/x542q1epf40ar5p/Screencast%20from%2027-02-2015%2015%3A09%3A29.webm?dl=1
> >
> > Webadmin Events shows: "Sync Error on Master Domain between Host v10 and
> > oVirt Engine. Domain: DATA is marked as Master in oVirt Engine database but
> > not on the Storage side. Please consult with Support on how to fix this
> > issue."
> >
> > It seems DATA can be deactivated at the second attempt.
> >
> > --
> > Stefano Stagnaro
> >
> > Prisma Engineering S.r.l.
> > Via Petrocchi, 4
> > 20127 Milano – Italy
> >
> > Tel. 02 26113507 int 339
> > e-mail: stefa...@prisma-eng.com
> > skype: stefano.stagnaro
> >
> > On Wed, 2015-02-25 at 15:41 +0100, Stefano Stagnaro wrote:
> > > This is what I've done basically:
> > >
> > > 1. added a new data domain (DATA_R3);
> > > 2. activated the new data domain - both domains in "active" state;
> > > 3. moved Disks from DATA to DATA_R3;
> > > 4. tried to put the old data domain in maintenance (from webadmin or
> > > shell);
> > > 5. both domains became inactive;
> > > 6. DATA_R3 came back in "active";
> > > 7. DATA domain went in "being initialized";
> > > 8. Webadmin shows the error "Sync Error on Master Domain between...";
> > > 9. DATA domain completed the reconstruction and came back in "active".
> > >
> > > Please find engine and vdsm logs here:
> > > https://www.dropbox.com/sh/uuwwo8sxcg4ffqp/AAAx6UrwI3jbsN4oraJuDx9Fa?dl=0
> >
> > _______________________________________________
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
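
A note on reading `TarCopyFailed: (1, 0, '', '')` in the quoted traceback: the `raise` line shows the tuple is (source tar return code, destination tar return code, out, err), so here the source side exited with 1 while the destination side exited with 0. GNU tar documents exit status 1 as "some files differ", which in create mode means files changed while being archived; whether that is what happened on the master domain would have to be confirmed from the logs. The following is only a minimal sketch of a tar-pipe copy that returns the same kind of tuple. It is not the vdsm code: `tar_pipe_copy` and its parameters are hypothetical names chosen for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def tar_pipe_copy(src, dst, exclude=()):
    """Copy the contents of src into dst via `tar c | tar x`.

    Hypothetical helper, not vdsm's tarCopy.
    Returns (src_returncode, dst_returncode, out, err).
    """
    excludes = [f"--exclude={pattern}" for pattern in exclude]
    # Source side: archive the contents of src to stdout.
    tsrc = subprocess.Popen(
        ["tar", "cf", "-", *excludes, "-C", str(src), "."],
        stdout=subprocess.PIPE,
    )
    # Destination side: unpack the stream read from stdin into dst.
    tdst = subprocess.Popen(
        ["tar", "xf", "-", "-C", str(dst)],
        stdin=tsrc.stdout,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Close our copy of the pipe so the source tar sees a broken pipe
    # if the destination tar exits early.
    tsrc.stdout.close()
    out, err = tdst.communicate()
    tsrc.wait()
    return tsrc.returncode, tdst.returncode, out.decode(), err.decode()


if __name__ == "__main__":
    src = Path(tempfile.mkdtemp())
    dst = Path(tempfile.mkdtemp())
    (src / "metadata").write_text("example")
    (src / "lost+found").mkdir()
    # Same exclude pattern as the one visible in the traceback.
    print(tar_pipe_copy(src, dst, exclude=("./lost+found",)))
```

In this sketch only the destination tar's output and stderr are captured, so a failure on the source side would surface as a non-zero first element with empty strings, similar in shape to the tuple above. Running the equivalent tar command by hand on the master domain mount could help show which file the source tar complains about.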