Re: [ovirt-users] Sync Error on Master Domain after adding a second one

2015-03-04 Thread Stefano Stagnaro
Hi Liron,

I've reproduced the issue with a fresh deployment of oVirt 3.5.2rc. I've 
provided you with new screencasts and relevant logs for both cases (see inline 
comments):

screencast for case 1: 
https://www.dropbox.com/s/fdrcwmpy03v5xri/Screencast%20from%2004-03-2015%2010%3A53%3A32.webm?dl=1
screencast for case 2: 
https://www.dropbox.com/s/w72bf86n9v2pvdw/Screencast%20from%2004-03-2015%2015%3A18%3A45.webm?dl=1
logs for case 1: 
https://www.dropbox.com/sh/bl24umw0w1anclb/AAC0Oq7c6oXWetw-tp-55c37a?dl=0
logs for case 2: 
https://www.dropbox.com/sh/rp3pdda68nox099/AABtZGKDfFCH3sD6FZPvxRmEa?dl=0

Please note that I'm using different networks for Management (192.168.48.0/24) 
and GlusterFS replica (192.168.50.0/24):

        management FQDN    GlusterFS FQDN
node 1: s20.ovirt.prisma   s20gfs.ovirt.prisma
node 2: s21.ovirt.prisma   s21gfs.ovirt.prisma

On Sun, 2015-03-01 at 04:55 -0500, Liron Aravot wrote:
> Hi Stefano,
> thanks for the great input!
> 
> I went over the logs (does the screencast use the same domains? I don't have 
> the logs from that run). The master domain deactivation (and the migration of 
> the master role to the new domain) fails while copying the master fs content 
> to the new domain with tar (see the error in [1]).
> 
> 1. Is there a chance of an inconsistent storage access problem on any of the 
> domains?
The storage domains rely on GlusterFS volumes created for this purpose. VMs run 
correctly.

> 2. Does the issue always reproduce, or only in some of the runs?
The issue always reproduces, but with a difference:
case 1) if DATA and DATA_NEW are both created pointing to s20gfs, the issue 
reproduces and the Master role changes (Screencast 1).
case 2) if DATA is pointing to s20 and DATA_NEW to s20gfs, the issue reproduces 
and the Master role flips but does not change (Screencast 2).

> 3. Have you tried running an operation that creates a task, such as creating 
> a disk?
Operations such as creating or moving a disk all work correctly.

> 
> thanks,
> Liron.
> 
> 
> 
> [1]:
> Thread-9875::DEBUG::2015-02-25 15:06:57,969::clusterlock::349::Storage.SANLock::(release) Cluster lock for domain 08298f60-4919-4f86-9233-827c1089779a successfully released
> Thread-9875::ERROR::2015-02-25 15:06:57,969::task::866::Storage.TaskManager.Task::(_setError) Task=`2a434209-3e96-4d1e-8d1b-8c7463889f6a`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 873, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 1246, in deactivateStorageDomain
>     pool.deactivateSD(sdUUID, msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 1097, in deactivateSD
>     self.masterMigrate(sdUUID, newMsdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 816, in masterMigrate
>     exclude=('./lost+found',))
>   File "/usr/share/vdsm/storage/fileUtils.py", line 68, in tarCopy
>     raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
> TarCopyFailed: (1, 0, '', '')
> Thread-9875::DEBUG::2015-02-25 15:06:57,969::task::885::Storage.TaskManager.Task::(_run) Task=`2a434209-3e96-4d1e-8d1b-8c7463889f6a`::Task._run: 2a434209-3e96-4d1e-8d1b-8c7463889f6a ('62a034ca-63df-44f2-9a87-735ddd257a6b', '0002-0002-0002-0002-0000022f', '08298f60-4919-4f86-9233-827c1089779a', 34) {} failed - stopping task
> 

Re: [ovirt-users] Sync Error on Master Domain after adding a second one

2015-03-01 Thread Stefano Stagnaro
I've double checked and the correct flow from point 5 is:

DATA --> Preparing for maintenance --> Unknown --> Data Center is being 
initialized --> Active

DATA_R3 --> Active --> Locked --> Unknown --> Active

The weird thing is that after a few attempts, DATA finally went into 
maintenance. I'm now on 
ovirt-engine-3.5.3-0.0.master.20150226123132.gitbea0538.el6.noarch

I'll try to reproduce it from the beginning.

-- 
Stefano Stagnaro

Prisma Telecom Testing S.r.l.
Via Petrocchi, 4
20127 Milano – Italy

Tel. 02 26113507 int 339
e-mail: stefa...@prismatelecomtesting.com
skype: stefano.stagnaro

On Wed, 2015-02-25 at 15:41 +0100, Stefano Stagnaro wrote:
> This is basically what I did:
> 
> 1. added a new data domain (DATA_R3);
> 2. activated the new data domain - both domains in "active" state;
> 3. moved the disks from DATA to DATA_R3;
> 4. tried to put the old data domain into maintenance (from webadmin or shell);
> 5. both domains became inactive;
> 6. DATA_R3 came back to "active";
> 7. the DATA domain went to "being initialized";
> 8. Webadmin showed the error "Sync Error on Master Domain between...";
> 9. the DATA domain completed the reconstruction and came back to "active".
> 
> Please find engine and vdsm logs here: 
> https://www.dropbox.com/sh/uuwwo8sxcg4ffqp/AAAx6UrwI3jbsN4oraJuDx9Fa?dl=0
> 



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Sync Error on Master Domain after adding a second one

2015-03-01 Thread Liron Aravot
Hi Stefano,
thanks for the great input!

I went over the logs (does the screencast use the same domains? I don't have the 
logs from that run). The master domain deactivation (and the migration of the 
master role to the new domain) fails while copying the master fs content to the 
new domain with tar (see the error in [1]).

1. Is there a chance of an inconsistent storage access problem on any of the 
domains?
2. Does the issue always reproduce, or only in some of the runs?
3. Have you tried running an operation that creates a task, such as creating a 
disk?

thanks,
Liron.



[1]:
Thread-9875::DEBUG::2015-02-25 15:06:57,969::clusterlock::349::Storage.SANLock::(release) Cluster lock for domain 08298f60-4919-4f86-9233-827c1089779a successfully released
Thread-9875::ERROR::2015-02-25 15:06:57,969::task::866::Storage.TaskManager.Task::(_setError) Task=`2a434209-3e96-4d1e-8d1b-8c7463889f6a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1246, in deactivateStorageDomain
    pool.deactivateSD(sdUUID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1097, in deactivateSD
    self.masterMigrate(sdUUID, newMsdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 816, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/share/vdsm/storage/fileUtils.py", line 68, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '', '')
Thread-9875::DEBUG::2015-02-25 15:06:57,969::task::885::Storage.TaskManager.Task::(_run) Task=`2a434209-3e96-4d1e-8d1b-8c7463889f6a`::Task._run: 2a434209-3e96-4d1e-8d1b-8c7463889f6a ('62a034ca-63df-44f2-9a87-735ddd257a6b', '0002-0002-0002-0002-022f', '08298f60-4919-4f86-9233-827c1089779a', 34) {} failed - stopping task
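
For readers of the traceback: the raise line shows that the first two values of the TarCopyFailed tuple are the exit codes of the source and destination tar processes, so (1, 0, '', '') means the reading side exited with 1 while the writing side succeeded. As far as I know, GNU tar uses exit status 1 in create mode when some files changed while they were being archived. Below is a minimal sketch of that kind of tar-to-tar copy. It is not the vdsm code: the function name tar_copy and this TarCopyFailed class are stand-ins, and it assumes a tar binary is available on PATH.

```python
import subprocess


class TarCopyFailed(Exception):
    """Stand-in for the vdsm exception of the same name (illustrative only)."""


def tar_copy(src, dst, exclude=()):
    """Copy the tree under src into dst by piping one tar into another."""
    excludes = ["--exclude=" + pattern for pattern in exclude]
    # Source side: write an archive of src to stdout.
    tsrc = subprocess.Popen(
        ["tar", "cf", "-"] + excludes + ["-C", src, "."],
        stdout=subprocess.PIPE,
    )
    # Destination side: unpack the archive arriving on stdin into dst.
    tdst = subprocess.Popen(["tar", "xf", "-", "-C", dst], stdin=tsrc.stdout)
    # Drop our copy of the pipe so the source gets SIGPIPE if tdst dies early.
    tsrc.stdout.close()
    tdst.wait()
    tsrc.wait()
    if tsrc.returncode != 0 or tdst.returncode != 0:
        # Same ordering as in the traceback: source code first, then destination.
        raise TarCopyFailed(tsrc.returncode, tdst.returncode)
```

A call such as tar_copy(old_master_dir, new_master_dir, exclude=("./lost+found",)) mirrors the exclude argument seen in the traceback; both directory names here are hypothetical placeholders.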



Re: [ovirt-users] Sync Error on Master Domain after adding a second one

2015-02-27 Thread Stefano Stagnaro
I think I finally managed to replicate the problem:

1. deploy a datacenter with a virt-only cluster and a gluster-only cluster
2. create a first GlusterFS Storage Domain (e.g. DATA) and activate it (should 
become Master)
3. create a second GlusterFS Storage Domain (e.g. DATA_NEW) and activate it
4. put DATA into maintenance

Both Storage Domains flow through the following states: 
https://www.dropbox.com/s/x542q1epf40ar5p/Screencast%20from%2027-02-2015%2015%3A09%3A29.webm?dl=1

Webadmin Events shows: "Sync Error on Master Domain between Host v10 and oVirt 
Engine. Domain: DATA is marked as Master in oVirt Engine database but not on 
the Storage side. Please consult with Support on how to fix this issue."

It seems DATA can be deactivated on the second attempt.

-- 
Stefano Stagnaro

Prisma Engineering S.r.l.
Via Petrocchi, 4
20127 Milano – Italy

Tel. 02 26113507 int 339
e-mail: stefa...@prisma-eng.com
skype: stefano.stagnaro






Re: [ovirt-users] Sync Error on Master Domain after adding a second one

2015-02-25 Thread Stefano Stagnaro
This is basically what I did:

1. added a new data domain (DATA_R3);
2. activated the new data domain - both domains in "active" state;
3. moved the disks from DATA to DATA_R3;
4. tried to put the old data domain into maintenance (from webadmin or shell);
5. both domains became inactive;
6. DATA_R3 came back to "active";
7. the DATA domain went to "being initialized";
8. Webadmin showed the error "Sync Error on Master Domain between...";
9. the DATA domain completed the reconstruction and came back to "active".

Please find engine and vdsm logs here: 
https://www.dropbox.com/sh/uuwwo8sxcg4ffqp/AAAx6UrwI3jbsN4oraJuDx9Fa?dl=0

-- 
Stefano Stagnaro

Prisma Engineering S.r.l.
Via Petrocchi, 4
20127 Milano – Italy

Tel. 02 26113507 int 339
e-mail: stefa...@prisma-eng.com
skype: stefano.stagnaro







Re: [ovirt-users] Sync Error on Master Domain after adding a second one

2015-02-25 Thread Vered Volansky
Please specify the exact flow leading to this error, in terms of adding a new 
domain, the statuses of both domains when an operation is performed, etc.
What are the statuses of both domains?

In case this is not the same issue, we'll need to have a look at the full 
engine & vdsm logs.
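
When going through a full vdsm log, a small filter can help pull out only the ERROR entries together with their tracebacks. The sketch below assumes each entry starts with a header of the form Thread::LEVEL::timestamp::module::line::logger::(function) message, as in the vdsm excerpt quoted elsewhere in this thread. The regular expression and the function name error_entries are illustrative, not part of vdsm.

```python
import re

# Header layout inferred from the vdsm excerpt in this thread (an assumption).
ENTRY = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]+)\) (?P<msg>.*)$"
)


def error_entries(lines):
    """Return ERROR entries, each with its header fields and continuation lines."""
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            # A new header closes whatever entry was being collected.
            current = None
            if match.group("level") == "ERROR":
                current = {"header": match.groupdict(), "body": []}
                entries.append(current)
        elif current is not None:
            # Lines without a header (e.g. a traceback) belong to the last ERROR.
            current["body"].append(line)
    return entries
```

Passing an open log file works because the function only iterates over lines, for example error_entries(open(path)) with path pointing at a local copy of the vdsm log.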



[ovirt-users] Sync Error on Master Domain after adding a second one

2015-02-25 Thread Stefano Stagnaro
I'm testing oVirt 3.5.2 nightly with 1 host for the engine, 2 for virt (v10, v11) 
and 2 for GlusterFS (s20, s21). The Master Data Domain (named DATA) relies on 
GlusterFS.

I've added a second Data Domain (named DATA_R3) in order to switch the Master 
role and remove the old one. Every time I try to put the old Data Domain into 
maintenance I get the following error:

"Sync Error on Master Domain between Host v10 and oVirt Engine. Domain: DATA_R3 
is marked as Master in oVirt Engine database but not on the Storage side. 
Please consult with Support on how to fix this issue."

Same error if I try to put DATA in maintenance from the shell:

# action storagedomain '62a034ca-63df-44f2-9a87-735ddd257a6b' deactivate 
--datacenter-identifier '0002-0002-0002-0002-022f'

I can neither switch to the new Master Data Domain nor put the Data Center 
into maintenance.

I'm not sure whether this is related to bug 1183977. I've already upgraded to 
ovirt-engine-3.5.2-0.0.master.20150224122113.git410d88b.el6.noarch but the 
problem still happens.

Thanks,
-- 
Stefano Stagnaro

Prisma Engineering S.r.l.
Via Petrocchi, 4
20127 Milano – Italy

Tel. 02 26113507 int 339
e-mail: stefa...@prisma-eng.com
skype: stefano.stagnaro
