Hi Paul,

Great to hear that you were able to get 3.4 working.

It makes sense that vdsm complained about the changed metadata - in general it 
is not a good idea to alter this yourself (and if you do, it is better to 
delete the checksum entry entirely and then let vdsm recalculate it next time 
it tries to read the metadata).
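
For example, a minimal sketch (the full path is an assumption built from the
REMOTE_PATH and domain UUID in your mail below; back up the file first and
only touch it while the host is in maintenance):

MD=/vdsm_store/s2data1_s2mgt_boot1/0a897f2e-1b01-4577-9f91-cd136ef4a978/dom_md/metadata
cp $MD $MD.bak
sed -i '/^_SHA_CKSUM=/d' $MD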

It is still strange that 3.3.2 had no issue if this information was in the 
metadata *before* the upgrade, but the solution would still be to remove the 
reference to it, and the preferred method would be to let vdsm do it via its 
CLI commands (hope this never happens again, but listing them here for 
reference anyway ;) ):

vdsClient -s 0 deactivateStorageDomain <domain UUID> <pool UUID>

and

vdsClient -s 0 detachStorageDomain <domain UUID> <pool UUID>
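
If you ever need the UUIDs for those commands, vdsm can tell you directly
(getConnectedStoragePoolsList is from memory, so treat this as a sketch):

vdsClient -s 0 getStorageDomainsList          <- lists the domain UUIDs
vdsClient -s 0 getConnectedStoragePoolsList   <- lists the connected pool UUID

The pool UUID also shows up in the getStorageDomainInfo output you pasted
earlier, on the "pool = [...]" lines.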


Thanks,
Gadi Ickowicz

----- Original Message -----
From: re...@mccleary.me.uk
To: "Gadi Ickowicz" <gicko...@redhat.com>
Cc: users@ovirt.org
Sent: Wednesday, April 30, 2014 10:44:29 PM
Subject: Re: [ovirt-users] Storage Domain Not Found After upgrade from 3.3.2 to 
3.4

Hi Gadi,

I thought I would have a look around for this "missing" storage domain.
I searched all the storage areas and in the master domain directory 
structure I found a metadata file which had a reference to this storage 
domain ID under POOL_DOMAINS.

I edited this file and removed the offending domain ID (all the other 
domain IDs matched those listed by vdsClient), whilst the host was in 
maintenance mode.

s2data1_s2mgt_boot1/0a897f2e-1b01-4577-9f91-cd136ef4a978/dom_md/metadata:
POOL_DOMAINS=7b083758-45f9-4896-913d-11fe02043e6e:Active,e637bb04-a8b7-4c77-809c-d58051494c52:Active,0a897f2e-1b01-4577-9f91-cd136ef4a978:Active,c2c4ade6-049e-4159-a294-a0c151f4983d:Active,9a4a80a1-5377-4a94-ade3-e58183e916ae:Active
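
In case anyone else needs it, the edit itself was just stripping that one
UUID:Active entry from the comma-separated POOL_DOMAINS list - roughly this
(the full path is the mount point plus the path above; backup taken first):

MD=/vdsm_store/s2data1_s2mgt_boot1/0a897f2e-1b01-4577-9f91-cd136ef4a978/dom_md/metadata
cp $MD $MD.bak
sed -i 's/,9a4a80a1-5377-4a94-ade3-e58183e916ae:Active//' $MD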

The storage domains failed to come online and this message was in vdsm.log:

Thread-105::ERROR::2014-04-30 
20:10:54,292::dispatcher::67::Storage.Dispatcher.Protect::(run) 
{'status': {'message': "Meta Data seal is broken (checksum mismatch): 
'cksum = e8b290eebc9a70d822b38a81d25d9f11eae6f282, computed_cksum = 
83d6a7876b6a915f69818490610306b0287efe6f'", 'code': 752}}

I had seen a _SHA_CKSUM parameter in the metadata file, so I changed 
this to the computed value listed in the logfile and the storage domains 
activated fine.
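
The computed value is easy to pull straight out of the log - assuming the
default log location, something like:

grep -o "computed_cksum = [0-9a-f]*" /var/log/vdsm/vdsm.log | tail -1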

I then upgraded to 3.4 and when I took it out of maintenance mode, 
everything came online fine :)

So it looks to me like 3.4, or the upgrade to it, has a problem with this 
erroneous storage domain ID being in the metadata file, even though it didn't 
cause a problem for 3.3.2.  Either way, both my servers are now running 
3.4 :)

Thanks for all your help with this.

Cheers,

Paul

On 30/04/2014 06:48, Gadi Ickowicz wrote:
> Hi Paul,
>
> Looking at what you sent, it seems a bit strange - another domain was added 
> to the data-center according to the logs you sent in the other mail (the ones 
> after the upgrade).
> If you could send the engine logs for the same time period and the engine 
> upgrade log, there may be some information there, but currently the log you 
> sent just starts from the fact that there is a missing storage domain that 
> is already listed in the storage pool's metadata.
>
> In any case, it is (somewhat) good to hear that reverting to 3.3.2 at least 
> returned the system to a sane state.
>
> Thanks,
> Gadi Ickowicz
>
> ----- Original Message -----
> From: re...@mccleary.me.uk
> To: "Gadi Ickowicz" <gicko...@redhat.com>
> Cc: users@ovirt.org
> Sent: Tuesday, April 29, 2014 7:52:15 PM
> Subject: Re: [ovirt-users] Storage Domain Not Found After upgrade from 3.3.2 
> to 3.4
>
> Hi Gadi,
>
> Thanks for the response.
>
> I've only just had the chance to restore that node, which completed
> fine, and the DC and its storage domains activated ok.  So the
> ovirt-node is now fully functional again on 3.3.2; the
> ovirt-engine server remains untouched on 3.4 and is working fine.
>
> I collected the requested info, but it doesn't list the ID you pulled
> out of the vdsm.log:
>
> [root@ovirt-node ~]# vdsClient -s 0 getStorageDomainsList
> e637bb04-a8b7-4c77-809c-d58051494c52
> 7b083758-45f9-4896-913d-11fe02043e6e
> c2c4ade6-049e-4159-a294-a0c151f4983d
> 0a897f2e-1b01-4577-9f91-cd136ef4a978
>
> I tried anyway to query that ID, and I get the "storage domain does not
> exist" error - but maybe that's because it really doesn't exist!?
> [root@ovirt-node ~]# vdsClient -s 0 getStorageDomainInfo
> 9a4a80a1-5377-4a94-ade3-e58183e916ae
> Storage domain does not exist: ('9a4a80a1-5377-4a94-ade3-e58183e916ae',)
>
> I grabbed the info against all the Storage Domain IDs from the list
> without issue:
> [root@ovirt-node ~]#
> ## Storage Domain: e637bb04-a8b7-4c77-809c-d58051494c52 ##
>           uuid = e637bb04-a8b7-4c77-809c-d58051494c52
>           pool = ['c713062f-300f-4256-9ac8-2d3fcfcdb002']
>           lver = -1
>           version = 3
>           role = Regular
>           remotePath = /vdsm_store/s2data1_s2usr_boot1
>           spm_id = -1
>           type = LOCALFS
>           class = Data
>           master_ver = 0
>           name = s2data1_s2usr_boot1
>
> ####
> ## Storage Domain: 7b083758-45f9-4896-913d-11fe02043e6e ##
>           uuid = 7b083758-45f9-4896-913d-11fe02043e6e
>           pool = ['f027ec99-913f-4f00-ac95-ad484c9c6a4b',
> 'c713062f-300f-4256-9ac8-2d3fcfcdb002']
>           lver = -1
>           version = 0
>           role = Regular
>           remotePath = ovirt-engine:/iso
>           spm_id = -1
>           type = NFS
>           class = Iso
>           master_ver = 0
>           name = ISO1_ZFS
>
> ####
> ## Storage Domain: c2c4ade6-049e-4159-a294-a0c151f4983d ##
>           uuid = c2c4ade6-049e-4159-a294-a0c151f4983d
>           pool = ['c713062f-300f-4256-9ac8-2d3fcfcdb002']
>           lver = -1
>           version = 3
>           role = Regular
>           remotePath = /vdsm_store/s2data1_s2mgt_app1
>           spm_id = -1
>           type = LOCALFS
>           class = Data
>           master_ver = 0
>           name = s2data1_s2mgt_app1
>
> ####
> ## Storage Domain: 0a897f2e-1b01-4577-9f91-cd136ef4a978 ##
>           uuid = 0a897f2e-1b01-4577-9f91-cd136ef4a978
>           pool = ['c713062f-300f-4256-9ac8-2d3fcfcdb002']
>           lver = 1
>           version = 3
>           role = Master
>           remotePath = /vdsm_store/s2data1_s2mgt_boot1
>           spm_id = 1
>           type = LOCALFS
>           class = Data
>           master_ver = 1
>           name = s2data1_s2mgt_boot1
>
> ####
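>
> (For completeness, that dump came from a loop along these lines:
>
> for sd in $(vdsClient -s 0 getStorageDomainsList); do
>     echo "## Storage Domain: $sd ##"
>     vdsClient -s 0 getStorageDomainInfo $sd
>     echo "####"
> done
> )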
>
> I checked the logfile for today and since the restore there are no
> errors about this storage domain not existing:
>
> [root@ovirt-node vdsm]# grep "04-29" vdsm.log | grep "StorageDomainDoesNotExist"
> [root@ovirt-node vdsm]#
>
> I also checked the IDs listed on the ovirt-engine server just on the
> off-chance, but the ID throwing the error doesn't exist on that either:
>
> [root@ovirt-engine ~]# vdsClient -s 0 getStorageDomainsList
> feb04d94-4ea8-471c-b759-3ed95943e9a3
> b3c02266-2426-4285-b4dc-0acba75af530
> 7b083758-45f9-4896-913d-11fe02043e6e
> 79178b6b-8d98-45e4-93f2-3ce1d7a270a5
>
> None of the storage domains exist on the root VG partitions; they are
> completely separate disks in separate VGs.  I did delete some storage
> domains on both the ovirt-engine and the ovirt-node a number of weeks
> back, but if the problem were due to that, I would have expected issues
> on both servers, not just one.
>
> Have you found anything else that looks interesting in the log?
>
> Thanks, Paul
>
>
> On 28/04/2014 07:07, Gadi Ickowicz wrote:
>> Hi Paul,
>>
>> I am still looking into this log, but from a quick first assessment, it 
>> looks like (for some reason I don't know yet...) there is a storage domain 
>> that is missing. This is visible in the following error traceback in the 
>> vdsm log:
>>
>> Thread-29::ERROR::2014-04-27 
>> 12:43:05,825::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain)
>>  Error while collecting domain 9a4a80a1-5377-4a94-ade3-e58183e916ae m
>> Traceback (most recent call last):
>>     File "/usr/share/vdsm/storage/domainMonitor.py", line 204, in 
>> _monitorDomain
>>       self.domain = sdCache.produce(self.sdUUID)
>>     File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>       domain.getRealDomain()
>>     File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>       return self._cache._realProduce(self._sdUUID)
>>     File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>       domain = self._findDomain(sdUUID)
>>     File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>       dom = findMethod(sdUUID)
>>     File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
>>       raise se.StorageDomainDoesNotExist(sdUUID)
>> StorageDomainDoesNotExist: Storage domain does not exist: 
>> ('9a4a80a1-5377-4a94-ade3-e58183e916ae',)
>>
>>
>> However, I do not see any information in the vdsm log itself about what this 
>> domain is (yet - it may be there and I am still looking). The reason vdsm is 
>> trying to access this storage domain (9a4a80a1-5377-4a94-ade3-e58183e916ae) 
>> is that it appears to be part of the storage pool (datacenter) according to 
>> the pool's metadata, as seen in the following lines from the initial 
>> connection to the datacenter, when vdsm first starts up:
>>
>> Thread-13::DEBUG::2014-04-27 
>> 12:42:05,604::persistentDict::234::Storage.PersistentDict::(refresh) read 
>> lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=s2data1_s2mgt_boot1', 
>> 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 
>> 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 
>> 'POOL_DESCRIPTION=BDS_DataCentre2', 
>> 'POOL_DOMAINS=7b083758-45f9-4896-913d-11fe02043e6e:Active,e637bb04-a8b7-4c77-809c-d58051494c52:Active,0a897f2e-1b01-4577-9f91-cd136ef4a978:Active,c2c4ade6-049e-4159-a294-a0c151f4983d:Active,9a4a80a1-5377-4a94-ade3-e58183e916ae:Active',
>>  'POOL_SPM_ID=-1', 'POOL_SPM_LVER=0', 
>> 'POOL_UUID=c713062f-300f-4256-9ac8-2d3fcfcdb002', 
>> 'REMOTE_PATH=/vdsm_store/s2data1_s2mgt_boot1', 'ROLE=Master', 
>> 'SDUUID=0a897f2e-1b01-4577-9f91-cd136ef4a978', 'TYPE=LOCALFS', 'VERSION=3', 
>> '_SHA_CKSUM=afe618d7596d75d0fb96453bcdd34a1255534454']
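>>
>> (A quick way to spot this sort of mismatch - a sketch, with the metadata
>> path built from the REMOTE_PATH and SDUUID values above:
>>
>> MD=/vdsm_store/s2data1_s2mgt_boot1/0a897f2e-1b01-4577-9f91-cd136ef4a978/dom_md/metadata
>> grep '^POOL_DOMAINS=' $MD | tr ',' '\n' | sed -e 's/^POOL_DOMAINS=//' -e 's/:.*$//'
>> vdsClient -s 0 getStorageDomainsList
>>
>> Any UUID in the first list that is missing from the second is a domain the
>> pool expects but vdsm cannot find.)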
>>
>> Is it possible you had another domain attached to this datacenter before the 
>> upgrade that is somehow on the root vg partitions and gets destroyed during 
>> the upgrade process, so that reverting brings it back?
>>
>> If you currently have this system back on 3.3.2, it should be up; could you 
>> run the following commands on the ovirt-node:
>>
>> vdsClient -s 0 getStorageDomainsList    <- This lists the IDs of all storage 
>> domains that vdsm (the ovirt-node) can currently see (hopefully the ID of 
>> the domain in question is listed there)
>>
>> vdsClient -s 0 getStorageDomainInfo 9a4a80a1-5377-4a94-ade3-e58183e916ae  <- 
>> displays information about the storage domain. If this succeeds we should 
>> know a bit more about the domain
>>
>> Thanks,
>> Gadi Ickowicz
>>
>> ----- Original Message -----
>> From: re...@mccleary.me.uk
>> To: "Gadi Ickowicz" <gicko...@redhat.com>, users@ovirt.org
>> Sent: Sunday, April 27, 2014 3:10:27 PM
>> Subject: Re: [ovirt-users] Storage Domain Not Found After upgrade from 3.3.2 
>> to 3.4
>>
>> Hi,
>>
>> Yes, you're correct, Gadi.  I have a single Ovirt engine, which has two
>> Datacenters; one is local storage on the Engine server and the other is
>> local storage on the Ovirt Node.  I've renamed the servers in the
>> attached vdsm log output (from the ovirt-node) to ovirt-engine
>> (10.50.0.18) and ovirt-node (10.50.0.19).
>>
>> BDS_DataCentre1 is on the ovirt-engine server and this works fine after
>> the upgrade.
>> BDS_DataCentre2 is on the ovirt-node and this is the one that fails to
>> activate due to the storage domains not being accessible.
>>
>> The Master storage domain on the ovirt-node is s2data1_s2mgt_boot1.
>> There are two other storage domains as well: s2data1_s2mgt_app1 and
>> s2data1_s2usr_boot1.  The underlying filesystems are mounted fine, and
>> as I said, if I restore the server (root vg partitions; the storage
>> domain filesystems are not touched) then it works fine again.  So the
>> upgrade is not playing nicely for some reason, but it's not clear to me
>> from the log what the issue is.
>>
>> Thanks,
>>
>> Paul
>>
>> On 27/04/2014 07:56, Gadi Ickowicz wrote:
>>> Hi,
>>>
>>> Could you please attach the vdsm log as a file (it is easier to read) for 
>>> the failing node?
>>>
>>> Also - I am a bit confused about what exactly your setup is - do you have 
>>> only a single engine (the all-in-one) and two DCs, one for the all-in-one 
>>> and one for the oVirt node, which is the one that is failing?
>>>
>>> Thanks,
>>> Gadi Ickowicz
>>>
>>> ----- Original Message -----
>>> From: re...@mccleary.me.uk
>>> To: users@ovirt.org
>>> Sent: Friday, April 25, 2014 10:46:35 PM
>>> Subject: [ovirt-users] Storage Domain Not Found After upgrade from 3.3.2 to 
>>> 3.4
>>>
>>> Hi,
>>>
>>> I have an all-in-one installation engine node and an oVirt node.  They
>>> each use local storage and are thus configured in separate data
>>> centres.  I upgraded both from 3.3.2 to 3.4 and this completed without
>>> error.  I then ran the engine-setup upgrade and this completed ok.  I
>>> rebooted both servers; the oVirt engine node worked fine and I
>>> could start its VMs.  The oVirt node's datacenter, however, is not
>>> activating, which seems to be due to none of the storage domains coming
>>> online.  I've checked, and the storage is mounted and available fine on
>>> the oVirt node.
>>>
>>> Looking at the vdsm.log I can see errors stating that it can't find the
>>> storage domain.  I have restored the entire node back to the pre-upgrade
>>> state and it works fine.  It breaks again when I upgrade it.  The
>>> approach I used was to put the oVirt node in maintenance mode and run
>>> yum update.  Does anybody have similar issues, or understand the log
>>> errors below?
>>>
>>> < SNIP SNIP original log output>
>>>
>>>
>>> Thanks, Paul
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
