Hi,

About one hour ago my AMD host came back up, after being down for more than
10 days. Apart from checking the logs (which I suppose didn't help in solving
the problem), the only thing I did was enable the NFS share on my ISO domain.
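
(For reference: on a Gluster volume the NFS export is controlled by the
nfs.disable volume option, so the change was something along these lines on the
storage server; the volume name "isodomain" is the one visible in the mount
output of my original message below.)

# gluster volume set isodomain nfs.disable off
# gluster volume status isodomain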

I'm still not able to understand how that could help.
I would be really happy to better understand what happened! :-)
If anyone has ideas or explanations to share, they are very welcome! :-)

Best regards,
        Giuseppe

--
Giuseppe Berellini
PTV SISTeMA
www.sistemaits.com
facebook.com/sistemaits
linkedin.com/SISTeMA

From: users-boun...@ovirt.org [mailto:users-boun...@ovirt.org] On behalf of
Giuseppe Berellini
Sent: Thursday, 25 February 2016 12:10
To: users@ovirt.org
Subject: [ovirt-users] One host failed to attach one of Storage Domains after
reboot of all hosts

Hi,

At the beginning of February I successfully installed oVirt 3.6.2 (with hosted
engine) on 3 hosts, which use a single storage server with GlusterFS.
2 hosts (with Intel CPUs) are using HA and are hosting the engine; the 3rd host
(AMD CPU) was added as a host from the oVirt web administration panel, without
hosted-engine deployment (I don't want the engine running on this host).

About 10 days ago I tried to reboot my oVirt environment (i.e. going to global
maintenance, shutting down the engine, turning off all the hosts, starting them
again, then setting the maintenance mode back to "none"; a sketch of the
commands is below).
After the reboot, everything was fine with the Intel hosts and the hosted
engine, but the AMD host (the one without HA) was not operational.
I tried to activate it, but it failed with the following error:
        "Host failed to attach one of Storage Domains attached to it."

If I log into my AMD host and check the logs, I see that the storage domain
which is not mounted is the hosted-engine one (which could be expected, since
this host won't run the hosted engine).

From /var/log/vdsm/vdsm.log:

Thread-29::DEBUG::2016-02-25 
11:44:01,157::monitor::322::Storage.Monitor::(_produceDomain) Producing domain 
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25 
11:44:01,158::sdc::139::Storage.StorageDomainCache::(_findDomain) looking for 
unfetched domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25 
11:44:01,158::sdc::156::Storage.StorageDomainCache::(_findUnfetchedDomain) 
looking for domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::DEBUG::2016-02-25 
11:44:01,159::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm 
reload operation' got the operation mutex
Thread-29::DEBUG::2016-02-25 11:44:01,159::lvm::290::Storage.Misc.excCmd::(cmd) 
/usr/bin/taskset --cpu-list 0-63 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' 
devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 
write_cache_state=0 disable_after_error_count=3 filter = [ '\''r|.*|'\'' ] }  
global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  
use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings 
--units b --nosuffix --separator '|' --ignoreskippedcluster -o 
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd (cwd None)
Thread-29::DEBUG::2016-02-25 11:44:01,223::lvm::290::Storage.Misc.excCmd::(cmd) 
FAILED: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad 
before enabling it!\n  Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not 
found\n  Cannot process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd\n'; 
<rc> = 5
Thread-29::WARNING::2016-02-25 
11:44:01,225::lvm::375::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!', 
'  Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not found', '  Cannot 
process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd']
Thread-29::DEBUG::2016-02-25 
11:44:01,225::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm 
reload operation' released the operation mutex
Thread-29::ERROR::2016-02-25 
11:44:01,245::sdc::145::Storage.StorageDomainCache::(_findDomain) domain 
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: 
(u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
Thread-29::ERROR::2016-02-25 
11:44:01,246::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring 
domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 264, in _monitorDomain
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 767, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 323, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 100, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: 
(u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
jsonrpc.Executor/0::DEBUG::2016-02-25 
11:44:03,292::task::595::Storage.TaskManager.Task::(_updateState) 
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state init -> state 
preparing
jsonrpc.Executor/0::INFO::2016-02-25 
11:44:03,293::logUtils::48::dispatcher::(wrapper) Run and protect: 
repoStats(options=None)
jsonrpc.Executor/0::INFO::2016-02-25 
11:44:03,293::logUtils::51::dispatcher::(wrapper) Run and protect: repoStats, 
Return response: {u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 
'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00056349', 
'lastCheck': '7.7', 'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': 
{'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 
'lastCheck': '2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': 
{'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': 
'0.000561865', 'lastCheck': '8.4', 'valid': True}, 
u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': {'code': 0, 'actual': True, 'version': 
3, 'acquired': True, 'delay': '0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25 
11:44:03,294::task::1191::Storage.TaskManager.Task::(prepare) 
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::finished: 
{u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 'actual': True, 
'version': 3, 'acquired': True, 'delay': '0.00056349', 'lastCheck': '7.7', 
'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': {'code': 358, 
'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': 
'2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': {'code': 0, 
'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000561865', 
'lastCheck': '8.4', 'valid': True}, u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': 
{'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': 
'0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25 
11:44:03,294::task::595::Storage.TaskManager.Task::(_updateState) 
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state preparing -> 
state finished
jsonrpc.Executor/0::DEBUG::2016-02-25 
11:44:03,294::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) 
Owner.releaseAll requests {} resources {}
jsonrpc.Executor/0::DEBUG::2016-02-25 
11:44:03,295::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) 
Owner.cancelAll requests {}
jsonrpc.Executor/0::DEBUG::2016-02-25 
11:44:03,295::task::993::Storage.TaskManager.Task::(_decref) 
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::ref 0 aborting False
Thread-30::DEBUG::2016-02-25 
11:44:04,603::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset 
--cpu-list 0-63 /usr/bin/dd 
if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie/e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a/dom_md/metadata
 iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-30::DEBUG::2016-02-25 
11:44:04,630::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = 
'0+1 records in\n0+1 records out\n336 bytes (336 B) copied, 0.000286148 s, 1.2 
MB/s\n'; <rc> = 0
Thread-31::DEBUG::2016-02-25 
11:44:04,925::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset 
--cpu-list 0-63 /usr/bin/dd 
if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/dom_md/metadata
 iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-31::DEBUG::2016-02-25 
11:44:04,950::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = 
'0+1 records in\n0+1 records out\n339 bytes (339 B) copied, 0.0005884 s, 576 
kB/s\n'; <rc> = 0
Thread-28::DEBUG::2016-02-25 
11:44:05,583::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset 
--cpu-list 0-63 /usr/bin/dd 
if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines/5f7991ba-fdf8-4b40-9974-c7adcd4da879/dom_md/metadata
 iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-28::DEBUG::2016-02-25 
11:44:05,606::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = 
'0+1 records in\n0+1 records out\n482 bytes (482 B) copied, 0.000637557 s, 756 
kB/s\n'; <rc> = 0



Here are other commands (executed on the host having the problem) which may
give useful information:
# vdsClient -s 0 getConnectedStoragePoolsList
00000001-0001-0001-0001-00000000020e

# vdsClient -s 0 getStoragePoolInfo 00000001-0001-0001-0001-00000000020e
        name = No Description
        isoprefix = 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111
        pool_status = connected
        lver = 6
        spm_id = 2
        master_uuid = 5f7991ba-fdf8-4b40-9974-c7adcd4da879
        version = 3
        domains = 
5f7991ba-fdf8-4b40-9974-c7adcd4da879:Active,6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd:Active,5efea9c7-c4ec-44d4-a283-060d4c83303c:Active,e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a:Active
        type = GLUSTERFS
        master_ver = 1
        5f7991ba-fdf8-4b40-9974-c7adcd4da879 = {'status': 'Active', 'diskfree': 
'6374172262400', 'isoprefix': '', 'alerts': [], 'disktotal': '6995436371968', 
'version': 3}
        6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd = {'status': 'Active', 
'isoprefix': '', 'alerts': [], 'version': -1}
        e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a = {'status': 'Active', 'diskfree': 
'224145833984', 'isoprefix': '', 'alerts': [], 'disktotal': '236317179904', 
'version': 3}
        5efea9c7-c4ec-44d4-a283-060d4c83303c = {'status': 'Active', 'diskfree': 
'6374172262400', 'isoprefix': 
'/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111',
 'alerts': [], 'disktotal': '6995436371968', 'version': 0}

# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Storage domain does not exist: ('6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)

If I run this last command on one of the working hosts:
# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
        uuid = 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
        version = 3
        role = Regular
        remotePath = srv-stor-01-ib0:/ovirtengine
        type = GLUSTERFS
        class = Data
        pool = ['00000001-0001-0001-0001-00000000020e']
        name = hosted_storage

(please note: this is the storage domain used for my hosted engine)



If I run "mount" on my AMD host (the one with the problem):
# mount
...
srv-stor-01-ib0:/virtualmachines on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type 
fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs 
(rw,nosuid,nodev,relatime,size=13185796k,mode=700)

If I run "mount" on one of the Intel hosts (currently working):
# mount
...
srv-stor-01-ib0:/ovirtengine on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ovirtengine type 
fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/virtualmachines on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type 
fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on 
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs 
(rw,nosuid,nodev,relatime,size=3272288k,mode=700)

The only difference in the "mount" output is that the hosted-engine storage
domain is not mounted on the host which should not run the engine. The other
domains are mounted correctly.
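
If it can help the diagnosis, I suppose I could also try to mount that Gluster
volume manually on the AMD host, to rule out a connectivity problem towards the
hosted-engine volume (the volume name is the remotePath reported by
getStorageDomainInfo above; /mnt/enginetest is just a temporary mount point):

# mkdir -p /mnt/enginetest
# mount -t glusterfs srv-stor-01-ib0:/ovirtengine /mnt/enginetest
# ls /mnt/enginetest
# umount /mnt/enginetest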

What could I do to solve this issue?

Best regards,
        Giuseppe


--
Giuseppe Berellini
PTV SISTeMA
www.sistemaits.com
facebook.com/sistemaits
linkedin.com/SISTeMA
