Hi,
At the beginning of February I successfully installed oVirt 3.6.2 (with hosted
engine) on 3 hosts, all using a single storage server with GlusterFS.
Two hosts (with Intel CPUs) are HA-enabled and host the engine; the third host
(AMD CPU) was added as a host from the oVirt web administration panel, without
hosted-engine deployment (I don't want the engine running on this host).
About 10 days ago I rebooted my oVirt environment (i.e. entering global
maintenance, shutting down the engine, powering off all the hosts, starting
them again, then setting maintenance mode back to "none").
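For reference, the reboot sequence I followed was roughly the following (the exact host order and hostnames are omitted; the hosted-engine commands are run on one of the HA hosts):

```shell
# Enter global maintenance so the HA agents do not try to restart
# the engine VM while we shut everything down.
hosted-engine --set-maintenance --mode=global

# Cleanly shut down the hosted engine VM.
hosted-engine --vm-shutdown

# Power off each host (after confirming no VMs are still running on it).
shutdown -h now

# After powering all hosts back on, leave global maintenance.
hosted-engine --set-maintenance --mode=none
```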
After the reboot, everything was fine with the Intel hosts and the hosted
engine, but the AMD host (the one without HA) was not operational.
I tried to activate it, but it failed with the following error:
"Host failed to attach one of Storage Domains attached to it."
If I log into my AMD host and check the logs, I see that the storage domain
which is not mounted is the hosted engine's (which could be expected, since
this host will not run the hosted engine).
From /var/log/vdsm/vdsm.log:
Thread-29::DEBUG::2016-02-25
11:44:01,157::monitor::322::Storage.Monitor::(_produceDomain) Producing domain
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25
11:44:01,158::sdc::139::Storage.StorageDomainCache::(_findDomain) looking for
unfetched domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::ERROR::2016-02-25
11:44:01,158::sdc::156::Storage.StorageDomainCache::(_findUnfetchedDomain)
looking for domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Thread-29::DEBUG::2016-02-25
11:44:01,159::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm
reload operation' got the operation mutex
Thread-29::DEBUG::2016-02-25 11:44:01,159::lvm::290::Storage.Misc.excCmd::(cmd)
/usr/bin/taskset --cpu-list 0-63 /usr/bin/sudo -n /usr/sbin/lvm vgs --config '
devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
write_cache_state=0 disable_after_error_count=3 filter = [ '\''r|.*|'\'' ] }
global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1
use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings
--units b --nosuffix --separator '|' --ignoreskippedcluster -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd (cwd None)
Thread-29::DEBUG::2016-02-25 11:44:01,223::lvm::290::Storage.Misc.excCmd::(cmd)
FAILED: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad
before enabling it!\n Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not
found\n Cannot process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd\n';
<rc> = 5
Thread-29::WARNING::2016-02-25
11:44:01,225::lvm::375::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!',
' Volume group "6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd" not found', ' Cannot
process volume group 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd']
Thread-29::DEBUG::2016-02-25
11:44:01,225::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm
reload operation' released the operation mutex
Thread-29::ERROR::2016-02-25
11:44:01,245::sdc::145::Storage.StorageDomainCache::(_findDomain) domain
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd not found
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
Thread-29::ERROR::2016-02-25
11:44:01,246::monitor::276::Storage.Monitor::(_monitorDomain) Error monitoring
domain 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Traceback (most recent call last):
File "/usr/share/vdsm/storage/monitor.py", line 264, in _monitorDomain
self._produceDomain()
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 767, in wrapper
value = meth(self, *a, **kw)
File "/usr/share/vdsm/storage/monitor.py", line 323, in _produceDomain
self.domain = sdCache.produce(self.sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 100, in produce
domain.getRealDomain()
File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
return self._cache._realProduce(self._sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 124, in _realProduce
domain = self._findDomain(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 143, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 173, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
jsonrpc.Executor/0::DEBUG::2016-02-25
11:44:03,292::task::595::Storage.TaskManager.Task::(_updateState)
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state init -> state
preparing
jsonrpc.Executor/0::INFO::2016-02-25
11:44:03,293::logUtils::48::dispatcher::(wrapper) Run and protect:
repoStats(options=None)
jsonrpc.Executor/0::INFO::2016-02-25
11:44:03,293::logUtils::51::dispatcher::(wrapper) Run and protect: repoStats,
Return response: {u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0,
'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00056349',
'lastCheck': '7.7', 'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd':
{'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0',
'lastCheck': '2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c':
{'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay':
'0.000561865', 'lastCheck': '8.4', 'valid': True},
u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a': {'code': 0, 'actual': True, 'version':
3, 'acquired': True, 'delay': '0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25
11:44:03,294::task::1191::Storage.TaskManager.Task::(prepare)
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::finished:
{u'5f7991ba-fdf8-4b40-9974-c7adcd4da879': {'code': 0, 'actual': True,
'version': 3, 'acquired': True, 'delay': '0.00056349', 'lastCheck': '7.7',
'valid': True}, u'6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd': {'code': 358,
'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck':
'2.0', 'valid': False}, u'5efea9c7-c4ec-44d4-a283-060d4c83303c': {'code': 0,
'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000561865',
'lastCheck': '8.4', 'valid': True}, u'e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a':
{'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay':
'0.000227759', 'lastCheck': '8.7', 'valid': True}}
jsonrpc.Executor/0::DEBUG::2016-02-25
11:44:03,294::task::595::Storage.TaskManager.Task::(_updateState)
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::moving from state preparing ->
state finished
jsonrpc.Executor/0::DEBUG::2016-02-25
11:44:03,294::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
jsonrpc.Executor/0::DEBUG::2016-02-25
11:44:03,295::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
jsonrpc.Executor/0::DEBUG::2016-02-25
11:44:03,295::task::993::Storage.TaskManager.Task::(_decref)
Task=`2862ba96-8080-4e74-a55a-cdf93326631a`::ref 0 aborting False
Thread-30::DEBUG::2016-02-25
11:44:04,603::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset
--cpu-list 0-63 /usr/bin/dd
if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie/e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a/dom_md/metadata
iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-30::DEBUG::2016-02-25
11:44:04,630::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> =
'0+1 records in\n0+1 records out\n336 bytes (336 B) copied, 0.000286148 s, 1.2
MB/s\n'; <rc> = 0
Thread-31::DEBUG::2016-02-25
11:44:04,925::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset
--cpu-list 0-63 /usr/bin/dd
if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/dom_md/metadata
iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-31::DEBUG::2016-02-25
11:44:04,950::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> =
'0+1 records in\n0+1 records out\n339 bytes (339 B) copied, 0.0005884 s, 576
kB/s\n'; <rc> = 0
Thread-28::DEBUG::2016-02-25
11:44:05,583::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/taskset
--cpu-list 0-63 /usr/bin/dd
if=/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines/5f7991ba-fdf8-4b40-9974-c7adcd4da879/dom_md/metadata
iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-28::DEBUG::2016-02-25
11:44:05,606::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> =
'0+1 records in\n0+1 records out\n482 bytes (482 B) copied, 0.000637557 s, 756
kB/s\n'; <rc> = 0
Here are some other commands (executed on the problematic host) whose output
may be useful:
# vdsClient -s 0 getConnectedStoragePoolsList
00000001-0001-0001-0001-00000000020e
# vdsClient -s 0 getStoragePoolInfo 00000001-0001-0001-0001-00000000020e
name = No Description
isoprefix =
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111
pool_status = connected
lver = 6
spm_id = 2
master_uuid = 5f7991ba-fdf8-4b40-9974-c7adcd4da879
version = 3
domains =
5f7991ba-fdf8-4b40-9974-c7adcd4da879:Active,6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd:Active,5efea9c7-c4ec-44d4-a283-060d4c83303c:Active,e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a:Active
type = GLUSTERFS
master_ver = 1
5f7991ba-fdf8-4b40-9974-c7adcd4da879 = {'status': 'Active', 'diskfree':
'6374172262400', 'isoprefix': '', 'alerts': [], 'disktotal': '6995436371968',
'version': 3}
6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd = {'status': 'Active',
'isoprefix': '', 'alerts': [], 'version': -1}
e84c6a1a-9f82-4fa6-9a3b-0b0bc0330d9a = {'status': 'Active', 'diskfree':
'224145833984', 'isoprefix': '', 'alerts': [], 'disktotal': '236317179904',
'version': 3}
5efea9c7-c4ec-44d4-a283-060d4c83303c = {'status': 'Active', 'diskfree':
'6374172262400', 'isoprefix':
'/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain/5efea9c7-c4ec-44d4-a283-060d4c83303c/images/11111111-1111-1111-1111-111111111111',
'alerts': [], 'disktotal': '6995436371968', 'version': 0}
# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
Storage domain does not exist: ('6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd',)
If I run this last command on one of the working hosts:
# vdsClient -s 0 getStorageDomainInfo 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
uuid = 6fb10a49-5f1c-4bd4-9ff7-b7e33c1125cd
version = 3
role = Regular
remotePath = srv-stor-01-ib0:/ovirtengine
type = GLUSTERFS
class = Data
pool = ['00000001-0001-0001-0001-00000000020e']
name = hosted_storage
(please note: this is the storage domain used for my hosted engine)
If I run "mount" on my AMD host (the one with the problem):
# mount
...
srv-stor-01-ib0:/virtualmachines on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type
fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs
(rw,nosuid,nodev,relatime,size=13185796k,mode=700)
If I run "mount" on one of the Intel hosts (currently working):
# mount
...
srv-stor-01-ib0:/ovirtengine on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ovirtengine type
fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/virtualmachines on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_virtualmachines type
fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/isodomain on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_isodomain type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
srv-stor-01-ib0:/ssd-pcie on
/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ssd-pcie type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
tmpfs on /run/user/0 type tmpfs
(rw,nosuid,nodev,relatime,size=3272288k,mode=700)
The only difference in the "mount" output is that the hosted-engine storage
domain is not mounted on the host which should not run the engine. The other
domains are mounted correctly.
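One thing I have considered (but not yet tried) is mounting the hosted-engine Gluster volume manually on the AMD host, just to verify that the volume itself is reachable from there; the mountpoint below is copied from the working hosts' mount output, so please tell me if this test would be unsafe:

```shell
# Hypothetical manual test on the AMD host: mount the hosted-engine
# volume at the same path VDSM uses on the working hosts.
mkdir -p "/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ovirtengine"
mount -t glusterfs srv-stor-01-ib0:/ovirtengine \
    "/rhev/data-center/mnt/glusterSD/srv-stor-01-ib0:_ovirtengine"
```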
What could I do to solve this issue?
Best regards,
Giuseppe
--
Giuseppe Berellini
PTV SISTeMA
www.sistemaits.com
facebook.com/sistemaits
linkedin.com/SISTeMA
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users