Was your setup hyperconverged, and is this storage Gluster-based?

Your error is DNS-related, if a bit odd. Have you checked the resolv.conf 
configs and confirmed that the servers listed there are reachable and 
responsive? When your hosts are active, are they able to mount all the storage 
domains they need? You should also make sure each HA node can reliably ping 
your gateway IP; failures there will cause nodes to bounce.
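
For the resolv.conf check, here is a rough Python sketch, not anything that 
ships with oVirt or vdsm. The helper names (parse_nameservers, can_connect) 
are made up for illustration. It also assumes your DNS servers accept TCP 
connections on port 53, which is common but not guaranteed, so a failed 
connect is a hint rather than proof.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';' in resolv.conf).
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def can_connect(host, port=53, timeout=2.0):
    """Best-effort reachability test: attempt a TCP connection to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Parsing demo on sample text (192.0.2.x are documentation-only addresses).
sample = "search example.test\nnameserver 192.0.2.1\nnameserver 192.0.2.2\n"
print(parse_nameservers(sample))
```

On a host you would read /etc/resolv.conf, pass its contents to 
parse_nameservers(), and call can_connect() on each address it returns.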

A starting place rather than a solution, but these are the first places to 
look. Good luck!
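
One more thing to check. The HosttailError in the quoted traceback is raised 
while parsing a mount record whose source field is ":/", meaning an empty host 
name before the colon. The sketch below looks for such records in mount-table 
text. It is an illustration only and does not use vdsm's own parser. The 
function name is hypothetical, and reading /proc/mounts or /etc/fstab on the 
affected host is an assumption about where the bad record lives.

```python
def find_empty_host_mounts(mount_table_text):
    """Return (line_number, line) pairs whose source field starts with ':'."""
    suspects = []
    for number, line in enumerate(mount_table_text.splitlines(), start=1):
        fields = line.split()
        # Skip blank lines and fstab-style comments.
        if not fields or fields[0].startswith("#"):
            continue
        # A source such as ':/' has nothing before the colon (no host name).
        if fields[0].startswith(":"):
            suspects.append((number, line))
    return suspects


# Demo on sample records; nas.example.test is a placeholder host name.
sample = (
    "/dev/sda1 / ext4 rw 0 0\n"
    ":/ /mnt/broken nfs rw 0 0\n"
    "nas.example.test:/export /mnt/ok nfs rw 0 0\n"
)
for number, line in find_empty_host_mounts(sample):
    print(number, line)
```

If a record like that shows up on the host, its mount point should tell you 
which storage connection ended up with an empty server name.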

  -Darrell



> On May 7, 2019, at 5:14 AM, Alan G <[email protected]> wrote:
> 
> Hi,
> 
> We have a dev cluster running 4.2. It had to be powered down as the building 
> was going to lose power. Since we've brought it back up it has been 
> massively unstable (hosts constantly switching state, VMs migrating all the 
> time).
> 
> I now have one host running (with HE) and all others in maintenance mode. 
> When I try to activate another host I see storage errors in vdsm.log:
> 
> 2019-05-07 09:41:00,114+0000 ERROR (monitor/a98c0b4) [storage.Monitor] Error 
> checking domain a98c0b42-47b9-4632-8b54-0ff3bd80d4c2 (monitor:424)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 416, 
> in _checkDomainStatus
>     masterStats = self.domain.validateMaster()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 941, in 
> validateMaster
>     if not self.validateMasterMount():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1377, 
> in validateMasterMount
>     return mount.isMounted(self.getMasterDir())
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 161, in 
> isMounted
>     getMountFromTarget(target)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 173, in 
> getMountFromTarget
>     for rec in _iterMountRecords():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 143, in 
> _iterMountRecords
>     for rec in _iterKnownMounts():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 139, in 
> _iterKnownMounts
>     yield _parseFstabLine(line)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 81, in 
> _parseFstabLine
>     fs_spec = fileUtils.normalize_path(_unescape_spaces(fs_spec))
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 94, 
> in normalize_path
>     host, tail = address.hosttail_split(path)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/network/address.py", 
> line 43, in hosttail_split
>     raise HosttailError('%s is not a valid hosttail address:' % hosttail)
> HosttailError: :/ is not a valid hosttail address:
> 
> Not sure if it's related but since the restart the hosted_storage domain has 
> been elected the master domain.
> 
> I'm a bit stuck at the moment. My only idea is to remove HE and switch to a 
> standalone Engine VM running outside the cluster.
> 
> Thanks,
> 
> Alan
> 
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/[email protected]/message/UDINZK5BQQHXYENSVV3OYFMVLG2YXBNT/

