https://bugzilla.redhat.com/show_bug.cgi?id=890365

Try restarting the vdsm service.
It looks like you had a problem with the storage and vdsm did not recover properly.
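On Fedora 18 that should be something like the following, assuming vdsm is managed by systemd under the service name vdsmd:

$ sudo systemctl restart vdsmd.service
$ systemctl status vdsmd.service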



On 03/24/2013 11:40 AM, Yuval M wrote:
sanlock is at the latest version (updating it solved another problem we had a few days ago):

$ rpm -q sanlock
sanlock-2.6-7.fc18.x86_64

the storage is on the same machine as the engine and vdsm.
iptables is up but there is a rule to allow all localhost traffic.
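For reference, this is roughly how the loopback rule can be verified (the exact rule text may differ on your setup, so treat this as a sketch):

$ sudo iptables -S INPUT | grep -- '-i lo'
-A INPUT -i lo -j ACCEPT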


On Sun, Mar 24, 2013 at 11:34 AM, Maor Lipchuk <mlipc...@redhat.com> wrote:

    From the VDSM log, it seems that the master storage domain was not
    responding.

    Thread-23::DEBUG::2013-03-22 18:50:20,263::domainMonitor::216::Storage.DomainMonitorThread::(_monitorDomain) Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to Invalid
    ....
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in _monitorDomain
        self.domain.selftest()
      File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
        fileSD.FileStorageDomain.selftest(self)
      File "/usr/share/vdsm/storage/fileSD.py", line 480, in selftest
        self.oop.os.statvfs(self.domaindir)
      File "/usr/share/vdsm/storage/remoteFileHandler.py", line 280, in callCrabRPCFunction
        *args, **kwargs)
      File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in callCrabRPCFunction
        rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
      File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146, in _recvAll
        raise Timeout()
    Timeout
    .....
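
    As a side note, the selftest that timed out is essentially a statvfs()
    on the domain directory, so you can reproduce the same check outside
    of vdsm with stat -f. The path below is a placeholder; use the actual
    mount point of the master domain under /rhev/data-center/mnt:

    $ stat -f /rhev/data-center/mnt/<server>:<export_path>/1083422e-a5db-41b6-b667-b9ef1ef244f0

    If that command also hangs, the problem is in the NFS mount itself
    rather than in vdsm.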

    I also see a sanlock issue, but I think that is because the storage
    could not be reached:
    ReleaseHostIdFailure: Cannot release host id: ('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
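
    If you want to look at it from the sanlock side, the client status
    command should list the lockspaces and resources the daemon currently
    holds (assuming the sanlock daemon is running):

    $ sudo sanlock client status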

    Can you check whether iptables is running on your host, and if so,
    whether it is blocking access to the storage server by any chance?
    Can you try to manually mount this NFS export and see if it works?
    (A sketch of the mount commands is below.)
    Is it possible that the storage server has connectivity issues?
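
    A manual mount test could look something like this; the server and
    export path are placeholders, so substitute the ones configured for
    the master data domain:

    $ sudo mkdir -p /mnt/nfstest
    $ sudo mount -t nfs -o vers=3 <server>:/<export_path> /mnt/nfstest
    $ df -h /mnt/nfstest
    $ sudo umount /mnt/nfstest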


    Regards,
    Maor

    On 03/22/2013 08:24 PM, Limor Gavish wrote:
    > Hello,
    >
    > I am using oVirt 3.2 on Fedora 18:
    > [wil@bufferoverflow ~]$ rpm -q vdsm
    > vdsm-4.10.3-7.fc18.x86_64
    >
    > (the engine is built from sources).
    >
    > I seem to have hit this bug:
    > https://bugzilla.redhat.com/show_bug.cgi?id=922515
    >
    > in the following configuration:
    > Single host (no migrations)
    > Created a VM, installed an OS inside (Fedora18)
    > stopped the VM.
    > created template from it.
    > Created an additional VM from the template using thin provision.
    > Started the second VM.
    >
    > In addition to the errors in the logs, the storage domains (both data
    > and ISO) crashed, i.e. went to "unknown" and "inactive" states respectively.
    > (see the attached engine.log)
    >
    > I attached the VDSM and engine logs.
    >
    > is there a way to work around this problem?
    > It happens repeatedly.
    >
    > Yuval Meir
    >
    >
    >
    >







--
Dafna Ron
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
