Any errors in vdsm.log or gluster mount log for this volume?
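For reference, this is roughly where I'd look first. A minimal sketch, with the caveats that these are the stock oVirt/gluster log locations, the gluster mount-log filename is derived from the mount path (so the name below is my guess for your vmstore mount), and the grep patterns are only rough filters:

```shell
# Around the incident window (2019-02-26 17:04):

# vdsm side -- storage and qemu errors
grep -E 'ERROR|WARN' /var/log/vdsm/vdsm.log | grep '2019-02-26 17:0'

# gluster client (mount) log for the domain holding the VM disks
grep ' E ' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirtnode1.miac:_vmstore.log

# sanlock is the component that reports lock/lease failures
grep -E 'acquire|add_lockspace|error' /var/log/sanlock.log | tail -n 50
```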
On Wed, Feb 27, 2019 at 1:07 PM Mike Lykov <[email protected]> wrote:
>
> Hi all. I have an HCI setup: glusterfs 3.12, oVirt 4.2.7, 4 nodes.
>
> Yesterday the engine detected three VMs as "not responding" (they are
> marked as HA VMs); all of them were located on the ovirtnode1 server.
> Two of them were restarted by the engine on other nodes successfully,
> but one was not. All of them got a LOCALIZED message like "cannot set
> lock: no free space on device". What is this? Why does the engine get
> these errors, and why do some VMs restart automatically while others do
> not (though the user restarted them successfully via the web UI after
> some pause)?
> What do you think? Full engine logs can be uploaded.
>
> From engine.log: start event
> ---------------
> 2019-02-26 17:04:05,308+04 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM
> 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from 'Up'
> --> 'NotResponding'
> 2019-02-26 17:04:05,865+04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID:
> VM_NOT_RESPONDING(126), VM openfire.miac is not responding.
> 2019-02-26 17:04:05,865+04 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM
> '7a3af2e7-8296-4fe0-ac55-c52a4b1de93f'(e-l-k.miac) moved from 'Up' -->
> 'NotResponding'
> 2019-02-26 17:04:05,894+04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID:
> VM_NOT_RESPONDING(126), VM e-l-k.miac is not responding.
> 2019-02-26 17:04:05,895+04 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] VM
> 'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from 'Up' -->
> 'NotResponding'
> 2019-02-26 17:04:05,926+04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-59) [] EVENT_ID:
> VM_NOT_RESPONDING(126), VM tsgr-mon is not responding.
> ---------------------------
> 2019-02-26 17:04:22,237+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM
> openfire.miac is down with error. Exit message: VM has been terminated
> on the host.
> ....
> 2019-02-26 17:04:22,374+04 INFO
> [org.ovirt.engine.core.bll.VdsEventListener] (ForkJoinPool-1-worker-9)
> [] Highly Available VM went down. Attempting to restart. VM Name
> 'openfire.miac', VM Id 'd546add1-126a-4490-bc83-469bab659854'
> ...
> 2019-02-26 17:04:27,737+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] EVENT_ID:
> VM_DOWN_ERROR(119), VM openfire.miac is down with error. Exit message:
> resource busy: Failed to acquire lock: Lease is held by another host.
> ...
> 2019-02-26 17:04:28,350+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-2886073) [] EVENT_ID:
> VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host
> ovirtnode6.miac
> 2019-02-26 17:04:31,841+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_DOWN_ERROR(119), VM
> openfire.miac is down with error. Exit message: resource busy: Failed
> to acquire lock: Lease is held by another host.
> 2019-02-26 17:04:31,877+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID:
> VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on
> Host ovirtnode6.miac
> ...
> 2019-02-26 17:04:31,994+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-2886082) [] EVENT_ID:
> VDS_INITIATED_RUN_VM(506), Trying to restart VM openfire.miac on Host
> ovirtnode1.miac
> .....
> 2019-02-26 17:04:36,054+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-9) [] EVENT_ID: VM_DOWN_ERROR(119), VM
> openfire.miac is down with error. Exit message: Не удалось установить
> блокировку: На устройстве не осталось свободного места.
> [translation: "Failed to acquire lock: No space left on device."]
> 2019-02-26 17:04:36,054+04 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (ForkJoinPool-1-worker-9) [] add VM
> 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) to rerun treatment
> 2019-02-26 17:04:36,091+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-2886083) [] EVENT_ID:
> VDS_INITIATED_RUN_VM_FAILED(507), Failed to restart VM openfire.miac on
> Host ovirtnode1.miac
> -------------------
> No more attempts were made for this VM (its state is now 'Down').
>
> The engine tried to restart this VM on some other nodes and got 'Lease
> is held by another host' (which is normal, because the lock timeout had
> not expired yet?), and then got the LOCALIZED message (why is this one
> localized while all the others are not?) 'cannot set lock: no free
> space on device'.
>
> Which 'device' does it mean? How can I see how much 'free space' it has?
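The "device" in that message is almost certainly not a filesystem that df would report as full: as far as I can tell, the string is the localized strerror for the error returned while acquiring the VM lease through sanlock, so the lease operation can fail even while the volume itself is nearly empty. A sketch of how I'd inspect sanlock's view on the host (the dom_md path follows the usual storage-domain layout under the mount; the domain-UUID directory component is whatever your setup actually has):

```shell
# lockspaces and resources sanlock currently holds on this host
sanlock client status

# recent sanlock activity, including failed acquires
sanlock client log_dump | tail -n 100

# the lease files themselves live in the domain's dom_md directory
ls -l /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore/*/dom_md/
```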
> The gluster volumes are almost empty:
>
>   ovirtstor1.miac:/engine       150G  7,5G  143G  5%  /rhev/data-center/mnt/glusterSD/ovirtstor1.miac:_engine
>   ovirtnode1.miac:/data         5,9T   42G  5,9T  1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_data
>   ovirtnode1.miac:/vmstore      4,4T  194G  4,2T  5%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_vmstore
>   ovirtnode1.miac:/iso_storage  600G  1,3G  599G  1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_iso__storage
>   ovirtnode1.miac:/export       5,1T   14G  5,1T  1%  /rhev/data-center/mnt/glusterSD/ovirtnode1.miac:_export
>
> Other VMs also got this error about free space:
> -------------
> 2019-02-26 17:05:05,142+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-11) [] EVENT_ID: VM_DOWN_ERROR(119), VM tsgr-mon
> is down with error. Exit message: Не удалось установить блокировку: На
> устройстве не осталось свободного места.
> [translation: "Failed to acquire lock: No space left on device."]
>
> 2019-02-26 17:05:43,526+04 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (ForkJoinPool-1-worker-11) [] VM
> 'de76aa6c-a211-41de-8d85-7d2821c3980d'(tsgr-mon) moved from
> 'PoweringUp' --> 'Up'
> 2019-02-26 17:05:43,537+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-11) [] EVENT_ID: VDS_INITIATED_RUN_VM(506),
> Trying to restart VM tsgr-mon on Host ovirtnode6.miac
> ----------------
> That is the last message about this VM; its state is now 'Up'.
>
> After that, all VMs were migrated off node ovirtnode1, and later the
> log contains records about its unresponsiveness:
>
> 2019-02-26 17:07:43,584+04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler9) [35e05907] EVENT_ID:
> VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ovirtnode1.miac command
> GetGlusterVolumeHealInfoVDS failed: Message timeout which can be caused
> by communication issues
> 2019-02-26 17:07:43,592+04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-2886194) [35e05907] EVENT_ID:
> VDS_HOST_NOT_RESPONDING_CONNECTING(9,008), Host ovirtnode1.miac is not
> responding. It will stay in Connecting state for a grace period of 60
> seconds and after that an attempt to fence the host will be issued.
> 2019-02-26 17:07:49,684+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1]
> EVENT_ID: FENCE_OPERATION_USING_AGENT_AND_PROXY_STARTED(9,020),
> Executing power management status on Host ovirtnode1.miac using Proxy
> Host ovirtnode5.miac and Fence Agent ipmilan:172.16.16.1.
> 2019-02-26 17:07:49,685+04 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] START,
> FenceVdsVDSCommand(HostName = ovirtnode5.miac, ... action='STATUS',
> agent='FenceAgent:{id='338cbdd9-aab6-484c-a906-21e747a96a84', ...
> type='ipmilan', ip='172.16.16.1', port='null', user='admin',
> password='***', encryptOptions='false', options=''}', policy='null'}),
> log id: 281e6c
> 2019-02-26 17:07:49,942+04 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand]
> (EE-ManagedThreadFactory-engineScheduled-Thread-99) [4511cab1] FINISH,
> FenceVdsVDSCommand, return: FenceOperationResult:{status='SUCCESS',
> powerStatus='ON', message=''}, log id: 281e6c
>
> 2019-02-26 17:07:56,131+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-99) [317316e4]
> EVENT_ID: SYNC_STORAGE_DEVICES_IN_HOST(4,126), Manually synced the
> storage devices from host ovirtnode1.miac
> 2019-02-26 17:07:56,550+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f]
> EVENT_ID: VDS_DETECTED(13), Status of host ovirtnode1.miac was set to
> Up.
> 2019-02-26 17:07:56,554+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engineScheduled-Thread-99) [3a88be4f]
> EVENT_ID: VDS_FENCE_STATUS(496), Host ovirtnode1.miac power management
> was verified successfully.
> -------------
>
> After all that, I tried to start the VM that had not been restarted
> automatically:
> -------------
> 2019-02-27 11:23:30,139+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-2921212)
> [8e405cb9-3a3a-42ab-ae05-5aaf40194b30] EVENT_ID: USER_STARTED_VM(153),
> VM openfire.miac was started by admin@internal-authz (Host:
> ovirtnode1.miac).
> .....
> 2019-02-27 11:24:12,489+04 INFO
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (ForkJoinPool-1-worker-1) [] VM
> 'd546add1-126a-4490-bc83-469bab659854'(openfire.miac) moved from
> 'PoweringUp' --> 'Up'
> 2019-02-27 11:24:12,502+04 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-1) [] EVENT_ID: USER_RUN_VM(32), VM
> openfire.miac started on Host ovirtnode1.miac
> ---------------
>
> It started without problems on the same host, ovirtnode1. Strange.
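One hedged guess about why the manual start later succeeded on the same host: the GetGlusterVolumeHealInfoVDS timeout in your log suggests gluster on ovirtnode1 stalled during the incident, and once connectivity recovered (and the old lease expired) the lease area became writable again. If it reoccurs, the heal state at the moment of failure would be worth capturing; a sketch, using the vmstore volume name from your df listing:

```shell
# any entries pending heal point at a brick/network stall at that time
gluster volume heal vmstore info | grep 'Number of entries'

# brick and self-heal-daemon status for the volume
gluster volume status vmstore
```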
>
> --
> Mike
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/[email protected]/message/XHC5MDVYLIHKZR4YBCGULV6OLGTB25LF/

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/PMLDHBXSEVKUSHFHYYGSQVFMJSN5W3HK/

