Hi, could you please share your volume info output?
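In case it helps, here is a minimal sketch of the commands that usually provide that information (assuming the volume is the "slow1" volume that appears in the logs below; adjust the name to match your setup):

    # volume configuration and options
    gluster volume info slow1
    # brick and self-heal daemon status
    gluster volume status slow1
    # any pending heals on the replica
    gluster volume heal slow1 info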
-Krutika

On Fri, Mar 10, 2017 at 6:41 PM, p...@email.cz <p...@email.cz> wrote:
> freeze / freezing
> I/O operations are paused for some reason
> the possible causes are
> 1) net - a TCP stack collapse
> 2) gluster interconnect - gluster daemon / process hang ??
> 3) VDSM - paused the managed services
> 4) XFS - read/write issues
> 5) swap overfilled - processes get killed - but why is swap full if
> at most 30% of memory ( 196 GB ) is used by the VMs ? ( unmanaged process forking )
>
> regards
>
>
> On 03/10/2017 01:56 PM, Nir Soffer wrote:
>
> On Fri, Mar 10, 2017 at 1:07 PM, p...@email.cz <p...@email.cz> <p...@email.cz> wrote:
>
> Hello everybody,
>
> I'm testing oVirt with Gluster for production use.
> All components seem to be running fine, but whenever I test a heavy
> workload the node freezes. Not the host OS itself, but the VDSM management
> layer and the attached services, e.g. the VMs.
>
> What do you mean by freeze?
>
>
> mgmt
> oVirt - 4.1.0.4
> CentOS 7.3-1611
>
>
> nodes ( installed from the oVirt image
> "ovirt-node-ng-installer-ovirt-4.1-2017030804.iso" )
>
> OS Version: == RHEL - 7 - 3.1611.el7.centos
> OS Description:== oVirt Node 4.1.0
> Kernel Version:== 3.10.0 - 514.10.2.el7.x86_64
> KVM Version:== 2.6.0 - 28.el7_3.3.1
> LIBVIRT Version:== libvirt-2.0.0-10.el7_3.5
> VDSM Version:== vdsm-4.19.4-1.el7.centos
> SPICE Version:== 0.12.4 - 20.el7_3
> GlusterFS Version:== glusterfs-3.8.9-1.el7 ( LVM thin provisioning in
> replica 2 - created from the oVirt GUI )
>
> running concurrently:
> - a large import from an export domain ( network workload )
> - sequential writes to a VM's local disk ( gluster replica sequential workload )
> - a large database SELECT inside the VMs ( random IOPS )
> - deletion of a large, old snapshot ( random IOPS )
>
> With this configuration / workload everything runs for about an hour with no
> exceptions, at 70-80% disk load, but at some point VDSM times out, freezes all
> jobs, and the VMs go into "unknown" status.
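(Regarding the swap question in point 5 above: a quick, non-authoritative way to see which processes actually hold the swap is to read VmSwap from /proc; these paths are standard on CentOS 7:)

    # per-process swap usage, largest consumers last (run as root)
    grep VmSwap /proc/[0-9]*/status | sort -n -k2 | tail -20
    # current swappiness - a high value lets cold anonymous pages be swapped out
    # even while most of the 196 GB is only page cache
    sysctl vm.swappiness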
> The whole system revitalize then automaticaly in cca 20min time frame ( > except the import and snapshot deleting(rollback) ) > > engine.log - focus 10:39:07 time ( Failed in 'HSMGetAllTasksStatusesVDS' > method ) > ======== > > n child command id: 'a8a3a4d5-cf7d-4423-8243-022911232508' > type:'RemoveSnapshotSingleDiskLive' to complete > 2017-03-10 10:39:01,727+01 INFO > [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] > (DefaultQuartzScheduler2) [759c8e1f] Command 'RemoveSnapshotSingleDiskLive' > (id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id: > '33df2c1e-6ce3-44fd-a39b-d111883b4c4e' type:'DestroyImage' to complete > 2017-03-10 10:39:03,929+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] START, > GlusterServersListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 446d0cd3 > 2017-03-10 10:39:04,343+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH, > GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED, > 16.0.0.164:CONNECTED], log id: 446d0cd3 > 2017-03-10 10:39:04,353+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] START, > GlusterVolumesListVDSCommand(HostName = 2kvm1, > GlusterVolumesListVDSParameters:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 69ea1fda > 2017-03-10 10:39:05,128+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler5) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH, > GlusterVolumesListVDSCommand, return: > {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad}, > log id: 69ea1fda > 2017-03-10 10:39:07,163+01 ERROR > [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] > (DefaultQuartzScheduler2) [759c8e1f] Failed in 'HSMGetAllTasksStatusesVDS' > method > 2017-03-10 10:39:07,178+01 ERROR > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > (DefaultQuartzScheduler2) [759c8e1f] EVENT_ID: > VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, > Custom Event ID: -1, Message: VDSM 2kvm2 command HSMGetAllTasksStatusesVDS > failed: Connection timed out > 2017-03-10 10:39:07,182+01 INFO > [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) > [759c8e1f] BaseAsyncTask::onTaskEndSuccess: Task > 'f594bf69-619b-4d1b-8f6d-a9826997e478' (Parent Command 'ImportVm', > Parameters Type > 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended > successfully. > 2017-03-10 10:39:07,182+01 INFO > [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (DefaultQuartzScheduler2) > [759c8e1f] Task with DB Task ID 'a05c7c07-9b98-4ab2-ac7b-9e70a75ba7b7' and > VDSM Task ID '7c60369f-70a3-4a6a-80c9-4753ac9ed372' is in state Polling. End > action for command 8deb3fe3-4a83-4605-816c-ffdc63fd9ac1 will proceed when > all the entity's tasks are completed. 
> 2017-03-10 10:39:07,182+01 INFO > [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) > [759c8e1f] SPMAsyncTask::PollTask: Polling task > 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command 'DestroyImage', > Parameters Type > 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned > status 'finished', result 'cleanSuccess'. > 2017-03-10 10:39:07,182+01 ERROR > [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) > [759c8e1f] BaseAsyncTask::logEndTaskFailure: Task > 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' (Parent Command 'DestroyImage', > Parameters Type > 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with > failure: > -- Result: 'cleanSuccess' > -- Message: 'VDSGenericException: VDSErrorException: Failed to > HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100', > -- Exception: 'VDSGenericException: VDSErrorException: Failed to > HSMGetAllTasksStatusesVDS, error = Connection timed out, code = 100' > 2017-03-10 10:39:07,184+01 INFO > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler2) > [759c8e1f] CommandAsyncTask::endActionIfNecessary: All tasks of command > '33df2c1e-6ce3-44fd-a39b-d111883b4c4e' has ended -> executing 'endAction' > 2017-03-10 10:39:07,185+01 INFO > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler2) > [759c8e1f] CommandAsyncTask::endAction: Ending action for '1' tasks (command > ID: '33df2c1e-6ce3-44fd-a39b-d111883b4c4e'): calling endAction '. > 2017-03-10 10:39:07,185+01 INFO > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] > CommandAsyncTask::endCommandAction [within thread] context: Attempting to > endAction 'DestroyImage', > 2017-03-10 10:39:07,192+01 INFO > [org.ovirt.engine.core.bll.storage.disk.image.DestroyImageCommand] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] Command > [id=33df2c1e-6ce3-44fd-a39b-d111883b4c4e]: Updating status to 'FAILED', The > command end method logic will be executed by one of its parent commands. > 2017-03-10 10:39:07,192+01 INFO > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] > CommandAsyncTask::HandleEndActionResult [within thread]: endAction for > action type 'DestroyImage' completed, handling the result. > 2017-03-10 10:39:07,192+01 INFO > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] > CommandAsyncTask::HandleEndActionResult [within thread]: endAction for > action type 'DestroyImage' succeeded, clearing tasks. 
> 2017-03-10 10:39:07,192+01 INFO > [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] SPMAsyncTask::ClearAsyncTask: > Attempting to clear task 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' > 2017-03-10 10:39:07,193+01 INFO > [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] START, > SPMClearTaskVDSCommand( > SPMTaskGuidBaseVDSCommandParameters:{runAsync='true', > storagePoolId='00000001-0001-0001-0001-000000000311', > ignoreFailoverLimit='false', > taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id: 2b7080c2 > 2017-03-10 10:39:07,194+01 INFO > [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] START, > HSMClearTaskVDSCommand(HostName = 2kvm2, > HSMTaskGuidBaseVDSCommandParameters:{runAsync='true', > hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8', > taskId='f351e8f6-6dd7-49aa-bf54-650d84fc6352'}), log id: 2edff460 > 2017-03-10 10:39:08,208+01 INFO > [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH, > HSMClearTaskVDSCommand, log id: 2edff460 > 2017-03-10 10:39:08,208+01 INFO > [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] FINISH, > SPMClearTaskVDSCommand, log id: 2b7080c2 > 2017-03-10 10:39:08,213+01 INFO > [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] > BaseAsyncTask::removeTaskFromDB: Removed task > 'f351e8f6-6dd7-49aa-bf54-650d84fc6352' from DataBase > 2017-03-10 10:39:08,213+01 INFO > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] > (org.ovirt.thread.pool-6-thread-31) [759c8e1f] > CommandAsyncTask::HandleEndActionResult [within thread]: Removing > CommandMultiAsyncTasks object for entity > '33df2c1e-6ce3-44fd-a39b-d111883b4c4e' > 2017-03-10 10:39:10,142+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START, > GlusterServersListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2e7278cb > 2017-03-10 10:39:11,513+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH, > GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED, > 16.0.0.164:CONNECTED], log id: 2e7278cb > 2017-03-10 10:39:11,523+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] START, > GlusterVolumesListVDSCommand(HostName = 2kvm1, > GlusterVolumesListVDSParameters:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 43704ef2 > 2017-03-10 10:39:11,777+01 INFO > [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] > (DefaultQuartzScheduler9) [67e1d8ed] Command 'RemoveSnapshot' (id: > '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id: > 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive' > to complete > 2017-03-10 10:39:11,789+01 WARN > [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] > (DefaultQuartzScheduler9) [759c8e1f] Child command 'DESTROY_IMAGE' failed, > proceeding to verify > 2017-03-10 10:39:11,789+01 INFO > 
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] > (DefaultQuartzScheduler9) [759c8e1f] Executing Live Merge command step > 'DESTROY_IMAGE_CHECK' > 2017-03-10 10:39:11,832+01 INFO > [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-7) > [4856f570] Running command: DestroyImageCheckCommand internal: true. > 2017-03-10 10:39:11,833+01 INFO > [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand] > (pool-5-thread-7) [4856f570] START, SPMGetVolumeInfoVDSCommand( > SPMGetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]', > runAsync='true', storagePoolId='00000001-0001-0001-0001-000000000311', > ignoreFailoverLimit='false', > storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e', > imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428', > imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id: 2c8031f8 > 2017-03-10 10:39:11,833+01 INFO > [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetVolumeInfoVDSCommand] > (pool-5-thread-7) [4856f570] Executing GetVolumeInfo using the current SPM > 2017-03-10 10:39:11,834+01 INFO > [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] > (pool-5-thread-7) [4856f570] START, GetVolumeInfoVDSCommand(HostName = > 2kvm2, > GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]', > runAsync='true', hostId='905375e1-6de4-4fdf-b69c-b2d546f869c8', > storagePoolId='00000001-0001-0001-0001-000000000311', > storageDomainId='1603cd90-92ef-4c03-922c-cecb282fd00e', > imageGroupId='7543338a-3ca6-4698-bb50-c14f0bd71428', > imageId='50b592f7-bfba-4398-879c-8d6a19a2c000'}), log id: 79ca86cc > 2017-03-10 10:39:11,846+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler10) [a86dc7b5-52dc-40d4-a3b9-49d7eabbb93c] FINISH, > GlusterVolumesListVDSCommand, return: > {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad}, > log id: 43704ef2 > 2017-03-10 10:39:16,858+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START, > GlusterServersListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 6542adcd > 2017-03-10 10:39:17,394+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH, > GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED, > 16.0.0.164:CONNECTED], log id: 6542adcd > 2017-03-10 10:39:17,406+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START, > GlusterVolumesListVDSCommand(HostName = 2kvm1, > GlusterVolumesListVDSParameters:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 44ec33ed > 2017-03-10 10:39:18,598+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH, > GlusterVolumesListVDSCommand, return: > {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad}, > log id: 44ec33ed > 2017-03-10 10:39:21,865+01 INFO > [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] > (DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id: > 
'13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id: > 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive' > to complete > 2017-03-10 10:39:21,881+01 INFO > [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] > (DefaultQuartzScheduler6) [4856f570] Command 'RemoveSnapshotSingleDiskLive' > (id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id: > 'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc' type:'DestroyImageCheck' to complete > 2017-03-10 10:39:23,611+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler6) [4856f570] START, > GlusterServersListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 4c2fc22d > 2017-03-10 10:39:24,616+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand] > (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] START, > GlusterTasksListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1f169371 > 2017-03-10 10:39:24,618+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler6) [4856f570] FINISH, GlusterServersListVDSCommand, > return: [172.16.5.163/24:CONNECTED, 16.0.0.164:CONNECTED], log id: 4c2fc22d > 2017-03-10 10:39:24,629+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler6) [4856f570] START, > GlusterVolumesListVDSCommand(HostName = 2kvm1, > GlusterVolumesListVDSParameters:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2ac55735 > 2017-03-10 10:39:24,822+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterTasksListVDSCommand] > (DefaultQuartzScheduler7) [d82701d9-9fa3-467d-b273-f5fe5a93062f] FINISH, > GlusterTasksListVDSCommand, return: [], log id: 1f169371 > 2017-03-10 10:39:26,836+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler6) [4856f570] FINISH, GlusterVolumesListVDSCommand, > return: > {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad}, > log id: 2ac55735 > 2017-03-10 10:39:31,849+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] START, > GlusterServersListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 2e8dbcd1 > 2017-03-10 10:39:31,932+01 INFO > [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] > (DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id: > '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id: > 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive' > to complete > 2017-03-10 10:39:31,944+01 INFO > [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] > (DefaultQuartzScheduler6) [4856f570] Command 'RemoveSnapshotSingleDiskLive' > (id: 'a8a3a4d5-cf7d-4423-8243-022911232508') waiting on child command id: > 'b1d63b8e-19d3-4d64-8fa8-4eb3e2d1a8fc' type:'DestroyImageCheck' to complete > 2017-03-10 10:39:33,213+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH, > 
GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED, > 16.0.0.164:CONNECTED], log id: 2e8dbcd1 > 2017-03-10 10:39:33,226+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] START, > GlusterVolumesListVDSCommand(HostName = 2kvm1, > GlusterVolumesListVDSParameters:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1fb3f9e3 > 2017-03-10 10:39:34,375+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler8) [fde51205-3e8b-4b84-a478-352dc444ccc4] FINISH, > GlusterVolumesListVDSCommand, return: > {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad}, > log id: 1fb3f9e3 > 2017-03-10 10:39:39,392+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] START, > GlusterServersListVDSCommand(HostName = 2kvm1, > VdsIdVDSCommandParametersBase:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 1e0b8eeb > 2017-03-10 10:39:40,753+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] > (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] FINISH, > GlusterServersListVDSCommand, return: [172.16.5.163/24:CONNECTED, > 16.0.0.164:CONNECTED], log id: 1e0b8eeb > 2017-03-10 10:39:40,763+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] START, > GlusterVolumesListVDSCommand(HostName = 2kvm1, > GlusterVolumesListVDSParameters:{runAsync='true', > hostId='86876b79-71d8-4ae1-883b-ba010cd270e7'}), log id: 35b04b33 > 2017-03-10 10:39:41,952+01 INFO > [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] > (DefaultQuartzScheduler9) [12d6d15f-e054-4833-bd87-58f6a51e5fa6] FINISH, > GlusterVolumesListVDSCommand, return: > {8ded4083-2f31-489e-a60d-a315a5eb9b22=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@7765e4ad}, > log id: 35b04b33 > 2017-03-10 10:39:41,991+01 INFO > [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] > (DefaultQuartzScheduler6) [67e1d8ed] Command 'RemoveSnapshot' (id: > '13c2cb7c-0809-4971-aceb-37ae66105ab7') waiting on child command id: > 'a8a3a4d5-cf7d-4423-8243-022911232508' type:'RemoveSnapshotSingleDiskLive' > to complete > > > gluster ( nothing in logs ) > ====== > > > ## "etc-glusterfs-glusterd.vol.log" > [2017-03-10 10:13:52.599019] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:16:48.639635] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > The message "I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1" repeated 3 times between > [2017-03-10 10:16:48.639635] and [2017-03-10 10:17:55.659379] > [2017-03-10 10:18:56.875516] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:19:57.204689] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:21:56.576879] I 
[MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:21:57.772857] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:24:00.617931] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:30:04.918080] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:31:06.128638] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:32:07.325672] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:32:12.433586] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:32:13.544909] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:35:10.039213] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:37:19.905314] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:37:20.174209] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:38:12.635460] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > [2017-03-10 10:40:14.169864] I [MSGID: 106499] > [glusterd-handler.c:4349:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume slow1 > > > ## "rhev-data-center-mnt-glusterSD-localhost:_slow1.log" > [2017-03-10 09:43:40.346785] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61849: > dentry not found in 43e6968f-9c2a-40d8-8074-caf1a36f60cf > [2017-03-10 09:43:40.347076] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61879: > dentry not found in 902a6e3d-b7aa-439f-8262-cdc1b7f9f022 > [2017-03-10 09:43:40.347145] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61935: > dentry not found in 846bbcfc-f2b3-4ab6-af44-aeaa10b39318 > [2017-03-10 09:43:40.347211] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61922: > dentry not found in 66ad3bc5-26c7-4360-b33b-a084e3305cf8 > [2017-03-10 09:43:40.351571] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61834: > dentry not found in 3b8278e1-40e5-4363-b21e-7bffcd024c62 > [2017-03-10 09:43:40.352449] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > 
be318638-e8a0-4c6d-977d-7a937aa84806/b6f2d08d-2441-4111-ab62-e14abdfaf602.61870: > dentry not found in 282f4c05-e09a-48e0-96a3-52e079ff2f73 > [2017-03-10 09:50:38.829325] I [MSGID: 109066] > [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta.new > (hash=slow1-replicate-0/cache=slow1-replicate-0) => > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta > (hash=slow1-replicate-0/cache=slow1-replicate-0) > [2017-03-10 09:50:42.221775] I [MSGID: 109066] > [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta.new > (hash=slow1-replicate-0/cache=slow1-replicate-0) => > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta > (hash=slow1-replicate-0/cache=slow1-replicate-0) > [2017-03-10 09:50:45.956432] I [MSGID: 109066] > [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/3cef54b4-45b9-4f5b-82c2-fcc8def06a37/85287865-38f0-45df-9e6c-1294913cbb88.meta.new > (hash=slow1-replicate-0/cache=slow1-replicate-0) => > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/3cef54b4-45b9-4f5b-82c2-fcc8def06a37/85287865-38f0-45df-9e6c-1294913cbb88.meta > (hash=slow1-replicate-0/cache=slow1-replicate-0) > [2017-03-10 09:50:40.349563] I [MSGID: 109066] > [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta.new > (hash=slow1-replicate-0/cache=slow1-replicate-0) => > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/014ca3aa-d5f5-4b88-8f84-be8d4c5dfc1e/f147532a-89fa-49e0-8225-f82343fca8be.meta > (hash=slow1-replicate-0/cache=slow1-replicate-0) > [2017-03-10 09:50:44.503866] I [MSGID: 109066] > [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta.new > (hash=slow1-replicate-0/cache=slow1-replicate-0) => > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/4cf7dd90-9dcc-428c-82bc-fbf08dbee0be/12812d56-1606-4bf8-a391-0a2cacbd020b.meta > (hash=slow1-replicate-0/cache=slow1-replicate-0) > [2017-03-10 09:59:46.860762] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.6684: > dentry not found in d1e65eea-8758-4407-ac2e-3605dc661364 > [2017-03-10 10:02:22.500865] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.8767: > dentry not found in e228bb28-9602-4f8e-8323-7434d77849fc > [2017-03-10 10:04:03.103839] W [MSGID: 101159] [inode.c:1214:__inode_unlink] > 0-inode: > be318638-e8a0-4c6d-977d-7a937aa84806/6e105aa3-a3fc-4aca-be50-78b7642c4072.9787: > dentry not found in 6be71632-aa36-4975-b673-1357e0355027 > [2017-03-10 10:06:02.406385] I [MSGID: 109066] > [dht-rename.c:1569:dht_rename] 0-slow1-dht: renaming > /1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta.new > (hash=slow1-replicate-0/cache=slow1-replicate-0) => > 
/1603cd90-92ef-4c03-922c-cecb282fd00e/images/2a9c1c6a-f045-4dce-a47b-95a2267eef72/6f264695-0669-4b49-a2f6-e6c92482f2fb.meta > (hash=slow1-replicate-0/cache=slow1-replicate-0) > ... no other record > > > messages > ======== > > several times occured: > > Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd. > Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd. > Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd. > Mar 10 09:04:38 2kvm2 lvmetad: WARNING: Ignoring unsupported value for cmd. > Mar 10 09:10:01 2kvm2 systemd: Started Session 274 of user root. > Mar 10 09:10:01 2kvm2 systemd: Starting Session 274 of user root. > Mar 10 09:20:02 2kvm2 systemd: Started Session 275 of user root. > Mar 10 09:20:02 2kvm2 systemd: Starting Session 275 of user root. > Mar 10 09:22:59 2kvm2 sanlock[1673]: 2017-03-10 09:22:59+0100 136031 [2576]: > s3 delta_renew long write time 11 sec > > Sanlock cannot write to storage > > > Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2, > mode:0x104020 > > Kernel cannot allocate page? > > > Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G > I ------------ 3.10.0-514.10.2.el7.x86_64 #1 > Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+, > BIOS 1.0a 08/29/2014 > Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9 > ffff88301f4839d8 ffffffff816864ef > Mar 10 09:24:03 2kvm2 kernel: ffff88301f483a68 ffffffff81186ba0 > 000068fc00000000 0000000000000000 > Mar 10 09:24:03 2kvm2 kernel: fffffffffffffffc 0010402000000000 > ffff88301567ae80 00000000f7228dc9 > Mar 10 09:24:03 2kvm2 kernel: Call Trace: > Mar 10 09:24:03 2kvm2 kernel: <IRQ> [<ffffffff816864ef>] > dump_stack+0x19/0x1b > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81186ba0>] > warn_alloc_failed+0x110/0x180 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81682083>] > __alloc_pages_slowpath+0x6b7/0x725 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8118b155>] > __alloc_pages_nodemask+0x405/0x420 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811cf30a>] > alloc_pages_current+0xaa/0x170 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81185a7e>] __get_free_pages+0xe/0x50 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811dabae>] > kmalloc_order_trace+0x2e/0xa0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811dd381>] __kmalloc+0x221/0x240 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02f83fa>] > bnx2x_frag_alloc.isra.62+0x2a/0x40 [bnx2x] > Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02f92f7>] bnx2x_rx_int+0x227/0x17b0 > [bnx2x] > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81033669>] ? sched_clock+0x9/0x10 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffffa02fc72d>] bnx2x_poll+0x1dd/0x260 > [bnx2x] > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff815705e0>] net_rx_action+0x170/0x380 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8108f2cf>] __do_softirq+0xef/0x280 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8169859c>] call_softirq+0x1c/0x30 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8102d365>] do_softirq+0x65/0xa0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8108f665>] irq_exit+0x115/0x120 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81699138>] do_IRQ+0x58/0xf0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8168e2ad>] > common_interrupt+0x6d/0x6d > Mar 10 09:24:03 2kvm2 kernel: <EOI> [<ffffffff81189a73>] ? 
> free_hot_cold_page+0x103/0x160 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81189b16>] > free_hot_cold_page_list+0x46/0xa0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81195193>] > shrink_page_list+0x543/0xb00 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81195dda>] > shrink_inactive_list+0x1fa/0x630 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81196975>] shrink_lruvec+0x385/0x770 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81196dd6>] shrink_zone+0x76/0x1a0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff8119807c>] balance_pgdat+0x48c/0x5e0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81198343>] kswapd+0x173/0x450 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b17d0>] ? > wake_up_atomic_t+0x30/0x30 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff811981d0>] ? > balance_pgdat+0x5e0/0x5e0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b06ff>] kthread+0xcf/0xe0 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b0630>] ? > kthread_create_on_node+0x140/0x140 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff81696a58>] ret_from_fork+0x58/0x90 > Mar 10 09:24:03 2kvm2 kernel: [<ffffffff810b0630>] ? > kthread_create_on_node+0x140/0x140 > Mar 10 09:24:03 2kvm2 kernel: kswapd1: page allocation failure: order:2, > mode:0x104020 > Mar 10 09:24:03 2kvm2 kernel: CPU: 42 PID: 265 Comm: kswapd1 Tainted: G > I ------------ 3.10.0-514.10.2.el7.x86_64 #1 > Mar 10 09:24:03 2kvm2 kernel: Hardware name: Supermicro X10DRC/X10DRi-LN4+, > BIOS 1.0a 08/29/2014 > Mar 10 09:24:03 2kvm2 kernel: 0000000000104020 00000000f7228dc9 > ffff88301f4839d8 ffffffff816864ef > > > and in critical time again > > Mar 10 10:37:53 2kvm2 sanlock[1673]: 2017-03-10 10:37:53+0100 140524 [1673]: > s3 check_our_lease warning 73 last_success 140451 > Mar 10 10:37:54 2kvm2 sanlock[1673]: 2017-03-10 10:37:54+0100 140525 [1673]: > s3 check_our_lease warning 74 last_success 140451 > > Sanlock could not renew the lease for 74 seconds > > > Mar 10 10:37:54 2kvm2 wdmd[1732]: test warning now 140526 ping 140516 close > 0 renewal 140451 expire 140531 client 1673 > sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1 > Mar 10 10:37:54 2kvm2 kernel: watchdog watchdog0: watchdog did not stop! 
> Mar 10 10:37:54 2kvm2 wdmd[1732]: /dev/watchdog0 closed unclean > Mar 10 10:37:55 2kvm2 sanlock[1673]: 2017-03-10 10:37:55+0100 140526 [1673]: > s3 check_our_lease warning 75 last_success 140451 > Mar 10 10:37:55 2kvm2 wdmd[1732]: test warning now 140527 ping 140516 close > 140526 renewal 140451 expire 140531 client 1673 > sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1 > Mar 10 10:37:56 2kvm2 sanlock[1673]: 2017-03-10 10:37:56+0100 140527 [1673]: > s3 check_our_lease warning 76 last_success 140451 > Mar 10 10:37:56 2kvm2 wdmd[1732]: test warning now 140528 ping 140516 close > 140526 renewal 140451 expire 140531 client 1673 > sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1 > Mar 10 10:37:57 2kvm2 sanlock[1673]: 2017-03-10 10:37:57+0100 140528 [1673]: > s3 check_our_lease warning 77 last_success 140451 > Mar 10 10:37:57 2kvm2 wdmd[1732]: test warning now 140529 ping 140516 close > 140526 renewal 140451 expire 140531 client 1673 > sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1 > Mar 10 10:37:58 2kvm2 sanlock[1673]: 2017-03-10 10:37:58+0100 140529 [1673]: > s3 check_our_lease warning 78 last_success 140451 > Mar 10 10:37:58 2kvm2 wdmd[1732]: test warning now 140530 ping 140516 close > 140526 renewal 140451 expire 140531 client 1673 > sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1 > Mar 10 10:37:59 2kvm2 sanlock[1673]: 2017-03-10 10:37:59+0100 140530 [1673]: > s3 check_our_lease warning 79 last_success 140451 > Mar 10 10:37:59 2kvm2 wdmd[1732]: test failed rem 55 now 140531 ping 140516 > close 140526 renewal 140451 expire 140531 client 1673 > sanlock_1603cd90-92ef-4c03-922c-cecb282fd00e:1 > Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]: > s3 check_our_lease failed 80 > > Sanlock fail to renew the lease after 80 seconds - game over > > > Mar 10 10:38:00 2kvm2 sanlock[1673]: 2017-03-10 10:38:00+0100 140531 [1673]: > s3 all pids clear > > If this host is the SPM, sanlock just killed vdsm, this explains why > your storage operation fail. 
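(A hedged way to confirm that the storage itself, rather than sanlock, is the bottleneck is to watch the lease renewal times and to time a small direct write on the storage domain mount; the mount path below is an assumption derived from the "localhost:_slow1" mount log above, adjust it to your environment:)

    # current lockspaces and renewal state
    sanlock client status
    # recent renewal latency warnings
    grep delta_renew /var/log/sanlock.log | tail
    # time a small O_DIRECT write on the gluster mount (example path)
    time dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/localhost:_slow1/__latency_test__ \
        bs=4K count=256 oflag=direct conv=fsync
    rm -f /rhev/data-center/mnt/glusterSD/localhost:_slow1/__latency_test__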
> > > Mar 10 10:38:01 2kvm2 wdmd[1732]: /dev/watchdog0 reopen > Mar 10 10:38:10 2kvm2 journal: Cannot start job (query, none) for domain > TEST-LBS_EBSAPP; current job is (query, none) owned by (3284 > remoteDispatchConnectGetAllDomainStats, 0 <null>) for (62s, 0s) > Mar 10 10:38:10 2kvm2 journal: Timed out during operation: cannot acquire > state change lock (held by remoteDispatchConnectGetAllDomainStats) > Mar 10 10:38:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving > from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40590, 0, 0) at > 0x3acdd88>: unexpected eof > Mar 10 10:38:40 2kvm2 journal: Cannot start job (query, none) for domain > TEST1-LBS_ATRYA; current job is (query, none) owned by (3288 > remoteDispatchConnectGetAllDomainStats, 0 <null>) for (47s, 0s) > Mar 10 10:38:40 2kvm2 journal: Timed out during operation: cannot acquire > state change lock (held by remoteDispatchConnectGetAllDomainStats) > Mar 10 10:38:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving > from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40592, 0, 0) at > 0x3fd5b90>: unexpected eof > Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain > TEST-LBS_EBSAPP; current job is (query, none) owned by (3284 > remoteDispatchConnectGetAllDomainStats, 0 <null>) for (122s, 0s) > Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire > state change lock (held by remoteDispatchConnectGetAllDomainStats) > Mar 10 10:39:10 2kvm2 journal: Cannot start job (query, none) for domain > TEST1-LBS_ATRYA; current job is (query, none) owned by (3288 > remoteDispatchConnectGetAllDomainStats, 0 <null>) for (77s, 0s) > Mar 10 10:39:10 2kvm2 journal: Timed out during operation: cannot acquire > state change lock (held by remoteDispatchConnectGetAllDomainStats) > Mar 10 10:39:11 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving > from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40594, 0, 0) at > 0x2447290>: unexpected eof > Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]: > s3 delta_renew write time 140 error -202 > Mar 10 10:39:23 2kvm2 sanlock[1673]: 2017-03-10 10:39:23+0100 140615 [2576]: > s3 renewal error -202 delta_length 144 last_success 140451 > Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain > TEST-LBS_EBSAPP; current job is (query, none) owned by (3284 > remoteDispatchConnectGetAllDomainStats, 0 <null>) for (152s, 0s) > Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire > state change lock (held by remoteDispatchConnectGetAllDomainStats) > Mar 10 10:39:40 2kvm2 journal: Cannot start job (query, none) for domain > TEST1-LBS_ATRYA; current job is (query, none) owned by (3288 > remoteDispatchConnectGetAllDomainStats, 0 <null>) for (107s, 0s) > Mar 10 10:39:40 2kvm2 journal: Timed out during operation: cannot acquire > state change lock (held by remoteDispatchConnectGetAllDomainStats) > Mar 10 10:39:41 2kvm2 journal: vdsm vds.dispatcher ERROR SSL error receiving > from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 40596, 0, 0) at > 0x2472ef0>: unexpected eof > Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42107 blocked for more > than 120 seconds. > > qemu-img is blocked for more than 120 seconds. > > > Mar 10 10:39:49 2kvm2 kernel: "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> Mar 10 10:39:49 2kvm2 kernel: qemu-img D ffff88010dad3e30 0 42107 > 3631 0x00000080 > Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3b30 0000000000000082 > ffff8814491f4e70 ffff88010dad3fd8 > Mar 10 10:39:49 2kvm2 kernel: ffff88010dad3fd8 ffff88010dad3fd8 > ffff8814491f4e70 ffff88301f096c40 > Mar 10 10:39:49 2kvm2 kernel: 0000000000000000 7fffffffffffffff > ffff88181f186c00 ffff88010dad3e30 > Mar 10 10:39:49 2kvm2 kernel: Call Trace: > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168bbb9>] schedule+0x29/0x70 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81689609>] > schedule_timeout+0x239/0x2d0 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168b15e>] > io_schedule_timeout+0xae/0x130 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8168b1f8>] io_schedule+0x18/0x20 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff8124d9e5>] > wait_on_sync_kiocb+0x35/0x80 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffffa0a36091>] > fuse_direct_IO+0x231/0x380 [fuse] > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff812a6ddd>] ? > cap_inode_need_killpriv+0x2d/0x40 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff812a8cb6>] ? > security_inode_need_killpriv+0x16/0x20 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81219e3f>] ? > dentry_needs_remove_privs.part.13+0x1f/0x30 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81182a2d>] > generic_file_direct_write+0xcd/0x190 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffffa0a36905>] > fuse_file_aio_write+0x185/0x340 [fuse] > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811fdabd>] do_sync_write+0x8d/0xd0 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811fe32d>] vfs_write+0xbd/0x1e0 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff811ff002>] SyS_pwrite64+0x92/0xc0 > Mar 10 10:39:49 2kvm2 kernel: [<ffffffff81696b09>] > system_call_fastpath+0x16/0x1b > Mar 10 10:39:49 2kvm2 kernel: INFO: task qemu-img:42111 blocked for more > than 120 seconds. > Mar 10 10:39:49 2kvm2 kernel: "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Mar 10 10:39:49 2kvm2 kernel: qemu-img D ffff8818a76e7e30 0 42111 > 3632 0x00000080 > Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7b30 0000000000000082 > ffff88188aaeaf10 ffff8818a76e7fd8 > Mar 10 10:39:49 2kvm2 kernel: ffff8818a76e7fd8 ffff8818a76e7fd8 > ffff88188aaeaf10 ffff88301f156c40 > > memory > ======= > > # cat /proc/meminfo > MemTotal: 197983472 kB > MemFree: 834228 kB > MemAvailable: 165541204 kB > Buffers: 45548 kB > Cached: 159596272 kB > SwapCached: 119872 kB > Active: 40803264 kB > Inactive: 148022076 kB > Active(anon): 26594112 kB > Inactive(anon): 2626384 kB > Active(file): 14209152 kB > Inactive(file): 145395692 kB > Unevictable: 50488 kB > Mlocked: 50488 kB > SwapTotal: 4194300 kB > SwapFree: 3612188 kB > Dirty: 624 kB > Writeback: 0 kB > AnonPages: 29185032 kB > Mapped: 85176 kB > Shmem: 25908 kB > Slab: 6203384 kB > SReclaimable: 5857240 kB > SUnreclaim: 346144 kB > KernelStack: 19184 kB > PageTables: 86100 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 103186036 kB > Committed_AS: 52300288 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 1560580 kB > VmallocChunk: 34257341440 kB > HardwareCorrupted: 0 kB > AnonHugePages: 5566464 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > DirectMap4k: 431292 kB > DirectMap2M: 19382272 kB > DirectMap1G: 183500800 kB > > > can anybody help me with this ?? > I've got a small tip about swap problem ( in messages), but not shure ..... 
> A similar problem occurred in older gluster/oVirt versions during testing ( a
> freeze under heavy workload - but not a fatal overload )
>
> You have a storage issue; you should understand why
> your storage is failing.
>
> There is also a kernel failure to allocate a page, maybe this is
> related to the storage failure?
>
> Nir
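One hedged follow-up on the page allocation failures mentioned above: order:2 failures in the bnx2x receive path usually indicate memory fragmentation under heavy page-cache pressure rather than a true out-of-memory condition. The checks below only read state; the commented sysctl is an example value, not a recommendation:

    # how much memory the kernel keeps in reserve for atomic allocations
    sysctl vm.min_free_kbytes
    # free pages per allocation order - zeros at order 2 and above indicate fragmentation
    cat /proc/buddyinfo
    # raising the reserve can reduce order:2 failures at the cost of some page cache, e.g.:
    # sysctl -w vm.min_free_kbytes=1048576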
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users