On Tue, Aug 22, 2017 at 1:08 PM, Matthias Leopold <matthias.leop...@meduniwien.ac.at> wrote: > > > Am 2017-08-22 um 09:33 schrieb Maor Lipchuk: >> >> On Mon, Aug 21, 2017 at 6:12 PM, Matthias Leopold >> <matthias.leop...@meduniwien.ac.at> wrote: >>> >>> Hi, >>> >>> we're experimenting with Cinder/Ceph Storage on oVirt 4.1.3. When we >>> tried >>> to snapshot a VM (2 disks on Cinder storage domain) the task never >>> finished >>> and now seems to be in an uninterruptible loop. We tried to stop it in >>> various (brute force) ways, but the below messages (one of the disks as >>> an >>> example) are cluttering engine.log every 10 seconds. We tried the >>> following: >>> >>> - deleting the VM >>> - restarting ovirt-engine service >>> - vdsClient -s 0 getAllTasksStatuses on SPM host (no result) >>> - restarting vdsmd service on SPM host >>> - /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d >>> engine >>> -c c841c979-70ea-4e06-b9c4-9c5ce014d76d >>> >>> None of this helped. How do we get rid of this failed transaction? >>> >>> thx >>> matthias >>> >>> 2017-08-21 16:40:44,798+02 INFO >>> [org.ovirt.engine.core.utils.transaction.TransactionSupport] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] >>> transaction >>> rolled back >>> 2017-08-21 16:40:44,799+02 ERROR >>> [org.ovirt.engine.core.bll.job.ExecutionHandler] >>> (DefaultQuartzScheduler7) >>> [080af640-bac3-4990-8bf4-6829551b538d] Exception: >>> org.springframework.dao.DataIntegrityViolationException: >>> CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, >>> ?, >>> ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates >>> foreign >>> key constraint "fk_step_job" >>> 2017-08-21 16:40:44,805+02 ERROR >>> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Ending >>> command >>> 'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand' >>> with >>> failure. >>> 2017-08-21 16:40:44,807+02 WARN >>> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] No >>> snapshot >>> was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which is in >>> LOCKED >>> status >>> 2017-08-21 16:40:44,810+02 INFO >>> [org.ovirt.engine.core.utils.transaction.TransactionSupport] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] >>> transaction >>> rolled back >>> 2017-08-21 16:40:44,810+02 WARN >>> [org.ovirt.engine.core.bll.lock.InMemoryLockManager] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Trying >>> to >>> release exclusive lock which does not exist, lock key: >>> 'c0235316-81c4-48be-9521-b86b338c7d20VM' >>> 2017-08-21 16:40:44,810+02 INFO >>> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Lock >>> freed >>> to object >>> 'EngineLock:{exclusiveLocks='[c0235316-81c4-48be-9521-b86b338c7d20=VM]', >>> sharedLocks=''}' >>> 2017-08-21 16:40:44,829+02 ERROR >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] >>> EVENT_ID: >>> USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID: >>> 080af640-bac3-4990-8bf4-6829551b538d, Job ID: >>> a3be8af1-8d33-4d35-9672-215ac7c9959f, Call Stack: null, Custom Event ID: >>> -1, >>> Message: Failed to complete snapshot 'test' creation for VM '<UNKNOWN>'. >>> 2017-08-21 16:40:44,829+02 ERROR >>> [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] >>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Failed >>> invoking callback end method 'onFailed' for command >>> 'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the >>> callback >>> is marked for end method retries >>> >>> > >> >> >> Hi Matthias, >> >> Can you please attach the full engine log contains the first error >> occurred so we can trace its origin and fix it? >> Does it reproduced constantly? >> >> The engine does not use VDSM tasks to manage Cinder, the engine use >> Cinder as an external provider using the COCO infrastructure for async >> tasks. >> The COCO tasks are managed in the database using the command_entities >> table, basically if you will remove all references of the command id >> from the command_entities and restart engine you should not see it any >> more. >> >> Regards, >> Maor >> > > Hi Maor, > > thanks very much for replying. First i tried to clean the command_entities > table plus restarting engine as you suggested. This didn't work entirely, > these two entries
Can you please try first to stop the engine and only then clean the command_entities. > > engine=# select command_id, command_type, root_command_id, status from > command_entities; > command_id | command_type | root_command_id > | status > --------------------------------------+--------------+--------------------------------------+-------- > c841c979-70ea-4e06-b9c4-9c5ce014d76d | 206 | > c841c979-70ea-4e06-b9c4-9c5ce014d76d | FAILED > 65fa094e-1609-47ea-bf0d-611e3d5b9358 | 206 | > 65fa094e-1609-47ea-bf0d-611e3d5b9358 | FAILED > > > keep appearing and still cause messages in engine.log like > > 2017-08-22 11:54:57,109+02 WARN > [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] > (DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] No snapshot > was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which is in LOCKED > status > 2017-08-22 11:54:57,112+02 INFO > [org.ovirt.engine.core.utils.transaction.TransactionSupport] > (DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] transaction > rolled back > 2017-08-22 11:54:57,112+02 WARN > [org.ovirt.engine.core.bll.lock.InMemoryLockManager] > (DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] Trying to > release exclusive lock which does not exist, lock key: > 'c0235316-81c4-48be-9521-b86b338c7d20VM' > 2017-08-22 11:54:57,112+02 INFO > [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] > (DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] Lock freed > to object > 'EngineLock:{exclusiveLocks='[c0235316-81c4-48be-9521-b86b338c7d20=VM]', > sharedLocks=''}' > 2017-08-22 11:54:57,131+02 ERROR > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > (DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] EVENT_ID: > USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID: > 080af640-bac3-4990-8bf4-6829551b538d, Call Stack: null, Custom Event ID: -1, > Message: Failed to complete snapshot 'test' creation for VM '<UNKNOWN>'. > 2017-08-22 11:54:57,132+02 ERROR > [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] > (DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] Failed > invoking callback end method 'onFailed' for command > 'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the callback > is marked for end method retries > 2017-08-22 11:54:57,138+02 ERROR > [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] > (DefaultQuartzScheduler8) [54caec87-ba59-4e9f-acf3-7fb5d94658e4] Ending > command > 'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand' with > failure. > > The SQL error messages like > > ERROR: insert or update on table "step" violates foreign key constraint > "fk_step_job" > > are gone now. Please give me a hint how to get rid of these tasks. > > To trace the original error i'm attaching a minute of engine logs when the > error first occurred (i hope this works when sending to the list). We have > not tried to reproduce it yet, but we are planning to test cinder/ceph > storage more extensively, so we'll be looking at this again. At first i > would feel a lot better when i could clean up the last error completely. > > Thanks a lot again > Best Regards > Matthias Thanks for the logs. If you can add the full engine log maybe I can also look at the reason of the consistent failing. _______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users