Re: question: about the bug: current master had lost the ability "Cancel disk mirrors after libvirtd restart"
Please fix your address book, it's 'libvir-list@redhat.com' not
'libvirt-l...@redhat.com'

On Tue, Sep 21, 2021 at 00:52:57 +0800, wangjie (P) wrote:
> Steps to reproduce the bug:
> 1. Perform migrateToURI3.
> 2. Kill libvirtd once the migration enters the memory migration phase, then
>    restart libvirtd.

I presume this is a reproducer and not a normal approach.

> 3. Perform migrateToURI3 again and again; migrateToURI3 will fail forever
>    with the error message
>    "Requested operation is not valid: domain has active block job"
>
>
> I found the cause of the bug to be as follows:
>
> 1. The qemuBlockJobData is not persistent across a libvirtd restart, so the
>    job returned by qemuBlockJobDiskGetJob is always NULL, and
>    qemuMigrationSrcNBDCopyCancel is never called.
>
> 2. Call trace:
>    qemuProcessReconnect
>    ->qemuProcessRecoverJob
>    ->qemuProcessRecoverMigrationOut
>    ->qemuMigrationSrcCancel
>
> 3. The code is as follows:
>    qemuMigrationSrcCancel(virQEMUDriver *driver,
>                           virDomainObj *vm)
>    {
>        ... ...
>        for (i = 0; i < vm->def->ndisks; i++) {
>            virDomainDiskDef *disk = vm->def->disks[i];
>            qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>            qemuBlockJobData *job;
>
>            if (!(job = qemuBlockJobDiskGetJob(disk)) || //the job is always NULL !!!
>                !qemuBlockJobIsRunning(job))

I'll have a look. The blockjob data should have been recovered at this
point. There's a possibility that it's just a wrong ordering of function
calls.

>                diskPriv->migrating = false;
>
>            if (diskPriv->migrating) {
>                qemuBlockJobSyncBegin(job);
>                storage = true;
>            }
>
>            virObjectUnref(job);
>        }
>        ... ...
>
>        if (storage &&
>            qemuMigrationSrcNBDCopyCancel(driver, vm, true,
>                                          QEMU_ASYNC_JOB_NONE, NULL) < 0)
>            return -1;
>        ... ...
>    }

Next time please file an issue in the upstream bug tracker.
question: about the bug: current master had lost the ability "Cancel disk mirrors after libvirtd restart"
Steps to reproduce the bug:
1. Perform migrateToURI3.
2. Kill libvirtd once the migration enters the memory migration phase, then
   restart libvirtd.
3. Perform migrateToURI3 again and again; migrateToURI3 will fail forever
   with the error message
   "Requested operation is not valid: domain has active block job"

I found the cause of the bug to be as follows:

1. The qemuBlockJobData is not persistent across a libvirtd restart, so the
   job returned by qemuBlockJobDiskGetJob is always NULL, and
   qemuMigrationSrcNBDCopyCancel is never called.

2. Call trace:
   qemuProcessReconnect
   ->qemuProcessRecoverJob
   ->qemuProcessRecoverMigrationOut
   ->qemuMigrationSrcCancel

3. The code is as follows:
   qemuMigrationSrcCancel(virQEMUDriver *driver,
                          virDomainObj *vm)
   {
       ... ...
       for (i = 0; i < vm->def->ndisks; i++) {
           virDomainDiskDef *disk = vm->def->disks[i];
           qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
           qemuBlockJobData *job;

           if (!(job = qemuBlockJobDiskGetJob(disk)) || //the job is always NULL !!!
               !qemuBlockJobIsRunning(job))
               diskPriv->migrating = false;

           if (diskPriv->migrating) {
               qemuBlockJobSyncBegin(job);
               storage = true;
           }

           virObjectUnref(job);
       }
       ... ...

       if (storage &&
           qemuMigrationSrcNBDCopyCancel(driver, vm, true,
                                         QEMU_ASYNC_JOB_NONE, NULL) < 0)
           return -1;
       ... ...
   }

4. I think current master has lost the behavior introduced by the following
   patch:
   http://10.175.124.40/cgit/cgit.cgi/code.huawei.com/libvirt.git/commit/?id=e8f263e0d006390c3764aaa07093b2d174b61379

Can you give some suggestions on how to fix it?
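The failure mode described in this message can be modeled in isolation. The following is a minimal sketch in plain C, not libvirt code: ModelBlockJob, ModelDisk, modelCancelNeedsStorageCancel and modelRecoverJob are hypothetical names invented for illustration, and the real qemuBlockJobData handling is considerably more involved. It only shows that when the job lookup yields NULL, the loop clears the migrating flag and the "storage" condition that guards qemuMigrationSrcNBDCopyCancel never becomes true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block job object; not libvirt's real type. */
typedef struct {
    bool running;
} ModelBlockJob;

/* Hypothetical stand-in for the per-disk private data. */
typedef struct {
    bool migrating;      /* plays the role of diskPriv->migrating */
    ModelBlockJob *job;  /* NULL models "job lookup returned NULL" */
} ModelDisk;

/* Models the loop quoted from qemuMigrationSrcCancel. Returns the value of
 * the "storage" flag, i.e. whether a mirror cancel would be attempted. */
static bool
modelCancelNeedsStorageCancel(ModelDisk *disks, size_t ndisks)
{
    bool storage = false;
    size_t i;

    for (i = 0; i < ndisks; i++) {
        ModelBlockJob *job = disks[i].job;

        /* No job object, or a job that is not running: the disk is marked
         * as no longer migrating, so it is skipped below. */
        if (!job || !job->running)
            disks[i].migrating = false;

        if (disks[i].migrating)
            storage = true;
    }

    return storage;
}

/* Assumed recovery step: make the job visible to the lookup again. In this
 * model that is just a pointer assignment. */
static void
modelRecoverJob(ModelDisk *disk, ModelBlockJob *job)
{
    disk->job = job;
}
```

In this model, calling modelCancelNeedsStorageCancel on a disk whose job pointer is still NULL returns false and clears the migrating flag, which corresponds to the mirror never being cancelled. Calling modelRecoverJob first makes the same call return true. Whether the real fix is a matter of call ordering during reconnect is not established by this sketch.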
Re: question: about the bug: current master had lost the ability "Cancel disk mirrors after libvirtd restart"
I think current master has lost the behavior introduced by the following
patch:
https://github.com/libvirt/libvirt/commit/e8f263e0d006390c3764aaa07093b2d174b61379

On 2021/9/21 0:52, wangjie (P) wrote:
> [...]
>
> 4. I think current master has lost the behavior introduced by the following
>    patch:
>    http://10.175.124.40/cgit/cgit.cgi/code.huawei.com/libvirt.git/commit/?id=e8f263e0d006390c3764aaa07093b2d174b61379
>
> Can you give some suggestions on how to fix it?