Re: question: about the bug: current master had lost the ability "Cancel disk mirrors after libvirtd restart"

2021-09-21 Thread Peter Krempa
Please fix your address book, it's 'libvir-list@redhat.com' not
'libvirt-l...@redhat.com'

On Tue, Sep 21, 2021 at 00:52:57 +0800, wangjie (P) wrote:
> Steps to reproduce:
> 1. Call migrateToURI3.
> 2. Kill libvirtd once the migration enters the memory migration phase, then
>    restart libvirtd.

I presume this is a reproducer and not a normal approach.

> 3. Call migrateToURI3 again. Every further attempt fails with the error
>    "Requested operation is not valid: domain has active block job".
>
>
> I found the following cause of the bug:
>
> 1. The qemuBlockJobData is not persistent across a libvirtd restart, so the
>    job returned by qemuBlockJobDiskGetJob is always NULL and
>    qemuMigrationSrcNBDCopyCancel is never called.
> 
> 2. Call trace:
> qemuProcessReconnect
>   -> qemuProcessRecoverJob
>     -> qemuProcessRecoverMigrationOut
>       -> qemuMigrationSrcCancel
>
> 3. The code is as follows:
> qemuMigrationSrcCancel(virQEMUDriver *driver,
>                        virDomainObj *vm)
> {
>     ... ...
>     for (i = 0; i < vm->def->ndisks; i++) {
>         virDomainDiskDef *disk = vm->def->disks[i];
>         qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>         qemuBlockJobData *job;
>
>         if (!(job = qemuBlockJobDiskGetJob(disk)) ||  // the job is always NULL !!!
>             !qemuBlockJobIsRunning(job))

I'll have a look. The blockjob data should have been recovered at this
point. There's a possibility that it's just the wrong ordering of function
calls.

>             diskPriv->migrating = false;
>
>         if (diskPriv->migrating) {
>             qemuBlockJobSyncBegin(job);
>             storage = true;
>         }
>
>         virObjectUnref(job);
>     }
>     ... ...
>
>     if (storage &&
>         qemuMigrationSrcNBDCopyCancel(driver, vm, true,
>                                       QEMU_ASYNC_JOB_NONE, NULL) < 0)
>         return -1;
>     ... ...
> }
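
The ordering concern mentioned above can be pictured with a toy model. This
is only a sketch and not libvirt code: every toy_* name is made up for
illustration, and the assumption is simply that a job lookup can succeed
only after some recovery step has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical daemon state after a restart; not a libvirt structure. */
typedef struct {
    bool job_saved;        /* a block job was recorded before the restart */
    bool jobs_recovered;   /* the recovery step has already run */
    bool mirror_cancelled; /* the cancel step found and cancelled the job */
} toy_state;

/* Recovery step: makes previously saved jobs visible to later lookups. */
void
toy_recover_jobs(toy_state *st)
{
    assert(st != NULL);
    st->jobs_recovered = true;
}

/* Lookup: finds a job only if one was saved and recovery has run. */
bool
toy_lookup_job(const toy_state *st)
{
    assert(st != NULL);
    return st->job_saved && st->jobs_recovered;
}

/* Cancel step: cancels the mirror only when the lookup finds a job. */
void
toy_cancel_migration(toy_state *st)
{
    assert(st != NULL);
    if (toy_lookup_job(st))
        st->mirror_cancelled = true;
}
```

Calling toy_cancel_migration() before toy_recover_jobs() leaves
mirror_cancelled false, while the opposite order sets it. Whether the real
reconnect path has this ordering problem still needs to be checked.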

Next time please file an issue in the upstream bug tracker.



question: about the bug: current master had lost the ability "Cancel disk mirrors after libvirtd restart"

2021-09-20 Thread wangjie (P)
Steps to reproduce:
1. Call migrateToURI3.
2. Kill libvirtd once the migration enters the memory migration phase, then
   restart libvirtd.
3. Call migrateToURI3 again. Every further attempt fails with the error
   "Requested operation is not valid: domain has active block job".


I found the following cause of the bug:

1. The qemuBlockJobData is not persistent across a libvirtd restart, so the
   job returned by qemuBlockJobDiskGetJob is always NULL and
   qemuMigrationSrcNBDCopyCancel is never called.

2. Call trace:
qemuProcessReconnect
  -> qemuProcessRecoverJob
    -> qemuProcessRecoverMigrationOut
      -> qemuMigrationSrcCancel

3. The code is as follows:
qemuMigrationSrcCancel(virQEMUDriver *driver,
                       virDomainObj *vm)
{
    ... ...
    for (i = 0; i < vm->def->ndisks; i++) {
        virDomainDiskDef *disk = vm->def->disks[i];
        qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
        qemuBlockJobData *job;

        if (!(job = qemuBlockJobDiskGetJob(disk)) ||  // the job is always NULL !!!
            !qemuBlockJobIsRunning(job))
            diskPriv->migrating = false;

        if (diskPriv->migrating) {
            qemuBlockJobSyncBegin(job);
            storage = true;
        }

        virObjectUnref(job);
    }
    ... ...

    if (storage &&
        qemuMigrationSrcNBDCopyCancel(driver, vm, true,
                                      QEMU_ASYNC_JOB_NONE, NULL) < 0)
        return -1;
    ... ...
}
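
The effect of the loop above can be reproduced in a small standalone model.
This is only a sketch: the model_* types and functions are hypothetical
stand-ins, not libvirt code, and the job lookup is reduced to a pointer that
may be NULL. It shows that once the lookup yields NULL for every disk, the
migrating flag is cleared everywhere, storage stays false, and the cancel
step is never reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block job; not the libvirt type. */
typedef struct {
    bool running;
} model_job;

/* Hypothetical stand-in for the per-disk private data. */
typedef struct {
    bool migrating;  /* disk was being mirrored for the migration */
    model_job *job;  /* result of the job lookup; NULL if nothing is tracked */
} model_disk;

/*
 * Follows the control flow of the quoted loop: a disk whose job is missing
 * or not running has its migrating flag cleared, and only disks that are
 * still flagged make the function report that the storage copy must be
 * cancelled.
 */
bool
model_needs_storage_cancel(model_disk *disks, size_t ndisks)
{
    bool storage = false;
    size_t i;

    assert(disks != NULL || ndisks == 0);

    for (i = 0; i < ndisks; i++) {
        model_disk *disk = &disks[i];

        if (!disk->job || !disk->job->running)
            disk->migrating = false;

        if (disk->migrating)
            storage = true;
    }

    return storage;
}
```

With a tracked, running job the function reports that the mirror must be
cancelled. With a NULL job it clears the flag and reports nothing to cancel,
which matches the behavior described above even though, judging by the error
message, a block job is still considered active.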


4. I think current master has lost the behavior introduced by the following
   patch:
http://10.175.124.40/cgit/cgit.cgi/code.huawei.com/libvirt.git/commit/?id=e8f263e0d006390c3764aaa07093b2d174b61379


Can you give some suggestions on how to fix it?







Re: question: about the bug: current master had lost the ability "Cancel disk mirrors after libvirtd restart"

2021-09-20 Thread wangjie (P)
I think current master has lost the behavior introduced by the following patch:
https://github.com/libvirt/libvirt/commit/e8f263e0d006390c3764aaa07093b2d174b61379

On 2021/9/21 0:52, wangjie (P) wrote:
> Steps to reproduce:
> 1. Call migrateToURI3.
> 2. Kill libvirtd once the migration enters the memory migration phase, then
>    restart libvirtd.
> 3. Call migrateToURI3 again. Every further attempt fails with the error
>    "Requested operation is not valid: domain has active block job".
>
>
> I found the following cause of the bug:
>
> 1. The qemuBlockJobData is not persistent across a libvirtd restart, so the
>    job returned by qemuBlockJobDiskGetJob is always NULL and
>    qemuMigrationSrcNBDCopyCancel is never called.
> 
> 2. Call trace:
> qemuProcessReconnect
>   -> qemuProcessRecoverJob
>     -> qemuProcessRecoverMigrationOut
>       -> qemuMigrationSrcCancel
>
> 3. The code is as follows:
> qemuMigrationSrcCancel(virQEMUDriver *driver,
>                        virDomainObj *vm)
> {
>     ... ...
>     for (i = 0; i < vm->def->ndisks; i++) {
>         virDomainDiskDef *disk = vm->def->disks[i];
>         qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>         qemuBlockJobData *job;
>
>         if (!(job = qemuBlockJobDiskGetJob(disk)) ||  // the job is always NULL !!!
>             !qemuBlockJobIsRunning(job))
>             diskPriv->migrating = false;
>
>         if (diskPriv->migrating) {
>             qemuBlockJobSyncBegin(job);
>             storage = true;
>         }
>
>         virObjectUnref(job);
>     }
>     ... ...
>
>     if (storage &&
>         qemuMigrationSrcNBDCopyCancel(driver, vm, true,
>                                       QEMU_ASYNC_JOB_NONE, NULL) < 0)
>         return -1;
>     ... ...
> }
> 
> 
> 4. I think current master has lost the behavior introduced by the following
>    patch:
> http://10.175.124.40/cgit/cgit.cgi/code.huawei.com/libvirt.git/commit/?id=e8f263e0d006390c3764aaa07093b2d174b61379
>
>
> Can you give some suggestions on how to fix it?