On 5/21/2020 12:34 PM, Yan Zhao wrote:
On Thu, May 21, 2020 at 12:39:48PM +0530, Kirti Wankhede wrote:


On 5/21/2020 10:38 AM, Yan Zhao wrote:
On Wed, May 20, 2020 at 10:46:12AM -0600, Alex Williamson wrote:
On Wed, 20 May 2020 19:10:07 +0530
Kirti Wankhede <kwankh...@nvidia.com> wrote:

On 5/20/2020 8:25 AM, Yan Zhao wrote:
On Tue, May 19, 2020 at 10:58:04AM -0600, Alex Williamson wrote:
Hi folks,

My impression is that we're getting pretty close to a workable
implementation here with v22 plus respins of patches 5, 6, and 8.  We
also have a matching QEMU series and a proposal for a new i40e
consumer, as well as I assume GVT-g updates happening internally at
Intel.  I expect all of the latter needs further review and discussion,
but we should be at the point where we can validate these proposed
kernel interfaces.  Therefore I'd like to make a call for reviews so
that we can get this wrapped up for the v5.8 merge window.  I know
Connie has some outstanding documentation comments and I'd like to make
sure everyone has an opportunity to check that their comments have been
addressed and we don't discover any new blocking issues.  Please send
your Acked-by/Reviewed-by/Tested-by tags if you're satisfied with this
interface and implementation.  Thanks!
hi Alex and Kirti,
after porting to QEMU v22 and kernel v22, I found that it cannot even pass
a basic live migration test; it fails with an error like

"Failed to get dirty bitmap for iova: 0xca000 size: 0x3000 err: 22"

Thanks for testing, Yan.
I think the last-moment change below caused this failure:

https://lore.kernel.org/kvm/1589871178-8282-1-git-send-email-kwankh...@nvidia.com/

   >         if (dma->iova > iova + size)
   >                 break;

Surprisingly, with my basic testing on 2G of system memory QEMU didn't raise
an abort on g_free, but I do hit this with large system memory.
With the above change, that function iterated through the next vfio_dma as
well. The check should be as below:

-               if (dma->iova > iova + size)
+               if (dma->iova > iova + size - 1)


Or just:

        if (dma->iova >= iova + size)

Thanks,
Alex


                           break;
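
To illustrate the boundary condition (a standalone sketch with made-up
values, not the actual vfio code): the requested range is
[iova, iova + size - 1], so iova + size is the first address past it, and a
vfio_dma starting exactly there must not be walked.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint64_t iova = 0xca000, size = 0x3000;
            /* a hypothetical next mapping, starting just past the range */
            uint64_t dma_iova = iova + size;

            /* original check: prints 0, so the loop walks into the next vfio_dma */
            printf("> :  break = %d\n", dma_iova > iova + size);
            /* fixed checks: both print 1, the loop stops in time */
            printf(">=:  break = %d\n", dma_iova >= iova + size);
            printf("-1:  break = %d\n", dma_iova > iova + size - 1);
            return 0;
    }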

Another fix is in QEMU.
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg04751.html

   > > +        range->bitmap.size = ROUND_UP(pages, 64) / 8;
   >
   > ROUND_UP(npages/8, sizeof(u64))?
   >

If npages < 8, npages/8 is 0 and ROUND_UP(0, 8) returns 0, so the bitmap
size would be 0.

Changing it as below

-        range->bitmap.size = ROUND_UP(pages / 8, sizeof(uint64_t));
+        range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
+                             BITS_PER_BYTE;
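
As a quick sanity check (a worked example, assuming ROUND_UP(x, y) rounds x
up to the next multiple of y): for pages = 5,

  ROUND_UP(5 / 8, sizeof(uint64_t)) = ROUND_UP(0, 8)      = 0 bytes (too small)
  ROUND_UP(5, sizeof(__u64) * BITS_PER_BYTE) / BITS_PER_BYTE
                                    = ROUND_UP(5, 64) / 8 = 8 bytes (one full u64)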

I'll update the patches with these fixes and Cornelia's suggestion soon.

Due to shortage of time I may not be able to address all the concerns
raised on previous versions of the QEMU series; I'm trying to make the QEMU
side code available for others to test with the latest kernel changes. Don't
worry, I will revisit the comments on the QEMU patches. Right now the first
priority is to test the kernel UAPI and prepare the kernel patches for 5.8.


hi Kirti,
after updating kernel/QEMU to v23, I still hit the two types of errors below
with just a basic migration test (the guest VM size is 2G for all reported
bugs):

"Failed to get dirty bitmap for iova: 0xfe011000 size: 0x3fb0 err: 22"


The size doesn't look correct here; the check below should be failing:

  range.size & (iommu_pgsize - 1)
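
For example, assuming a 4K IOMMU page size: 0x3fb0 & (0x1000 - 1) = 0xfb0,
which is non-zero, so the ioctl returns -EINVAL (err 22). The size in the
error above is simply not page aligned.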

or

"qemu-system-x86_64-lm: vfio_load_state: Error allocating buffer
qemu-system-x86_64-lm: error while loading state section id 49(vfio)
qemu-system-x86_64-lm: load of migration failed: Cannot allocate memory"



The above error is from:
         buf = g_try_malloc0(data_size);
         if (!buf) {
             error_report("%s: Error allocating buffer ", __func__);
             return -ENOMEM;
         }

Seems you are running out of memory?

No, my host memory is about 60G.
I just migrate with the command "migrate -d xxx" without a speed limit.
FYI.


You will probably have to figure out why g_try_malloc0() is failing. What is
data_size when it fails?
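
One way to see it (just a debugging sketch against the QEMU hunk quoted
above, not a proposed patch; it assumes data_size is a uint64_t here):

         buf = g_try_malloc0(data_size);
         if (!buf) {
             /* include the requested size so the failing allocation is visible */
             error_report("%s: Error allocating buffer of size 0x%" PRIx64,
                          __func__, data_size);
             return -ENOMEM;
         }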

Thanks,
Kirti
