Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-05-08 Thread Wei Wang

On 05/09/2017 01:40 AM, Michael S. Tsirkin wrote:

On Sun, May 07, 2017 at 04:19:28AM +, Wang, Wei W wrote:

On 05/06/2017 06:26 AM, Michael S. Tsirkin wrote:

On Thu, Apr 27, 2017 at 02:31:49PM +0800, Wei Wang wrote:

On 04/27/2017 07:20 AM, Michael S. Tsirkin wrote:

On Wed, Apr 26, 2017 at 11:03:34AM +, Wang, Wei W wrote:

Hi Michael, could you please give some feedback?

I'm sorry, I'm not sure feedback on what you are requesting.

Oh, just some trivial things (e.g. use a field in the header,
hdr->chunks to indicate the number of chunks in the payload) that
wasn't confirmed.

I will prepare the new version with fixing the agreed issues, and we
can continue to discuss those parts if you still find them improper.



The interface looks reasonable now, even though there's a way to
make it even simpler if we can limit chunk size to 2G (in fact 4G -
1). Do you think we can live with this limitation?

Yes, I think we can. So, is it good to change to use the previous
64-bit chunk format (52-bit base + 12-bit size)?

This isn't what I meant. virtio ring has descriptors with a 64 bit address and 
32 bit
size.

If size < 4g is not a significant limitation, why not just use that to pass
address/size in a standard s/g list, possibly using INDIRECT?

OK, I see your point, thanks. Post the two options here for an analysis:
Option1 (what we have now):
struct virtio_balloon_page_chunk {
 __le64 chunk_num;
 struct virtio_balloon_page_chunk_entry entry[];
};
Option2:
struct virtio_balloon_page_chunk {
 __le64 chunk_num;
 struct scatterlist entry[];
};

This isn't what I meant really :) I meant vring_desc.


OK. Repost the code change:

Option2:
struct virtio_balloon_page_chunk {
__le64 chunk_num;
struct ving_desc entry[];
};

We pre-allocate a table of desc, and each desc is used to hold a chunk.

In that case, the virtqueue_add() function, which deals with sg, is not
usable for us. We may need to add a new one,
virtqueue_add_indirect_desc(),
to add a pre-allocated indirect descriptor table to vring.





I don't have an issue to change it to Option2, but I would prefer Option1,
because I think there is no be obvious difference between the two options,
while Option1 appears to have little advantages here:
1) "struct virtio_balloon_page_chunk_entry" has smaller size than
"struct scatterlist", so the same size of allocated page chunk buffer
can hold more entry[] using Option1;
2) INDIRECT needs on demand kmalloc();

Within alloc_indirect?  We can fix that with a separate patch.



3) no 4G size limit;

Do you see lots of >=4g chunks in practice?

It wouldn't be much in practice, but we still need the extra code to
handle the case - break larger chunks into less-than 4g ones.

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-05-08 Thread Michael S. Tsirkin
On Sun, May 07, 2017 at 04:19:28AM +, Wang, Wei W wrote:
> On 05/06/2017 06:26 AM, Michael S. Tsirkin wrote:
> > On Thu, Apr 27, 2017 at 02:31:49PM +0800, Wei Wang wrote:
> > > On 04/27/2017 07:20 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 26, 2017 at 11:03:34AM +, Wang, Wei W wrote:
> > > > > Hi Michael, could you please give some feedback?
> > > > I'm sorry, I'm not sure feedback on what you are requesting.
> > > Oh, just some trivial things (e.g. use a field in the header,
> > > hdr->chunks to indicate the number of chunks in the payload) that
> > > wasn't confirmed.
> > >
> > > I will prepare the new version with fixing the agreed issues, and we
> > > can continue to discuss those parts if you still find them improper.
> > >
> > >
> > > >
> > > > The interface looks reasonable now, even though there's a way to
> > > > make it even simpler if we can limit chunk size to 2G (in fact 4G -
> > > > 1). Do you think we can live with this limitation?
> > > Yes, I think we can. So, is it good to change to use the previous
> > > 64-bit chunk format (52-bit base + 12-bit size)?
> > 
> > This isn't what I meant. virtio ring has descriptors with a 64 bit address 
> > and 32 bit
> > size.
> > 
> > If size < 4g is not a significant limitation, why not just use that to pass
> > address/size in a standard s/g list, possibly using INDIRECT?
> 
> OK, I see your point, thanks. Post the two options here for an analysis:
> Option1 (what we have now):
> struct virtio_balloon_page_chunk {
> __le64 chunk_num;
> struct virtio_balloon_page_chunk_entry entry[];
> };
> Option2:
> struct virtio_balloon_page_chunk {
> __le64 chunk_num;
> struct scatterlist entry[];
> };

This isn't what I meant really :) I meant vring_desc.

> I don't have an issue to change it to Option2, but I would prefer Option1,
> because I think there is no be obvious difference between the two options,
> while Option1 appears to have little advantages here:
> 1) "struct virtio_balloon_page_chunk_entry" has smaller size than
> "struct scatterlist", so the same size of allocated page chunk buffer
> can hold more entry[] using Option1;
> 2) INDIRECT needs on demand kmalloc();

Within alloc_indirect?  We can fix that with a separate patch.


> 3) no 4G size limit;

Do you see lots of >=4g chunks in practice?

> What do you think?
> 
> Best,
> Wei
> 
>

OTOH using existing vring APIs handles things like DMA transparently.


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-05-06 Thread Wang, Wei W
On 05/06/2017 06:26 AM, Michael S. Tsirkin wrote:
> On Thu, Apr 27, 2017 at 02:31:49PM +0800, Wei Wang wrote:
> > On 04/27/2017 07:20 AM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 26, 2017 at 11:03:34AM +, Wang, Wei W wrote:
> > > > Hi Michael, could you please give some feedback?
> > > I'm sorry, I'm not sure feedback on what you are requesting.
> > Oh, just some trivial things (e.g. use a field in the header,
> > hdr->chunks to indicate the number of chunks in the payload) that
> > wasn't confirmed.
> >
> > I will prepare the new version with fixing the agreed issues, and we
> > can continue to discuss those parts if you still find them improper.
> >
> >
> > >
> > > The interface looks reasonable now, even though there's a way to
> > > make it even simpler if we can limit chunk size to 2G (in fact 4G -
> > > 1). Do you think we can live with this limitation?
> > Yes, I think we can. So, is it good to change to use the previous
> > 64-bit chunk format (52-bit base + 12-bit size)?
> 
> This isn't what I meant. virtio ring has descriptors with a 64 bit address 
> and 32 bit
> size.
> 
> If size < 4g is not a significant limitation, why not just use that to pass
> address/size in a standard s/g list, possibly using INDIRECT?

OK, I see your point, thanks. Post the two options here for an analysis:
Option1 (what we have now):
struct virtio_balloon_page_chunk {
__le64 chunk_num;
struct virtio_balloon_page_chunk_entry entry[];
};
Option2:
struct virtio_balloon_page_chunk {
__le64 chunk_num;
struct scatterlist entry[];
};

I don't have an issue to change it to Option2, but I would prefer Option1,
because I think there is no be obvious difference between the two options,
while Option1 appears to have little advantages here:
1) "struct virtio_balloon_page_chunk_entry" has smaller size than
"struct scatterlist", so the same size of allocated page chunk buffer
can hold more entry[] using Option1;
2) INDIRECT needs on demand kmalloc();
3) no 4G size limit;

What do you think?

Best,
Wei



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-05-05 Thread Michael S. Tsirkin
On Thu, Apr 27, 2017 at 02:31:49PM +0800, Wei Wang wrote:
> On 04/27/2017 07:20 AM, Michael S. Tsirkin wrote:
> > On Wed, Apr 26, 2017 at 11:03:34AM +, Wang, Wei W wrote:
> > > Hi Michael, could you please give some feedback?
> > I'm sorry, I'm not sure feedback on what you are requesting.
> Oh, just some trivial things (e.g. use a field in the
> header, hdr->chunks to indicate the number of chunks
> in the payload) that wasn't confirmed.
> 
> I will prepare the new version with fixing the agreed issues,
> and we can continue to discuss those parts if you still find
> them improper.
> 
> 
> > 
> > The interface looks reasonable now, even though there's
> > a way to make it even simpler if we can limit chunk size
> > to 2G (in fact 4G - 1). Do you think we can live with this
> > limitation?
> Yes, I think we can. So, is it good to change to use the
> previous 64-bit chunk format (52-bit base + 12-bit size)?

This isn't what I meant. virtio ring has descriptors with
a 64 bit address and 32 bit size.

If size < 4g is not a significant limitation, why not just
use that to pass address/size in a standard s/g list,
possibly using INDIRECT?

> 
> > 
> > But the code still needs some cleanup.
> > 
> 
> OK. We'll also still to discuss your comments in the patch 05.
> 
> Best,
> Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-27 Thread Wei Wang

On 04/27/2017 07:20 AM, Michael S. Tsirkin wrote:

On Wed, Apr 26, 2017 at 11:03:34AM +, Wang, Wei W wrote:

Hi Michael, could you please give some feedback?

I'm sorry, I'm not sure feedback on what you are requesting.

Oh, just some trivial things (e.g. use a field in the
header, hdr->chunks to indicate the number of chunks
in the payload) that wasn't confirmed.

I will prepare the new version with fixing the agreed issues,
and we can continue to discuss those parts if you still find
them improper.




The interface looks reasonable now, even though there's
a way to make it even simpler if we can limit chunk size
to 2G (in fact 4G - 1). Do you think we can live with this
limitation?

Yes, I think we can. So, is it good to change to use the
previous 64-bit chunk format (52-bit base + 12-bit size)?




But the code still needs some cleanup.



OK. We'll also still to discuss your comments in the patch 05.

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-26 Thread Michael S. Tsirkin
On Wed, Apr 26, 2017 at 11:03:34AM +, Wang, Wei W wrote:
> Hi Michael, could you please give some feedback?

I'm sorry, I'm not sure feedback on what you are requesting.

The interface looks reasonable now, even though there's
a way to make it even simpler if we can limit chunk size
to 2G (in fact 4G - 1). Do you think we can live with this
limitation?

But the code still needs some cleanup.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-26 Thread Wang, Wei W
Hi Michael, could you please give some feedback?

On Monday, April 17, 2017 11:35 AM, Wei Wang wrote:
> On 04/15/2017 05:38 AM, Michael S. Tsirkin wrote:
> > On Fri, Apr 14, 2017 at 04:37:52PM +0800, Wei Wang wrote:
> >> On 04/14/2017 12:34 AM, Michael S. Tsirkin wrote:
> >>> On Thu, Apr 13, 2017 at 05:35:05PM +0800, Wei Wang wrote:
> >>>
> >>> So we don't need the bitmap to talk to host, it is just a data
> >>> structure we chose to maintain lists of pages, right?
> >> Right. bitmap is the way to gather pages to chunk.
> >> It's only needed in the balloon page case.
> >> For the unused page case, we don't need it, since the free page
> >> blocks are already chunks.
> >>
> >>> OK as far as it goes but you need much better isolation for it.
> >>> Build a data structure with APIs such as _init, _cleanup, _add,
> >>> _clear, _find_first, _find_next.
> >>> Completely unrelated to pages, it just maintains bits.
> >>> Then use it here.
> >>>
> >>>
> static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); @@ -50,6
>  +54,10 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> static struct vfsmount *balloon_mnt;
> #endif
>  +/* Types of pages to chunk */
>  +#define PAGE_CHUNK_TYPE_BALLOON 0
>  +
> >>> Doesn't look like you are ever adding more types in this patchset.
> >>> Pls keep code simple, generalize it later.
> >>>
> >> "#define PAGE_CHUNK_TYPE_UNUSED 1" is added in another patch.
> > I would say add the extra code there too. Or maybe we can avoid adding
> > it altogether.
> 
> I'm trying to have the two features( i.e. "balloon pages" and "unused pages")
> decoupled while trying to use common functions to deal with the commonalities.
> That's the reason to define the above macro.
> Without the macro, we will need to have separate functions, for example,
> instead of one "add_one_chunk()", we need to have
> add_one_balloon_page_chunk() and add_one_unused_page_chunk(), and some
> of the implementations will be kind of duplicate in the two functions.
> Probably we can add it when the second feature comes to the code.
> 
> >
> >> Types of page to chunk are treated differently. Different types of
> >> page chunks are sent to the host via different protocols.
> >>
> >> 1) PAGE_CHUNK_TYPE_BALLOON: Ballooned (i.e. inflated/deflated) pages
> >> to chunk.  For the ballooned type, it uses the basic chunk msg format:
> >>
> >> virtio_balloon_page_chunk_hdr +
> >> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> >>
> >> 2) PAGE_CHUNK_TYPE_UNUSED: unused pages to chunk. It uses this miscq
> >> msg
> >> format:
> >> miscq_hdr +
> >> virtio_balloon_page_chunk_hdr +
> >> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> >>
> >> The chunk msg is actually the payload of the miscq msg.
> >>
> >>
> > So just combine the two message formats and then it'll all be easier?
> >
> 
> Yes, it'll be simple with only one msg format. But the problem I see here is 
> that
> miscq hdr is something necessary for the "unused page"
> usage, but not needed by the "balloon page" usage. To be more precise, struct
> virtio_balloon_miscq_hdr {
>   __le16 cmd;
>   __le16 flags;
> };
> 'cmd' specifies  the command from the miscq (I envision that miscq will be
> further used to handle other possible miscellaneous requests either from the
> host or to the host), so 'cmd' is necessary for the miscq. But the inflateq is
> exclusively used for inflating pages, so adding a command to it would be
> redundant and look a little bewildered there.
> 'flags': We currently use bit 0 of flags to indicate the completion ofa 
> command,
> this is also useful in the "unused page" usage, and not needed by the "balloon
> page" usage.
>  +#define MAX_PAGE_CHUNKS 4096
> >>> This is an order-4 allocation. I'd make it 4095 and then it's an
> >>> order-3 one.
> >> Sounds good, thanks.
> >> I think it would be better to make it 4090. Leave some space for the
> >> hdr as well.
> > And miscq hdr. In fact just let compiler do the math - something like:
> > (8 * PAGE_SIZE - sizeof(hdr)) / sizeof(chunk)
> Agree, thanks.
> 
> >
> > I skimmed explanation of algorithms below but please make sure code
> > speaks for itself and add comments inline to document it.
> > Whenever you answered me inline this is where you want to try to make
> > code clearer and add comments.
> >
> > Also, pls find ways to abstract the data structure so we don't need to
> > deal with its internals all over the code.
> >
> >
> > 
> >
> {
>   struct scatterlist sg;
>  +struct virtio_balloon_page_chunk_hdr *hdr;
>  +void *buf;
>   unsigned int len;
>  -sg_init_one(, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
>  +switch (type) {
>  +case PAGE_CHUNK_TYPE_BALLOON:
>  +hdr = vb->balloon_page_chunk_hdr;
>  +len = 0;
> 

Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-16 Thread Wei Wang

On 04/15/2017 05:38 AM, Michael S. Tsirkin wrote:

On Fri, Apr 14, 2017 at 04:37:52PM +0800, Wei Wang wrote:

On 04/14/2017 12:34 AM, Michael S. Tsirkin wrote:

On Thu, Apr 13, 2017 at 05:35:05PM +0800, Wei Wang wrote:

So we don't need the bitmap to talk to host, it is just
a data structure we chose to maintain lists of pages, right?

Right. bitmap is the way to gather pages to chunk.
It's only needed in the balloon page case.
For the unused page case, we don't need it, since the free
page blocks are already chunks.


OK as far as it goes but you need much better isolation for it.
Build a data structure with APIs such as _init, _cleanup, _add, _clear,
_find_first, _find_next.
Completely unrelated to pages, it just maintains bits.
Then use it here.



   static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
   module_param(oom_pages, int, S_IRUSR | S_IWUSR);
   MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -50,6 +54,10 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
   static struct vfsmount *balloon_mnt;
   #endif
+/* Types of pages to chunk */
+#define PAGE_CHUNK_TYPE_BALLOON 0
+

Doesn't look like you are ever adding more types in this
patchset.  Pls keep code simple, generalize it later.


"#define PAGE_CHUNK_TYPE_UNUSED 1" is added in another patch.

I would say add the extra code there too. Or maybe we can avoid
adding it altogether.


I'm trying to have the two features( i.e. "balloon pages" and
"unused pages") decoupled while trying to use common functions
to deal with the commonalities. That's the reason to define
the above macro.
Without the macro, we will need to have separate functions,
for example, instead of one "add_one_chunk()", we need to
have add_one_balloon_page_chunk() and
add_one_unused_page_chunk(),
and some of the implementations will be kind of duplicate in the
two functions.
Probably we can add it when the second feature comes to
the code.




Types of page to chunk are treated differently. Different types of page
chunks are sent to the host via different protocols.

1) PAGE_CHUNK_TYPE_BALLOON: Ballooned (i.e. inflated/deflated) pages
to chunk.  For the ballooned type, it uses the basic chunk msg format:

virtio_balloon_page_chunk_hdr +
virtio_balloon_page_chunk * MAX_PAGE_CHUNKS

2) PAGE_CHUNK_TYPE_UNUSED: unused pages to chunk. It uses this miscq msg
format:
miscq_hdr +
virtio_balloon_page_chunk_hdr +
virtio_balloon_page_chunk * MAX_PAGE_CHUNKS

The chunk msg is actually the payload of the miscq msg.



So just combine the two message formats and then it'll all be easier?



Yes, it'll be simple with only one msg format. But the problem I see
here is that miscq hdr is something necessary for the "unused page"
usage, but not needed by the "balloon page" usage. To be more
precise,
struct virtio_balloon_miscq_hdr {
 __le16 cmd;
 __le16 flags;
};
'cmd' specifies  the command from the miscq (I envision that
miscq will be further used to handle other possible miscellaneous
requests either from the host or to the host), so 'cmd' is necessary
for the miscq. But the inflateq is exclusively used for inflating
pages, so adding a command to it would be redundant and look a little
bewildered there.
'flags': We currently use bit 0 of flags to indicate the completion
ofa command, this is also useful in the "unused page" usage, and not
needed by the "balloon page" usage.

+#define MAX_PAGE_CHUNKS 4096

This is an order-4 allocation. I'd make it 4095 and then it's
an order-3 one.

Sounds good, thanks.
I think it would be better to make it 4090. Leave some space for the hdr
as well.

And miscq hdr. In fact just let compiler do the math - something like:
(8 * PAGE_SIZE - sizeof(hdr)) / sizeof(chunk)

Agree, thanks.



I skimmed explanation of algorithms below but please make sure
code speaks for itself and add comments inline to document it.
Whenever you answered me inline this is where you want to
try to make code clearer and add comments.

Also, pls find ways to abstract the data structure so we don't
need to deal with its internals all over the code.





   {
struct scatterlist sg;
+   struct virtio_balloon_page_chunk_hdr *hdr;
+   void *buf;
unsigned int len;
-   sg_init_one(, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+   switch (type) {
+   case PAGE_CHUNK_TYPE_BALLOON:
+   hdr = vb->balloon_page_chunk_hdr;
+   len = 0;
+   break;
+   default:
+   dev_warn(>vdev->dev, "%s: chunk %d of unknown pages\n",
+__func__, type);
+   return;
+   }
-   /* We should always be able to add one buffer to an empty queue. */
-   virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL);
-   virtqueue_kick(vq);
+   buf = (void *)hdr - len;

Moving back to before the header? How can this make sense?
It works fine since len is 0, so just buf = hdr.


For the unused page chunk case, it follows its own protocol:
miscq_hdr + payload(chunk msg).
  "buf = (void 

Re: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

2017-04-14 Thread Michael S. Tsirkin
On Fri, Apr 14, 2017 at 04:37:52PM +0800, Wei Wang wrote:
> On 04/14/2017 12:34 AM, Michael S. Tsirkin wrote:
> > On Thu, Apr 13, 2017 at 05:35:05PM +0800, Wei Wang wrote:
> > 
> > So we don't need the bitmap to talk to host, it is just
> > a data structure we chose to maintain lists of pages, right?
> Right. bitmap is the way to gather pages to chunk.
> It's only needed in the balloon page case.
> For the unused page case, we don't need it, since the free
> page blocks are already chunks.
> 
> > OK as far as it goes but you need much better isolation for it.
> > Build a data structure with APIs such as _init, _cleanup, _add, _clear,
> > _find_first, _find_next.
> > Completely unrelated to pages, it just maintains bits.
> > Then use it here.
> > 
> > 
> > >   static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > >   module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > >   MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > > @@ -50,6 +54,10 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > >   static struct vfsmount *balloon_mnt;
> > >   #endif
> > > +/* Types of pages to chunk */
> > > +#define PAGE_CHUNK_TYPE_BALLOON 0
> > > +
> > Doesn't look like you are ever adding more types in this
> > patchset.  Pls keep code simple, generalize it later.
> > 
> "#define PAGE_CHUNK_TYPE_UNUSED 1" is added in another patch.

I would say add the extra code there too. Or maybe we can avoid
adding it altogether.

> Types of page to chunk are treated differently. Different types of page
> chunks are sent to the host via different protocols.
> 
> 1) PAGE_CHUNK_TYPE_BALLOON: Ballooned (i.e. inflated/deflated) pages
> to chunk.  For the ballooned type, it uses the basic chunk msg format:
> 
> virtio_balloon_page_chunk_hdr +
> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> 
> 2) PAGE_CHUNK_TYPE_UNUSED: unused pages to chunk. It uses this miscq msg
> format:
> miscq_hdr +
> virtio_balloon_page_chunk_hdr +
> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> 
> The chunk msg is actually the payload of the miscq msg.
> 
> 

So just combine the two message formats and then it'll all be easier?


> > > +#define MAX_PAGE_CHUNKS 4096
> > This is an order-4 allocation. I'd make it 4095 and then it's
> > an order-3 one.
> 
> Sounds good, thanks.
> I think it would be better to make it 4090. Leave some space for the hdr
> as well.

And miscq hdr. In fact just let compiler do the math - something like:
(8 * PAGE_SIZE - sizeof(hdr)) / sizeof(chunk)


I skimmed explanation of algorithms below but please make sure
code speaks for itself and add comments inline to document it.
Whenever you answered me inline this is where you want to
try to make code clearer and add comments.

Also, pls find ways to abstract the data structure so we don't
need to deal with its internals all over the code.




> > 
> > >   {
> > >   struct scatterlist sg;
> > > + struct virtio_balloon_page_chunk_hdr *hdr;
> > > + void *buf;
> > >   unsigned int len;
> > > - sg_init_one(, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > > + switch (type) {
> > > + case PAGE_CHUNK_TYPE_BALLOON:
> > > + hdr = vb->balloon_page_chunk_hdr;
> > > + len = 0;
> > > + break;
> > > + default:
> > > + dev_warn(>vdev->dev, "%s: chunk %d of unknown pages\n",
> > > +  __func__, type);
> > > + return;
> > > + }
> > > - /* We should always be able to add one buffer to an empty queue. */
> > > - virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL);
> > > - virtqueue_kick(vq);
> > > + buf = (void *)hdr - len;
> > Moving back to before the header? How can this make sense?
> > It works fine since len is 0, so just buf = hdr.
> > 
> For the unused page chunk case, it follows its own protocol:
> miscq_hdr + payload(chunk msg).
>  "buf = (void *)hdr - len" moves the buf pointer to the miscq_hdr, to send
> the entire miscq msg.

Well just pass the correct pointer in.

> Please check the patch for implementing the unused page chunk,
> it will be clear. If necessary, I can put "buf = (void *)hdr - len" from
> that patch.

Exactly. And all this pointer math is very messy. Please look for ways
to clean it. It's generally easy to fill structures:

struct foo *foo = kmalloc(..., sizeof(*foo) + n * sizeof(foo->a[0]));
for (i = 0; i < n; ++i)
foo->a[i] = b;

this is the kind of code that's easy to understand and it's
obvious there are no overflows and no info leaks here.

> 
> > > + len += sizeof(struct virtio_balloon_page_chunk_hdr);
> > > + len += hdr->chunks * sizeof(struct virtio_balloon_page_chunk);
> > > + sg_init_table(, 1);
> > > + sg_set_buf(, buf, len);
> > > + if (!virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL)) {
> > > + virtqueue_kick(vq);
> > > + if (busy_wait)
> > > + while (!virtqueue_get_buf(vq, ) &&
> > > +!virtqueue_is_broken(vq))
> > > + cpu_relax();
> > > + else
> > > + wait_event(vb->acked,