Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-20 Thread Tiwei Bie
On Mon, May 21, 2018 at 10:30:51AM +0800, Jason Wang wrote:
> On 2018年05月19日 10:29, Tiwei Bie wrote:
> > > I hope not.
> > > 
> > > > I agreed that the driver should track the DMA addrs or
> > > > some other necessary things from the very beginning. And
> > > > I also repeated the spec to emphasize that it does
> > > > make sense. And I'd like to do that.
> > > > 
> > > > What I was saying is that, to support OOO, we may
> > > > need to manage these contexts (which save DMA addrs
> > > > etc.) via a list, similar to the desc list
> > > > maintained via `next` in the split ring, instead of an
> > > > array whose elements always can be indexed directly.
> > > My point is that these contexts are a must (not only for OOO).
> > Yeah, and I have exactly the same point now that you've
> > pointed out that I shouldn't get the addrs from the descs.
> > I do think it makes sense, and I'll do it in the next
> > version. I don't have any doubt about it. All my
> > questions are about OOO, not about whether we
> > should save the context. It just seems that you
> > thought I didn't want to do it, and were trying to
> > convince me that I should.
> 
> Right, but it looks like I was wrong :)
> 
> > 
> > > > The desc ring in the split ring is an array, but its
> > > > free entries are managed as a list via `next`. I was
> > > > just wondering whether we want to manage such a list
> > > > because of OOO. It's just a very simple question
> > > > on which I want to hear your opinion... (It doesn't
> > > > mean anything, e.g. it doesn't mean I don't want
> > > > to support OOO. It's just a simple question...)
> > > So the answer to that question is yes. But I admit I don't have a better idea
> > > than what you propose here (something like the split ring, which is a little
> > > bit sad). Maybe Michael has one.
> > Yeah, that's why I asked this question. It will
> > make the packed ring a bit similar to the split ring,
> > at least in the driver part. So I want to draw
> > your attention to this to make sure that we're
> > on the same page.
> 
> Yes. I think we are.

Cool. Glad to hear that! Thanks! :)

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > Best regards,
> > Tiwei Bie
> > 
> 
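
A minimal, self-contained sketch of the idea being settled above, with illustrative names only (this is not the actual virtio_ring code): per-buffer contexts chained through a `next` field, so that buffer IDs can be handed out and returned in any order, much like the split ring's free-descriptor chain.

```
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-buffer context; the driver records here whatever it
 * must not read back from the descriptor ring (DMA address, length, ...). */
struct desc_state {
	void     *data;      /* caller's cookie, returned on completion */
	uint64_t  dma_addr;  /* tracked by the driver, not re-read from the ring */
	uint32_t  len;
	uint16_t  num;       /* descriptors used by this buffer */
	uint16_t  next;      /* chains free entries, like the split ring's next */
};

struct ctx_table {
	struct desc_state *state;
	uint16_t size;
	uint16_t free_head;  /* head of the free-ID chain */
};

static int ctx_table_init(struct ctx_table *t, uint16_t size)
{
	t->state = calloc(size, sizeof(*t->state));
	if (!t->state)
		return -1;
	t->size = size;
	t->free_head = 0;
	for (uint16_t i = 0; i < size; i++)
		t->state[i].next = i + 1;  /* last entry points past the end */
	return 0;
}

/* Hand out a buffer ID by popping the free chain; the caller is expected
 * to track how many IDs are still free before calling this. */
static uint16_t ctx_get_id(struct ctx_table *t)
{
	uint16_t id = t->free_head;

	t->free_head = t->state[id].next;
	return id;
}

/* Return an ID on completion; the order of returns does not matter. */
static void ctx_put_id(struct ctx_table *t, uint16_t id)
{
	t->state[id].next = t->free_head;
	t->free_head = id;
}
```

With an in-order backend the IDs simply come back in the order they were handed out; the same structure keeps working when a backend completes buffers out of order, which is exactly the trade-off being discussed.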


Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-20 Thread Jason Wang



On 2018年05月19日 10:29, Tiwei Bie wrote:

I hope not.


I agreed that the driver should track the DMA addrs or
some other necessary things from the very beginning. And
I also repeated the spec to emphasize that it does
make sense. And I'd like to do that.

What I was saying is that, to support OOO, we may
need to manage these contexts (which save DMA addrs
etc.) via a list, similar to the desc list
maintained via `next` in the split ring, instead of an
array whose elements always can be indexed directly.

My point is that these contexts are a must (not only for OOO).

Yeah, and I have exactly the same point now that you've
pointed out that I shouldn't get the addrs from the descs.
I do think it makes sense, and I'll do it in the next
version. I don't have any doubt about it. All my
questions are about OOO, not about whether we
should save the context. It just seems that you
thought I didn't want to do it, and were trying to
convince me that I should.


Right, but it looks like I was wrong :)




The desc ring in the split ring is an array, but its
free entries are managed as a list via `next`. I was
just wondering whether we want to manage such a list
because of OOO. It's just a very simple question
on which I want to hear your opinion... (It doesn't
mean anything, e.g. it doesn't mean I don't want
to support OOO. It's just a simple question...)

So the answer to that question is yes. But I admit I don't have a better idea than what
you propose here (something like the split ring, which is a little bit sad).
Maybe Michael has one.

Yeah, that's why I asked this question. It will
make the packed ring a bit similar to the split ring,
at least in the driver part. So I want to draw
your attention to this to make sure that we're
on the same page.


Yes. I think we are.

Thanks


Best regards,
Tiwei Bie





Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Tiwei Bie
On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote:
> On 2018年05月18日 22:33, Tiwei Bie wrote:
> > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> > > On 2018年05月18日 19:29, Tiwei Bie wrote:
> > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > > > [...]
> > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue 
> > > > > > > > > > > > *vq, unsigned int head,
> > > > > > > > > > > > + unsigned int id, void 
> > > > > > > > > > > > **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +   struct vring_packed_desc *desc;
> > > > > > > > > > > > +   unsigned int i, j;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   /* Clear data ptr. */
> > > > > > > > > > > > +   vq->desc_state[id].data = NULL;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   i = head;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > > > +   desc = &vq->vring_packed.desc[i];
> > > > > > > > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > > > > > > > As mentioned in previous discussion, this probably won't 
> > > > > > > > > > > work for the case
> > > > > > > > > > > of out of order completion since it depends on the 
> > > > > > > > > > > information in the
> > > > > > > > > > > descriptor ring. We probably need to extend ctx to record 
> > > > > > > > > > > such information.
> > > > > > > > > > Above code doesn't depend on the information in the 
> > > > > > > > > > descriptor
> > > > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > > > 
> > > > > > > > > > Best regards,
> > > > > > > > > > Tiwei Bie
> > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > > > vring_unmap_one_packed() still depends on the content of 
> > > > > > > > > descriptor ring?
> > > > > > > > > 
> > > > > > > > I got your point now. I think it makes sense to reserve
> > > > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > > > addrs from the descriptors when cleaning up the descriptors,
> > > > > > > > no matter whether we support out-of-order or not.
> > > > > > > Maybe I was wrong, but I remember spec mentioned something like 
> > > > > > > this.
> > > > > > You're right. Spec mentioned this. I was just repeating
> > > > > > the spec to emphasize that it does make sense. :)
> > > > > > 
> > > > > > > > But combining it with the out-of-order support, it will
> > > > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > > > list that is very similar to the desc ring in the split
> > > > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > > > To make it work for OOO backends I think we need something like this
> > > > > > > (hardware NIC drivers usually have something like this).
> > > > > > Which hardware NIC drivers have this?
> > > > > It's quite common I think, e.g. drivers track dma addr and page frag
> > > > > somewhere, e.g. the ring->rx_info in the mlx4 driver.
> > > > It seems that I had a misunderstanding on your
> > > > previous comments. I know it's quite common for
> > > > drivers to track e.g. DMA addrs somewhere (and
> > > > I think one reason behind this is that they want
> > > > to reuse the bits of addr field).
> > > Yes, we may want this for virtio-net as well in the future.
> > > 
> > > > But tracking
> > > > addrs somewhere doesn't mean supporting OOO.
> > > > I thought you were saying it's quite common for
> > > > hardware NIC drivers to support OOO (i.e. NICs
> > > > will return the descriptors OOO):
> > > > 
> > > > I'm not familiar with mlx4, maybe I'm wrong.
> > > > I just had a quick glance. And I found below
> > > > comments in mlx4_en_process_rx_cq():
> > > > 
> > > > ```
> > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> > > >* descriptor offset can be deduced from the CQE index instead of
> > > >* reading 'cqe->index' */
> > > > index = cq->mcq.cons_index & ring->size_mask;
> > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > > > ```
> > > > 
> > > > It seems that although they have a completion
> > > > queue, they are still using the ring in order.
> > > I guess so (at least from the above bits). Git grep -i "out of order" in
> > > 

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Jason Wang



On 2018年05月18日 22:33, Tiwei Bie wrote:

On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:

On 2018年05月18日 19:29, Tiwei Bie wrote:

On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:

On 2018年05月16日 22:33, Tiwei Bie wrote:

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:

On 2018年05月16日 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = &vq->vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleaning up the descriptors,
no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)


But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this
(hardware NIC drivers usually have something like this).

Which hardware NIC drivers have this?

It's quite common I think, e.g. drivers track dma addr and page frag
somewhere, e.g. the ring->rx_info in the mlx4 driver.

It seems that I had a misunderstanding on your
previous comments. I know it's quite common for
drivers to track e.g. DMA addrs somewhere (and
I think one reason behind this is that they want
to reuse the bits of addr field).

Yes, we may want this for virtio-net as well in the future.


   But tracking
addrs somewhere doesn't mean supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO):

I'm not familiar with mlx4, maybe I'm wrong.
I just had a quick glance. And I found below
comments in mlx4_en_process_rx_cq():

```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
   * descriptor offset can be deduced from the CQE index instead of
   * reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```

It seems that although they have a completion
queue, they are still using the ring in order.

I guess so (at least from the above bits). Git grep -i "out of order" in
drivers/net gives some hints. Looks like there are few devices that do this.


I guess maybe storage device may want OOO.

Right, some iSCSI did.

But tracking them elsewhere is not only for OOO.

Spec said:

for element address

"
In a used descriptor, Element Address is unused.
"

for Next flag:

"
For example, if descriptors are used in the same order in which they are
made available, this will result in
the used descriptor overwriting the first available descriptor in the list,
the used descriptor for the next list
overwriting the first available descriptor in the next list, etc.
"

for in order completion:

"
This will result in the used descriptor overwriting the first available
descriptor in the batch, the used descriptor
for the next batch overwriting the first available descriptor in the next
batch, etc.
"

So:

- It aligns with the spec
- the device may (or should) overwrite the descriptor, which also makes the address
field useless.

You didn't get my point...


I hope not.


I agreed that the driver should track the DMA addrs or
some other necessary things from the very beginning. And
I also repeated the spec to emphasize that it does
make sense. And I'd like to do that.

What I was saying is that, to support OOO, we may
need to manage these contexts (which save DMA addrs
etc.) via a list, similar to the desc list
maintained via `next` in the split ring, instead of an
array whose elements always 

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Tiwei Bie
On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> On 2018年05月18日 19:29, Tiwei Bie wrote:
> > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > [...]
> > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, 
> > > > > > > > > > unsigned int head,
> > > > > > > > > > + unsigned int id, void **ctx)
> > > > > > > > > > +{
> > > > > > > > > > +   struct vring_packed_desc *desc;
> > > > > > > > > > +   unsigned int i, j;
> > > > > > > > > > +
> > > > > > > > > > +   /* Clear data ptr. */
> > > > > > > > > > +   vq->desc_state[id].data = NULL;
> > > > > > > > > > +
> > > > > > > > > > +   i = head;
> > > > > > > > > > +
> > > > > > > > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > +   desc = &vq->vring_packed.desc[i];
> > > > > > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > > > > > As mentioned in previous discussion, this probably won't work 
> > > > > > > > > for the case
> > > > > > > > > of out of order completion since it depends on the 
> > > > > > > > > information in the
> > > > > > > > > descriptor ring. We probably need to extend ctx to record 
> > > > > > > > > such information.
> > > > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > 
> > > > > > > > Best regards,
> > > > > > > > Tiwei Bie
> > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > vring_unmap_one_packed() still depends on the content of 
> > > > > > > descriptor ring?
> > > > > > > 
> > > > > > I got your point now. I think it makes sense to reserve
> > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > addrs from the descriptors when cleaning up the descriptors,
> > > > > > no matter whether we support out-of-order or not.
> > > > > Maybe I was wrong, but I remember spec mentioned something like this.
> > > > You're right. Spec mentioned this. I was just repeating
> > > > the spec to emphasize that it does make sense. :)
> > > > 
> > > > > > But combining it with the out-of-order support, it will
> > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > list that is very similar to the desc ring in the split
> > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > To make it work for OOO backends I think we need something like this
> > > > > (hardware NIC drivers usually have something like this).
> > > > Which hardware NIC drivers have this?
> > > It's quite common I think, e.g. drivers track dma addr and page frag
> > > somewhere, e.g. the ring->rx_info in the mlx4 driver.
> > It seems that I had a misunderstanding on your
> > previous comments. I know it's quite common for
> > drivers to track e.g. DMA addrs somewhere (and
> > I think one reason behind this is that they want
> > to reuse the bits of addr field).
> 
> Yes, we may want this for virtio-net as well in the future.
> 
> >   But tracking
> > addrs somewhere doesn't mean supporting OOO.
> > I thought you were saying it's quite common for
> > hardware NIC drivers to support OOO (i.e. NICs
> > will return the descriptors OOO):
> > 
> > I'm not familiar with mlx4, maybe I'm wrong.
> > I just had a quick glance. And I found below
> > comments in mlx4_en_process_rx_cq():
> > 
> > ```
> > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> >   * descriptor offset can be deduced from the CQE index instead of
> >   * reading 'cqe->index' */
> > index = cq->mcq.cons_index & ring->size_mask;
> > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > ```
> > 
> > It seems that although they have a completion
> > queue, they are still using the ring in order.
> 
> I guess so (at least from the above bits). Git grep -i "out of order" in
> drivers/net gives some hints. Looks like there are few devices that do this.
> 
> > I guess maybe storage device may want OOO.
> 
> Right, some iSCSI did.
> 
> But tracking them elsewhere is not only for OOO.
> 
> Spec said:
> 
> for element address
> 
> "
> In a used descriptor, Element Address is unused.
> "
> 
> for Next flag:
> 
> "
> For example, if descriptors are used in the same order in which they are
> made available, this will result in
> the used descriptor overwriting the first available descriptor in the list,
> the used descriptor for the next list
> 

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Jason Wang



On 2018年05月18日 19:29, Tiwei Bie wrote:

On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:

On 2018年05月16日 22:33, Tiwei Bie wrote:

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:

On 2018年05月16日 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = &vq->vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleaning up the descriptors,
no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)


But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this
(hardware NIC drivers usually have something like this).

Which hardware NIC drivers have this?

It's quite common I think, e.g. drivers track dma addr and page frag
somewhere, e.g. the ring->rx_info in the mlx4 driver.

It seems that I had a misunderstanding on your
previous comments. I know it's quite common for
drivers to track e.g. DMA addrs somewhere (and
I think one reason behind this is that they want
to reuse the bits of addr field).


Yes, we may want this for virtio-net as well in the future.


  But tracking
addrs somewhere doesn't mean supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO):

I'm not familiar with mlx4, maybe I'm wrong.
I just had a quick glance. And I found below
comments in mlx4_en_process_rx_cq():

```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
  * descriptor offset can be deduced from the CQE index instead of
  * reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```

It seems that although they have a completion
queue, they are still using the ring in order.


I guess so (at least from the above bits). Git grep -i "out of order" in 
drivers/net gives some hints. Looks like there are few devices that do this.



I guess maybe storage device may want OOO.


Right, some iSCSI did.

But tracking them elsewhere is not only for OOO.

Spec said:

for element address

"
In a used descriptor, Element Address is unused.
"

for Next flag:

"
For example, if descriptors are used in the same order in which they are 
made available, this will result in
the used descriptor overwriting the first available descriptor in the 
list, the used descriptor for the next list

overwriting the first available descriptor in the next list, etc.
"

for in order completion:

"
This will result in the used descriptor overwriting the first available 
descriptor in the batch, the used descriptor
for the next batch overwriting the first available descriptor in the 
next batch, etc.

"

So:

- It aligns with the spec
- the device may (or should) overwrite the descriptor, which also makes the address
field useless.


Thanks



Best regards,
Tiwei Bie


Thanks


Not for the patch, but it looks like having an OUT_OF_ORDER feature bit would be
much simpler to start with.

+1

Best regards,
Tiwei Bie
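
For reference alongside the spec passages quoted above, here is a sketch of the packed-ring descriptor layout and of what a completion can rely on. The struct restates the virtio 1.1 field order; the helper and the names around it are illustrative, not the kernel's code.

```
#include <stdint.h>

/* Packed-ring descriptor, field order as in the virtio 1.1 spec. The
 * driver fills all four fields in an available descriptor; in a used
 * descriptor the device reports id (and len for device-written buffers),
 * the address field is unused, and the used entry may overwrite the first
 * available descriptor of the batch. */
struct pvirtq_desc_model {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;  /* AVAIL/USED wrap bits, NEXT, WRITE, INDIRECT */
};

/* Illustrative per-buffer context kept by the driver; DMA bookkeeping
 * would live here as well. */
struct buf_ctx {
	void *data;
};

/* On completion the driver relies only on id and len from the used
 * descriptor; everything else comes from its own context table. */
static void *complete_used(const struct pvirtq_desc_model *used,
			   struct buf_ctx *ctx, uint32_t *len)
{
	void *data = ctx[used->id].data;

	ctx[used->id].data = NULL;
	*len = used->len;
	return data;
}
```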




Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Tiwei Bie
On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> On 2018年05月16日 22:33, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > [...]
> > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, 
> > > > > > > > unsigned int head,
> > > > > > > > + unsigned int id, void **ctx)
> > > > > > > > +{
> > > > > > > > +   struct vring_packed_desc *desc;
> > > > > > > > +   unsigned int i, j;
> > > > > > > > +
> > > > > > > > +   /* Clear data ptr. */
> > > > > > > > +   vq->desc_state[id].data = NULL;
> > > > > > > > +
> > > > > > > > +   i = head;
> > > > > > > > +
> > > > > > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > +   desc = &vq->vring_packed.desc[i];
> > > > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > > > As mentioned in previous discussion, this probably won't work for 
> > > > > > > the case
> > > > > > > of out of order completion since it depends on the information in 
> > > > > > > the
> > > > > > > descriptor ring. We probably need to extend ctx to record such 
> > > > > > > information.
> > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > 
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > vring_unmap_one_packed() still depends on the content of descriptor 
> > > > > ring?
> > > > > 
> > > > I got your point now. I think it makes sense to reserve
> > > > the bits of the addr field. Driver shouldn't try to get
> > > > addrs from the descriptors when cleaning up the descriptors,
> > > > no matter whether we support out-of-order or not.
> > > Maybe I was wrong, but I remember spec mentioned something like this.
> > You're right. Spec mentioned this. I was just repeating
> > the spec to emphasize that it does make sense. :)
> > 
> > > > But combining it with the out-of-order support, it will
> > > > mean that the driver still needs to maintain a desc/ctx
> > > > list that is very similar to the desc ring in the split
> > > > ring. I'm not quite sure whether it's something we want.
> > > > If it is true, I'll do it. So do you think we also want
> > > > to maintain such a desc/ctx list for packed ring?
> > > To make it work for OOO backends I think we need something like this
> > > (hardware NIC drivers usually have something like this).
> > Which hardware NIC drivers have this?
> 
> It's quite common I think, e.g. drivers track dma addr and page frag
> somewhere, e.g. the ring->rx_info in the mlx4 driver.

It seems that I had a misunderstanding on your
previous comments. I know it's quite common for
drivers to track e.g. DMA addrs somewhere (and
I think one reason behind this is that they want
to reuse the bits of addr field). But tracking
addrs somewhere doesn't mean supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO):

I'm not familiar with mlx4, maybe I'm wrong.
I just had a quick glance. And I found below
comments in mlx4_en_process_rx_cq():

```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
 * descriptor offset can be deduced from the CQE index instead of
 * reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```

It seems that although they have a completion
queue, they are still using the ring in order.

I guess maybe storage device may want OOO.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > > Not for the patch, but it looks like having an OUT_OF_ORDER feature bit would be
> > > much simpler to start with.
> > +1
> > 
> > Best regards,
> > Tiwei Bie
> 
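
To make the "don't read addresses back from the ring" point concrete, here is a small userspace-style sketch, with assumed names rather than the patch's actual code, of a detach path driven entirely by state the driver recorded when the buffer was added.

```
#include <stdint.h>
#include <stddef.h>

/* Per-descriptor record captured at add time, so the unmap path never
 * has to read the (possibly overwritten) descriptor ring. */
struct extra_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
};

/* Per-buffer context, indexed by the buffer ID the device returns. */
struct buf_state {
	void *data;                /* cookie handed back to the caller */
	uint16_t num;              /* descriptors used by this buffer */
	struct extra_desc *extra;  /* one record per descriptor */
};

/* Stand-in for the real DMA unmap helper. */
static void unmap_one(const struct extra_desc *d)
{
	(void)d;  /* a real driver would unmap d->addr/d->len here */
}

/* Detach driven purely by driver-side state: still valid if the device
 * completed buffers out of order and overwrote the available descriptors. */
static void *detach_buf_by_id(struct buf_state *state, uint16_t id)
{
	struct buf_state *s = &state[id];
	void *data = s->data;

	s->data = NULL;
	for (uint16_t j = 0; j < s->num; j++)
		unmap_one(&s->extra[j]);
	return data;
}
```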


Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-17 Thread Jason Wang



On 2018年05月16日 22:33, Tiwei Bie wrote:

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:

On 2018年05月16日 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = &vq->vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleaning up the descriptors
no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)


But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this
(hardware NIC drivers usually have something like this).

Which hardware NIC drivers have this?


It's quite common I think, e.g. drivers track DMA addrs and page frags
somewhere, e.g. the ring->rx_info in the mlx4 driver.
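
To illustrate the kind of bookkeeping I mean (the struct and field names
below are made-up placeholders, not mlx4's actual rx_info definition):

```
#include <linux/types.h>

/* Rough illustration of per-slot RX bookkeeping a NIC driver keeps next
 * to its descriptor ring (placeholder names, not mlx4's real layout). */
struct rx_slot_info {
	struct page *page;	/* page frag backing this RX slot */
	dma_addr_t dma;		/* DMA address programmed into the descriptor */
	u32 page_offset;	/* offset of the frag within the page */
};

/* One such entry per ring slot, indexed by descriptor index, e.g.:
 *	info = kcalloc(ring_size, sizeof(*info), GFP_KERNEL);
 */
```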


Thanks




Not for the patch, but it looks like having an OUT_OF_ORDER feature bit is
much simpler to start with.

+1

Best regards,
Tiwei Bie




Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Tiwei Bie
On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> On 2018年05月16日 21:45, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
[...]
> > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned 
> > > > > > int head,
> > > > > > + unsigned int id, void **ctx)
> > > > > > +{
> > > > > > +   struct vring_packed_desc *desc;
> > > > > > +   unsigned int i, j;
> > > > > > +
> > > > > > +   /* Clear data ptr. */
> > > > > > +   vq->desc_state[id].data = NULL;
> > > > > > +
> > > > > > +   i = head;
> > > > > > +
> > > > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > +   desc = &vq->vring_packed.desc[i];
> > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > As mentioned in previous discussion, this probably won't work for the 
> > > > > case
> > > > > of out of order completion since it depends on the information in the
> > > > > descriptor ring. We probably need to extend ctx to record such 
> > > > > information.
> > > > Above code doesn't depend on the information in the descriptor
> > > > ring. The vq->desc_state[] is the extended ctx.
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > Yes, but desc is a pointer to descriptor ring I think so
> > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > 
> > I got your point now. I think it makes sense to reserve
> > the bits of the addr field. Driver shouldn't try to get
> > addrs from the descriptors when cleaning up the descriptors
> > no matter whether we support out-of-order or not.
> 
> Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)

> 
> > 
> > But combining it with the out-of-order support, it will
> > mean that the driver still needs to maintain a desc/ctx
> > list that is very similar to the desc ring in the split
> > ring. I'm not quite sure whether it's something we want.
> > If it is true, I'll do it. So do you think we also want
> > to maintain such a desc/ctx list for packed ring?
> 
> To make it work for OOO backends I think we need something like this
> (hardware NIC drivers usually have something like this).

Which hardware NIC drivers have this?

> 
> Not for the patch, but it looks like having an OUT_OF_ORDER feature bit is
> much simpler to start with.

+1
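
To sketch what starting with a feature bit could look like (the bit name
and value below are hypothetical placeholders I made up, not anything
from the spec):

```
#include <linux/virtio_config.h>

/* Hypothetical, not in the spec: a bit the device would offer only if
 * it may use and return descriptors out of order. */
#define VIRTIO_F_OUT_OF_ORDER	99

static bool vq_needs_ooo_handling(const struct virtio_device *vdev)
{
	/* Without the (hypothetical) bit, the driver could keep the
	 * simpler in-order bookkeeping and skip the desc/ctx list. */
	return virtio_has_feature(vdev, VIRTIO_F_OUT_OF_ORDER);
}
```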

Best regards,
Tiwei Bie


Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Jason Wang



On 2018年05月16日 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

struct vring_virtqueue {
@@ -116,6 +117,9 @@ struct vring_virtqueue {
/* Last written value to driver->flags in
 * guest byte order. */
u16 event_flags_shadow;
+
+   /* ID allocation. */
+   struct idr buffer_id;

I'm not sure idr is fit for the performance critical case here. Need to
measure its performance impact, especially if we have few unused slots.

I'm also not sure.. But fortunately, it should be quite easy
to replace it with something else without changing other code.
If it will really hurt the performance, I'll change it.

We may want to do some benchmarking/profiling to see.

Yeah!


};
};

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = &vq->vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleaning up the descriptors
no matter whether we support out-of-order or not.


Maybe I was wrong, but I remember spec mentioned something like this.



But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?


To make it work for OOO backends I think we need something like this 
(hardware NIC drivers usually have something like this).


Not for the patch, but it looks like having an OUT_OF_ORDER feature bit
is much simpler to start with.


Thanks



Best regards,
Tiwei Bie




Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Tiwei Bie
On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> On 2018年05月16日 20:39, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > [...]
> > > >struct vring_virtqueue {
> > > > @@ -116,6 +117,9 @@ struct vring_virtqueue {
> > > > /* Last written value to driver->flags in
> > > >  * guest byte order. */
> > > > u16 event_flags_shadow;
> > > > +
> > > > +   /* ID allocation. */
> > > > +   struct idr buffer_id;
> > > I'm not sure idr is fit for the performance critical case here. Need to
> > > measure its performance impact, especially if we have few unused slots.
> > I'm also not sure.. But fortunately, it should be quite easy
> > to replace it with something else without changing other code.
> > If it will really hurt the performance, I'll change it.
> 
> We may want to do some benchmarking/profiling to see.

Yeah!
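
To give an idea of the kind of replacement I have in mind if idr turns
out to be too heavy: a ring-sized free list of ids, so alloc/free become
O(1) array operations. This is only a sketch, assuming ids are always
smaller than the ring size; the names are mine, not from the patch:

```
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Sketch only: a stack of free buffer ids sized to the ring, as a
 * possible lighter-weight alternative to idr (names are illustrative). */
struct id_stack {
	u16 *free_ids;	/* free_ids[0..top-1] hold the currently unused ids */
	u16 top;	/* number of free ids on the stack */
};

static int id_stack_init(struct id_stack *s, u16 ring_size)
{
	u16 i;

	s->free_ids = kmalloc_array(ring_size, sizeof(*s->free_ids), GFP_KERNEL);
	if (!s->free_ids)
		return -ENOMEM;
	for (i = 0; i < ring_size; i++)
		s->free_ids[i] = i;
	s->top = ring_size;
	return 0;
}

/* Callers are expected to serialize these, as they already do for the ring. */
static int alloc_id(struct id_stack *s)
{
	return s->top ? s->free_ids[--s->top] : -1;	/* -1: no free id */
}

static void free_id(struct id_stack *s, u16 id)
{
	s->free_ids[s->top++] = id;
}
```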

> 
> > 
> > > > };
> > > > };
> > [...]
> > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int 
> > > > head,
> > > > + unsigned int id, void **ctx)
> > > > +{
> > > > +   struct vring_packed_desc *desc;
> > > > +   unsigned int i, j;
> > > > +
> > > > +   /* Clear data ptr. */
> > > > +   vq->desc_state[id].data = NULL;
> > > > +
> > > > +   i = head;
> > > > +
> > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > +   desc = &vq->vring_packed.desc[i];
> > > > +   vring_unmap_one_packed(vq, desc);
> > > As mentioned in previous discussion, this probably won't work for the case
> > > of out of order completion since it depends on the information in the
> > > descriptor ring. We probably need to extend ctx to record such 
> > > information.
> > Above code doesn't depend on the information in the descriptor
> > ring. The vq->desc_state[] is the extended ctx.
> > 
> > Best regards,
> > Tiwei Bie
> 
> Yes, but desc is a pointer to descriptor ring I think so
> vring_unmap_one_packed() still depends on the content of descriptor ring?
> 

I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleaning up the descriptors
no matter whether we support out-of-order or not.

But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

Best regards,
Tiwei Bie


Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Jason Wang



On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

   struct vring_virtqueue {
@@ -116,6 +117,9 @@ struct vring_virtqueue {
/* Last written value to driver->flags in
 * guest byte order. */
u16 event_flags_shadow;
+
+   /* ID allocation. */
+   struct idr buffer_id;

I'm not sure idr is fit for the performance critical case here. Need to
measure its performance impact, especially if we have few unused slots.

I'm also not sure.. But fortunately, it should be quite easy
to replace it with something else without changing other code.
If it will really hurt the performance, I'll change it.


We may want to do some benchmarking/profiling to see.




};
};

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = &vq->vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie


Yes, but desc is a pointer to descriptor ring I think so 
vring_unmap_one_packed() still depends on the content of descriptor ring?
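
Concretely, I think the unmap path would want to look roughly like the
following, consuming state the driver saved at map time instead of a
descriptor pointer (the function and parameter names are illustrative,
not something from the patch):

```
/* Sketch only: unmap driven purely by driver-side saved state, so the
 * descriptor ring contents are never read back. The flags value is the
 * host-endian copy the driver kept, so no byte swapping is needed. */
static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
				     dma_addr_t addr, u32 len, u16 flags)
{
	if (!vring_use_dma_api(vq->vq.vdev))
		return;

	if (flags & VRING_DESC_F_INDIRECT)
		dma_unmap_single(vring_dma_dev(vq), addr, len,
				 (flags & VRING_DESC_F_WRITE) ?
				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
	else
		dma_unmap_page(vring_dma_dev(vq), addr, len,
			       (flags & VRING_DESC_F_WRITE) ?
			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
}
```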


Thanks


Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Tiwei Bie
On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> On 2018年05月16日 16:37, Tiwei Bie wrote:
[...]
> >   struct vring_virtqueue {
> > @@ -116,6 +117,9 @@ struct vring_virtqueue {
> > /* Last written value to driver->flags in
> >  * guest byte order. */
> > u16 event_flags_shadow;
> > +
> > +   /* ID allocation. */
> > +   struct idr buffer_id;
> 
> I'm not sure idr is fit for the performance critical case here. Need to
> measure its performance impact, especially if we have few unused slots.

I'm also not sure.. But fortunately, it should be quite easy
to replace it with something else without changing other code.
If it will really hurt the performance, I'll change it.

> 
> > };
> > };
[...]
> > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int 
> > head,
> > + unsigned int id, void **ctx)
> > +{
> > +   struct vring_packed_desc *desc;
> > +   unsigned int i, j;
> > +
> > +   /* Clear data ptr. */
> > +   vq->desc_state[id].data = NULL;
> > +
> > +   i = head;
> > +
> > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > +   desc = &vq->vring_packed.desc[i];
> > +   vring_unmap_one_packed(vq, desc);
> 
> As mentioned in previous discussion, this probably won't work for the case
> of out of order completion since it depends on the information in the
> descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.
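
For reference, the vq->desc_state[] entry added by this patch currently
carries just the following (copied from the diff):

```
struct vring_desc_state {
	void *data;		/* Data for callback. */
	void *indir_desc;	/* Indirect descriptor, if any. */
	int num;		/* Descriptor list length. */
};
```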

Best regards,
Tiwei Bie


Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Jason Wang



On 2018年05月16日 16:37, Tiwei Bie wrote:

This commit introduces the basic support (without EVENT_IDX)
for packed ring.

Signed-off-by: Tiwei Bie 
---
  drivers/virtio/virtio_ring.c | 491 ++-
  1 file changed, 481 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 62d7c407841a..c6c5deb0e3ae 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,7 +58,8 @@
  
  struct vring_desc_state {

void *data; /* Data for callback. */
-   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   void *indir_desc;   /* Indirect descriptor, if any. */
+   int num;/* Descriptor list length. */
  };
  
  struct vring_virtqueue {

@@ -116,6 +117,9 @@ struct vring_virtqueue {
/* Last written value to driver->flags in
 * guest byte order. */
u16 event_flags_shadow;
+
+   /* ID allocation. */
+   struct idr buffer_id;


I'm not sure idr is fit for the performance critical case here. Need to 
measure its performance impact, especially if we have few unused slots.



};
};
  
@@ -142,6 +146,16 @@ struct vring_virtqueue {
  
  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
  
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,

+ unsigned int total_sg)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* If the host supports indirect descriptor tables, and we have multiple
+* buffers, then go indirect. FIXME: tune this threshold */
+   return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
  /*
   * Modern virtio devices have feature bits to specify whether they need a
   * quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -327,9 +341,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
  
  	head = vq->free_head;
  
-	/* If the host supports indirect descriptor tables, and we have multiple

-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+   if (virtqueue_use_indirect(_vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
@@ -741,6 +753,63 @@ static inline unsigned vring_size_packed(unsigned int num, 
unsigned long align)
& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
  }
  
+static void vring_unmap_one_packed(const struct vring_virtqueue *vq,

+  struct vring_packed_desc *desc)
+{
+   u16 flags;
+
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return;
+
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+   if (flags & VRING_DESC_F_INDIRECT) {
+   dma_unmap_single(vring_dma_dev(vq),
+virtio64_to_cpu(vq->vq.vdev, desc->addr),
+virtio32_to_cpu(vq->vq.vdev, desc->len),
+(flags & VRING_DESC_F_WRITE) ?
+DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   } else {
+   dma_unmap_page(vring_dma_dev(vq),
+  virtio64_to_cpu(vq->vq.vdev, desc->addr),
+  virtio32_to_cpu(vq->vq.vdev, desc->len),
+  (flags & VRING_DESC_F_WRITE) ?
+  DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   }
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
+{
+   struct vring_packed_desc *desc;
+
+   /*
+* We require lowmem mappings for the descriptors because
+* otherwise virt_to_phys will give us bogus addresses in the
+* virtqueue.
+*/
+   gfp &= ~__GFP_HIGHMEM;
+
+   desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+   return desc;
+}
+
+static u16 alloc_id_packed(struct vring_virtqueue *vq)
+{
+   u16 id;
+
+   id = idr_alloc(&vq->buffer_id, NULL, 0, vq->vring_packed.num,
+  GFP_KERNEL);
+   return id;
+}
+
+static void free_id_packed(struct vring_virtqueue *vq, u16 id)
+{
+   idr_remove(&vq->buffer_id, id);
+}
+
  static inline int virtqueue_add_packed(struct virtqueue *_vq,
   struct scatterlist *sgs[],
   unsigned int total_sg,
@@ -750,47 +819,446 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
   void *ctx,
   gfp_t gfp)
  {
+   struct 

[RFC v4 3/5] virtio_ring: add packed ring support

2018-05-16 Thread Tiwei Bie
This commit introduces the basic support (without EVENT_IDX)
for packed ring.

Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 491 ++-
 1 file changed, 481 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 62d7c407841a..c6c5deb0e3ae 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,7 +58,8 @@
 
 struct vring_desc_state {
void *data; /* Data for callback. */
-   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   void *indir_desc;   /* Indirect descriptor, if any. */
+   int num;/* Descriptor list length. */
 };
 
 struct vring_virtqueue {
@@ -116,6 +117,9 @@ struct vring_virtqueue {
/* Last written value to driver->flags in
 * guest byte order. */
u16 event_flags_shadow;
+
+   /* ID allocation. */
+   struct idr buffer_id;
};
};
 
@@ -142,6 +146,16 @@ struct vring_virtqueue {
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+ unsigned int total_sg)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* If the host supports indirect descriptor tables, and we have multiple
+* buffers, then go indirect. FIXME: tune this threshold */
+   return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
 /*
  * Modern virtio devices have feature bits to specify whether they need a
  * quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -327,9 +341,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
head = vq->free_head;
 
-   /* If the host supports indirect descriptor tables, and we have multiple
-* buffers, then go indirect. FIXME: tune this threshold */
-   if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+   if (virtqueue_use_indirect(_vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
@@ -741,6 +753,63 @@ static inline unsigned vring_size_packed(unsigned int num, 
unsigned long align)
& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
 }
 
+static void vring_unmap_one_packed(const struct vring_virtqueue *vq,
+  struct vring_packed_desc *desc)
+{
+   u16 flags;
+
+   if (!vring_use_dma_api(vq->vq.vdev))
+   return;
+
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+   if (flags & VRING_DESC_F_INDIRECT) {
+   dma_unmap_single(vring_dma_dev(vq),
+virtio64_to_cpu(vq->vq.vdev, desc->addr),
+virtio32_to_cpu(vq->vq.vdev, desc->len),
+(flags & VRING_DESC_F_WRITE) ?
+DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   } else {
+   dma_unmap_page(vring_dma_dev(vq),
+  virtio64_to_cpu(vq->vq.vdev, desc->addr),
+  virtio32_to_cpu(vq->vq.vdev, desc->len),
+  (flags & VRING_DESC_F_WRITE) ?
+  DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   }
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
+{
+   struct vring_packed_desc *desc;
+
+   /*
+* We require lowmem mappings for the descriptors because
+* otherwise virt_to_phys will give us bogus addresses in the
+* virtqueue.
+*/
+   gfp &= ~__GFP_HIGHMEM;
+
+   desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+   return desc;
+}
+
+static u16 alloc_id_packed(struct vring_virtqueue *vq)
+{
+   u16 id;
+
+   id = idr_alloc(&vq->buffer_id, NULL, 0, vq->vring_packed.num,
+  GFP_KERNEL);
+   return id;
+}
+
+static void free_id_packed(struct vring_virtqueue *vq, u16 id)
+{
+   idr_remove(&vq->buffer_id, id);
+}
+
 static inline int virtqueue_add_packed(struct virtqueue *_vq,
   struct scatterlist *sgs[],
   unsigned int total_sg,
@@ -750,47 +819,446 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
   void *ctx,
   gfp_t gfp)
 {
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct vring_packed_desc *desc;
+   struct scatterlist *sg;
+   unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+   __virtio16 
