Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-29 Thread Yongji Xie
On Wed, 30 Jan 2019 at 12:07, Michael S. Tsirkin  wrote:
>
> On Wed, Jan 30, 2019 at 11:49:56AM +0800, Yongji Xie wrote:
> > On Wed, 30 Jan 2019 at 10:30, Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> > > > On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > > > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com 
> > > > > > > > wrote:
> > > > > > > > > +typedef struct DescState {
> > > > > > > > > +uint8_t inuse;
> > > > > > > > > +uint8_t version;
> > > > > > > > > +uint16_t used_idx;
> > > > > > > > > +uint16_t avail_idx;
> > > > > > > > > +uint16_t reserved;
> > > > > > > > > +} DescState;
> > > > > > > > > +
> > > > > > > > > +typedef struct QueueRegion {
> > > > > > > > > +uint8_t valid;
> > > > > > >
> > > > > > > what's this?
> > > > > > >
> > > > > >
> > > > > > We can use this to check whether this buffer is reset by qemu.
> > > > >
> > > > >
> > > > > I'd use version here. Document that it's 1 currently.
> > > >
> > > > If we put version into the shared buffer, QEMU will reset it when vm
> > > > reset. Then if backend restart at the same time, the version of this
> > > > buffer will be lost. So I still let QEMU maintain it and send it back
> > > > through message's payload.
> > >
> > > I don't get it. If it's reset there's no contents.
> > > If there's no contents then who cares what the version is?
> > >
> >
> > What I thought before is that we should not update the buffer when vm
> > reset. But now seems like it's unnecessary. I will update this patch
> > as you said. Thank you!
> >
> > > > > Or do we want feature flags so we can support downgrades too?
> > > > >
> > > >
> > > > We faced the same problem that the feature flags will be lost if QEMU
> > > > do not maintain it. So maybe we should put this into message's
> > > > payload?
> > >
> > > I don't understand why we care. We only maintain inflight so we
> > > don't need to reset the device. if device is reset we don't
> > > need to maintain them at all.
> > >
> > > > >
> > > > > > > > > +uint16_t desc_num;
> > > > > > >
> > > > > > > there's padding before this field. Pls make it explicit.
> > > > > > >
> > > > > >
> > > > > > Will do it.
> > > > > >
> > > > > > > > > +DescState desc[0];
> > > > > > > > > +} QueueRegion;
> > > > > > > > > +
> > > > > > > > > +The struct DescState is used to describe one 
> > > > > > > > > head-descriptor's state. The
> > > > > > > > > +fields have following meanings:
> > > > > > > > > +
> > > > > > > > > +inuse: Indicate whether the descriptor is inuse or not.
> > > > > > >
> > > > > > > inuse by what?
> > > > > > >
> > > > > >
> > > > > > Maybe inflight is better?
> > > > > >
> > > > > > > > > +
> > > > > > > > > +version: Indicate whether we have an atomic update to 
> > > > > > > > > used ring and
> > > > > > > > > +inflight buffer when slave crash at that point. This 
> > > > > > > > > field should be
> > > > > > > > > +increased by one before and after this two updates. An 
> > > > > > > > > odd version
> > > > > > > > > +indicates an in-progress update.
> > > > > > >
> > > > > > > I'm not sure I understand what does the above say. Also does this
> > > > > > > require two atomics? Seems pretty expensive. And why is it called
> > > > > > > version?
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +used_idx: Store old index of used ring before we update 
> > > > > > > > > used ring and
> > > > > > > > > +inflight buffer so that slave can know whether an odd 
> > > > > > > > > version inflight
> > > > > > > > > +head-descriptor in inflight buffer is processed or not.
> > > > > > >
> > > > > > > Here too.
> > > > > > >
> > > > > >
> > > > > > Sorry, the above description may be not clear. This two fields are
> > > > > > used to indicate whether we have an in-progress update to used ring
> > > > > > and inflight buffer. If slave crash before the update to used_ring 
> > > > > > and
> > > > > > after the update to inflight buffer, the version should be odd and
> > > > > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > > > > the update to inflight buffer. As for the name of the version filed,
> > > > > > actually I didn't find a good one, so I just copy it from struct
> > > > > > kvm_steal_time...
> > > > > >
> > > > > > > > > +
> > > > > > > > > +avail_idx: Used to preserve the descriptor's order in 
> > > > > > > > > avail ring so that
> > > > > > > > > +slave can resubmit descriptors in order.
> > > > > > >
> > > > > > > Why would that be necessary?
> > > > > > >
> > > > > >
> > > > > > Maybe some devices will be able to use it to preserve order after
> > > > > > 

Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-29 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 11:49:56AM +0800, Yongji Xie wrote:
> On Wed, 30 Jan 2019 at 10:30, Michael S. Tsirkin  wrote:
> >
> > On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> > > On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin  wrote:
> > > >
> > > > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com 
> > > > > > > wrote:
> > > > > > > > +typedef struct DescState {
> > > > > > > > +uint8_t inuse;
> > > > > > > > +uint8_t version;
> > > > > > > > +uint16_t used_idx;
> > > > > > > > +uint16_t avail_idx;
> > > > > > > > +uint16_t reserved;
> > > > > > > > +} DescState;
> > > > > > > > +
> > > > > > > > +typedef struct QueueRegion {
> > > > > > > > +uint8_t valid;
> > > > > >
> > > > > > what's this?
> > > > > >
> > > > >
> > > > > We can use this to check whether this buffer is reset by qemu.
> > > >
> > > >
> > > > I'd use version here. Document that it's 1 currently.
> > >
> > > If we put version into the shared buffer, QEMU will reset it when vm
> > > reset. Then if backend restart at the same time, the version of this
> > > buffer will be lost. So I still let QEMU maintain it and send it back
> > > through message's payload.
> >
> > I don't get it. If it's reset there's no contents.
> > If there's no contents then who cares what the version is?
> >
> 
> What I thought before is that we should not update the buffer when vm
> reset. But now seems like it's unnecessary. I will update this patch
> as you said. Thank you!
> 
> > > > Or do we want feature flags so we can support downgrades too?
> > > >
> > >
> > > We faced the same problem that the feature flags will be lost if QEMU
> > > do not maintain it. So maybe we should put this into message's
> > > payload?
> >
> > I don't understand why we care. We only maintain inflight so we
> > don't need to reset the device. if device is reset we don't
> > need to maintain them at all.
> >
> > > >
> > > > > > > > +uint16_t desc_num;
> > > > > >
> > > > > > there's padding before this field. Pls make it explicit.
> > > > > >
> > > > >
> > > > > Will do it.
> > > > >
> > > > > > > > +DescState desc[0];
> > > > > > > > +} QueueRegion;
> > > > > > > > +
> > > > > > > > +The struct DescState is used to describe one head-descriptor's 
> > > > > > > > state. The
> > > > > > > > +fields have following meanings:
> > > > > > > > +
> > > > > > > > +inuse: Indicate whether the descriptor is inuse or not.
> > > > > >
> > > > > > inuse by what?
> > > > > >
> > > > >
> > > > > Maybe inflight is better?
> > > > >
> > > > > > > > +
> > > > > > > > +version: Indicate whether we have an atomic update to used 
> > > > > > > > ring and
> > > > > > > > +inflight buffer when slave crash at that point. This field 
> > > > > > > > should be
> > > > > > > > +increased by one before and after this two updates. An odd 
> > > > > > > > version
> > > > > > > > +indicates an in-progress update.
> > > > > >
> > > > > > I'm not sure I understand what does the above say. Also does this
> > > > > > require two atomics? Seems pretty expensive. And why is it called
> > > > > > version?
> > > > > >
> > > > > > > > +
> > > > > > > > +used_idx: Store old index of used ring before we update 
> > > > > > > > used ring and
> > > > > > > > +inflight buffer so that slave can know whether an odd 
> > > > > > > > version inflight
> > > > > > > > +head-descriptor in inflight buffer is processed or not.
> > > > > >
> > > > > > Here too.
> > > > > >
> > > > >
> > > > > Sorry, the above description may be not clear. This two fields are
> > > > > used to indicate whether we have an in-progress update to used ring
> > > > > and inflight buffer. If slave crash before the update to used_ring and
> > > > > after the update to inflight buffer, the version should be odd and
> > > > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > > > the update to inflight buffer. As for the name of the version filed,
> > > > > actually I didn't find a good one, so I just copy it from struct
> > > > > kvm_steal_time...
> > > > >
> > > > > > > > +
> > > > > > > > +avail_idx: Used to preserve the descriptor's order in 
> > > > > > > > avail ring so that
> > > > > > > > +slave can resubmit descriptors in order.
> > > > > >
> > > > > > Why would that be necessary?
> > > > > >
> > > > >
> > > > > Maybe some devices will be able to use it to preserve order after
> > > > > reconnecting in future?
> > > >
> > > > If buffers are used in order then old entries in the ring
> > > > are not overwritten so inflight tracking is not
> > > > necessary. This is exactly the case with vhost user net today.
> > > >
> > >
> > > OK, looks reasonable to me.
> > >
> > > > > > >
> > > > > > > Will a 

Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-29 Thread Yongji Xie
On Wed, 30 Jan 2019 at 10:30, Michael S. Tsirkin  wrote:
>
> On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> > On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> > > > > > > +typedef struct DescState {
> > > > > > > +uint8_t inuse;
> > > > > > > +uint8_t version;
> > > > > > > +uint16_t used_idx;
> > > > > > > +uint16_t avail_idx;
> > > > > > > +uint16_t reserved;
> > > > > > > +} DescState;
> > > > > > > +
> > > > > > > +typedef struct QueueRegion {
> > > > > > > +uint8_t valid;
> > > > >
> > > > > what's this?
> > > > >
> > > >
> > > > We can use this to check whether this buffer is reset by qemu.
> > >
> > >
> > > I'd use version here. Document that it's 1 currently.
> >
> > If we put version into the shared buffer, QEMU will reset it when vm
> > reset. Then if backend restart at the same time, the version of this
> > buffer will be lost. So I still let QEMU maintain it and send it back
> > through message's payload.
>
> I don't get it. If it's reset there's no contents.
> If there's no contents then who cares what the version is?
>

What I thought before is that we should not update the buffer when vm
reset. But now seems like it's unnecessary. I will update this patch
as you said. Thank you!

> > > Or do we want feature flags so we can support downgrades too?
> > >
> >
> > We faced the same problem that the feature flags will be lost if QEMU
> > do not maintain it. So maybe we should put this into message's
> > payload?
>
> I don't understand why we care. We only maintain inflight so we
> don't need to reset the device. if device is reset we don't
> need to maintain them at all.
>
> > >
> > > > > > > +uint16_t desc_num;
> > > > >
> > > > > there's padding before this field. Pls make it explicit.
> > > > >
> > > >
> > > > Will do it.
> > > >
> > > > > > > +DescState desc[0];
> > > > > > > +} QueueRegion;
> > > > > > > +
> > > > > > > +The struct DescState is used to describe one head-descriptor's 
> > > > > > > state. The
> > > > > > > +fields have following meanings:
> > > > > > > +
> > > > > > > +inuse: Indicate whether the descriptor is inuse or not.
> > > > >
> > > > > inuse by what?
> > > > >
> > > >
> > > > Maybe inflight is better?
> > > >
> > > > > > > +
> > > > > > > +version: Indicate whether we have an atomic update to used 
> > > > > > > ring and
> > > > > > > +inflight buffer when slave crash at that point. This field 
> > > > > > > should be
> > > > > > > +increased by one before and after this two updates. An odd 
> > > > > > > version
> > > > > > > +indicates an in-progress update.
> > > > >
> > > > > I'm not sure I understand what does the above say. Also does this
> > > > > require two atomics? Seems pretty expensive. And why is it called
> > > > > version?
> > > > >
> > > > > > > +
> > > > > > > +used_idx: Store old index of used ring before we update used 
> > > > > > > ring and
> > > > > > > +inflight buffer so that slave can know whether an odd 
> > > > > > > version inflight
> > > > > > > +head-descriptor in inflight buffer is processed or not.
> > > > >
> > > > > Here too.
> > > > >
> > > >
> > > > Sorry, the above description may be not clear. This two fields are
> > > > used to indicate whether we have an in-progress update to used ring
> > > > and inflight buffer. If slave crash before the update to used_ring and
> > > > after the update to inflight buffer, the version should be odd and
> > > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > > the update to inflight buffer. As for the name of the version filed,
> > > > actually I didn't find a good one, so I just copy it from struct
> > > > kvm_steal_time...
> > > >
> > > > > > > +
> > > > > > > +avail_idx: Used to preserve the descriptor's order in avail 
> > > > > > > ring so that
> > > > > > > +slave can resubmit descriptors in order.
> > > > >
> > > > > Why would that be necessary?
> > > > >
> > > >
> > > > Maybe some devices will be able to use it to preserve order after
> > > > reconnecting in future?
> > >
> > > If buffers are used in order then old entries in the ring
> > > are not overwritten so inflight tracking is not
> > > necessary. This is exactly the case with vhost user net today.
> > >
> >
> > OK, looks reasonable to me.
> >
> > > > > >
> > > > > > Will a completely new "packed vring" inflight shm layout be 
> > > > > > necessary to
> > > > > > support the packed vring layout in VIRTIO 1.1?
> > > > > >
> > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > > > >
> > > > > Probably.
> > > > >
> > > >
> > > > How about supporting packed 

Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-29 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 10:07:28AM +0800, Yongji Xie wrote:
> On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin  wrote:
> >
> > On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  wrote:
> > > >
> > > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> > > > > > +typedef struct DescState {
> > > > > > +uint8_t inuse;
> > > > > > +uint8_t version;
> > > > > > +uint16_t used_idx;
> > > > > > +uint16_t avail_idx;
> > > > > > +uint16_t reserved;
> > > > > > +} DescState;
> > > > > > +
> > > > > > +typedef struct QueueRegion {
> > > > > > +uint8_t valid;
> > > >
> > > > what's this?
> > > >
> > >
> > > We can use this to check whether this buffer is reset by qemu.
> >
> >
> > I'd use version here. Document that it's 1 currently.
> 
> If we put version into the shared buffer, QEMU will reset it when vm
> reset. Then if backend restart at the same time, the version of this
> buffer will be lost. So I still let QEMU maintain it and send it back
> through message's payload.

I don't get it. If it's reset there's no contents.
If there's no contents then who cares what the version is?


> > Or do we want feature flags so we can support downgrades too?
> >
> 
> We faced the same problem that the feature flags will be lost if QEMU
> do not maintain it. So maybe we should put this into message's
> payload?

I don't understand why we care. We only maintain inflight so we
don't need to reset the device. if device is reset we don't
need to maintain them at all.

> >
> > > > > > +uint16_t desc_num;
> > > >
> > > > there's padding before this field. Pls make it explicit.
> > > >
> > >
> > > Will do it.
> > >
> > > > > > +DescState desc[0];
> > > > > > +} QueueRegion;
> > > > > > +
> > > > > > +The struct DescState is used to describe one head-descriptor's 
> > > > > > state. The
> > > > > > +fields have following meanings:
> > > > > > +
> > > > > > +inuse: Indicate whether the descriptor is inuse or not.
> > > >
> > > > inuse by what?
> > > >
> > >
> > > Maybe inflight is better?
> > >
> > > > > > +
> > > > > > +version: Indicate whether we have an atomic update to used 
> > > > > > ring and
> > > > > > +inflight buffer when slave crash at that point. This field 
> > > > > > should be
> > > > > > +increased by one before and after this two updates. An odd 
> > > > > > version
> > > > > > +indicates an in-progress update.
> > > >
> > > > I'm not sure I understand what does the above say. Also does this
> > > > require two atomics? Seems pretty expensive. And why is it called
> > > > version?
> > > >
> > > > > > +
> > > > > > +used_idx: Store old index of used ring before we update used 
> > > > > > ring and
> > > > > > +inflight buffer so that slave can know whether an odd version 
> > > > > > inflight
> > > > > > +head-descriptor in inflight buffer is processed or not.
> > > >
> > > > Here too.
> > > >
> > >
> > > Sorry, the above description may be not clear. This two fields are
> > > used to indicate whether we have an in-progress update to used ring
> > > and inflight buffer. If slave crash before the update to used_ring and
> > > after the update to inflight buffer, the version should be odd and
> > > used_idx should be equal to used_ring.idx. Then we need to roll back
> > > the update to inflight buffer. As for the name of the version filed,
> > > actually I didn't find a good one, so I just copy it from struct
> > > kvm_steal_time...
> > >
> > > > > > +
> > > > > > +avail_idx: Used to preserve the descriptor's order in avail 
> > > > > > ring so that
> > > > > > +slave can resubmit descriptors in order.
> > > >
> > > > Why would that be necessary?
> > > >
> > >
> > > Maybe some devices will be able to use it to preserve order after
> > > reconnecting in future?
> >
> > If buffers are used in order then old entries in the ring
> > are not overwritten so inflight tracking is not
> > necessary. This is exactly the case with vhost user net today.
> >
> 
> OK, looks reasonable to me.
> 
> > > > >
> > > > > Will a completely new "packed vring" inflight shm layout be necessary 
> > > > > to
> > > > > support the packed vring layout in VIRTIO 1.1?
> > > > >
> > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > > >
> > > > Probably.
> > > >
> > >
> > > How about supporting packed virtqueue in guest driver?
> > >
> > > Thanks,
> > > Yongji
> >
> > Depends on the guest right? Linux has it:
> >
> 
> Sorry, actually I mean I prefer to support inflight tracking for
> packed virtqueue in guest driver. This feature is only used by legacy
> virtio 1.0 or virtio 0.9 device. What do you think about it?
> 
> Thanks,
> Yongji

I don't see what does one have to do with the other.  Either we do or we
don't want to do it without downtime and retries. If we don't mind

Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-29 Thread Yongji Xie
On Tue, 29 Jan 2019 at 22:15, Michael S. Tsirkin  wrote:
>
> On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> > On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> > > > > +typedef struct DescState {
> > > > > +uint8_t inuse;
> > > > > +uint8_t version;
> > > > > +uint16_t used_idx;
> > > > > +uint16_t avail_idx;
> > > > > +uint16_t reserved;
> > > > > +} DescState;
> > > > > +
> > > > > +typedef struct QueueRegion {
> > > > > +uint8_t valid;
> > >
> > > what's this?
> > >
> >
> > We can use this to check whether this buffer is reset by qemu.
>
>
> I'd use version here. Document that it's 1 currently.

If we put version into the shared buffer, QEMU will reset it when vm
reset. Then if backend restart at the same time, the version of this
buffer will be lost. So I still let QEMU maintain it and send it back
through message's payload.

> Or do we want feature flags so we can support downgrades too?
>

We faced the same problem that the feature flags will be lost if QEMU
do not maintain it. So maybe we should put this into message's
payload?

>
> > > > > +uint16_t desc_num;
> > >
> > > there's padding before this field. Pls make it explicit.
> > >
> >
> > Will do it.
> >
> > > > > +DescState desc[0];
> > > > > +} QueueRegion;
> > > > > +
> > > > > +The struct DescState is used to describe one head-descriptor's 
> > > > > state. The
> > > > > +fields have following meanings:
> > > > > +
> > > > > +inuse: Indicate whether the descriptor is inuse or not.
> > >
> > > inuse by what?
> > >
> >
> > Maybe inflight is better?
> >
> > > > > +
> > > > > +version: Indicate whether we have an atomic update to used ring 
> > > > > and
> > > > > +inflight buffer when slave crash at that point. This field 
> > > > > should be
> > > > > +increased by one before and after this two updates. An odd 
> > > > > version
> > > > > +indicates an in-progress update.
> > >
> > > I'm not sure I understand what does the above say. Also does this
> > > require two atomics? Seems pretty expensive. And why is it called
> > > version?
> > >
> > > > > +
> > > > > +used_idx: Store old index of used ring before we update used 
> > > > > ring and
> > > > > +inflight buffer so that slave can know whether an odd version 
> > > > > inflight
> > > > > +head-descriptor in inflight buffer is processed or not.
> > >
> > > Here too.
> > >
> >
> > Sorry, the above description may be not clear. This two fields are
> > used to indicate whether we have an in-progress update to used ring
> > and inflight buffer. If slave crash before the update to used_ring and
> > after the update to inflight buffer, the version should be odd and
> > used_idx should be equal to used_ring.idx. Then we need to roll back
> > the update to inflight buffer. As for the name of the version filed,
> > actually I didn't find a good one, so I just copy it from struct
> > kvm_steal_time...
> >
> > > > > +
> > > > > +avail_idx: Used to preserve the descriptor's order in avail ring 
> > > > > so that
> > > > > +slave can resubmit descriptors in order.
> > >
> > > Why would that be necessary?
> > >
> >
> > Maybe some devices will be able to use it to preserve order after
> > reconnecting in future?
>
> If buffers are used in order then old entries in the ring
> are not overwritten so inflight tracking is not
> necessary. This is exactly the case with vhost user net today.
>

OK, looks reasonable to me.

> > > >
> > > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > > support the packed vring layout in VIRTIO 1.1?
> > > >
> > > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> > >
> > > Probably.
> > >
> >
> > How about supporting packed virtqueue in guest driver?
> >
> > Thanks,
> > Yongji
>
> Depends on the guest right? Linux has it:
>

Sorry, actually I mean I prefer to support inflight tracking for
packed virtqueue in guest driver. This feature is only used by legacy
virtio 1.0 or virtio 0.9 device. What do you think about it?

Thanks,
Yongji



Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 02:15:35PM +0800, Yongji Xie wrote:
> On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  wrote:
> >
> > On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> > > > +typedef struct DescState {
> > > > +uint8_t inuse;
> > > > +uint8_t version;
> > > > +uint16_t used_idx;
> > > > +uint16_t avail_idx;
> > > > +uint16_t reserved;
> > > > +} DescState;
> > > > +
> > > > +typedef struct QueueRegion {
> > > > +uint8_t valid;
> >
> > what's this?
> >
> 
> We can use this to check whether this buffer is reset by qemu.


I'd use version here. Document that it's 1 currently.
Or do we want feature flags so we can support downgrades too?



> > > > +uint16_t desc_num;
> >
> > there's padding before this field. Pls make it explicit.
> >
> 
> Will do it.
> 
> > > > +DescState desc[0];
> > > > +} QueueRegion;
> > > > +
> > > > +The struct DescState is used to describe one head-descriptor's state. 
> > > > The
> > > > +fields have following meanings:
> > > > +
> > > > +inuse: Indicate whether the descriptor is inuse or not.
> >
> > inuse by what?
> >
> 
> Maybe inflight is better?
> 
> > > > +
> > > > +version: Indicate whether we have an atomic update to used ring and
> > > > +inflight buffer when slave crash at that point. This field should 
> > > > be
> > > > +increased by one before and after this two updates. An odd version
> > > > +indicates an in-progress update.
> >
> > I'm not sure I understand what does the above say. Also does this
> > require two atomics? Seems pretty expensive. And why is it called
> > version?
> >
> > > > +
> > > > +used_idx: Store old index of used ring before we update used ring 
> > > > and
> > > > +inflight buffer so that slave can know whether an odd version 
> > > > inflight
> > > > +head-descriptor in inflight buffer is processed or not.
> >
> > Here too.
> >
> 
> Sorry, the above description may be not clear. This two fields are
> used to indicate whether we have an in-progress update to used ring
> and inflight buffer. If slave crash before the update to used_ring and
> after the update to inflight buffer, the version should be odd and
> used_idx should be equal to used_ring.idx. Then we need to roll back
> the update to inflight buffer. As for the name of the version filed,
> actually I didn't find a good one, so I just copy it from struct
> kvm_steal_time...
> 
> > > > +
> > > > +avail_idx: Used to preserve the descriptor's order in avail ring 
> > > > so that
> > > > +slave can resubmit descriptors in order.
> >
> > Why would that be necessary?
> >
> 
> Maybe some devices will be able to use it to preserve order after
> reconnecting in future?

If buffers are used in order then old entries in the ring
are not overwritten so inflight tracking is not
necessary. This is exactly the case with vhost user net today.

> > >
> > > Will a completely new "packed vring" inflight shm layout be necessary to
> > > support the packed vring layout in VIRTIO 1.1?
> > >
> > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> >
> > Probably.
> >
> 
> How about supporting packed virtqueue in guest driver?
> 
> Thanks,
> Yongji

Depends on the guest right? Linux has it:


commit f959a128fe83090981add69aadc87a4e496e9369
Author: Tiwei Bie 
Date:   Wed Nov 21 18:03:30 2018 +0800

virtio_ring: advertize packed ring layout

-- 
MST



Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-28 Thread Yongji Xie
On Tue, 29 Jan 2019 at 12:26, Michael S. Tsirkin  wrote:
>
> On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> > On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> > > +typedef struct DescState {
> > > +uint8_t inuse;
> > > +uint8_t version;
> > > +uint16_t used_idx;
> > > +uint16_t avail_idx;
> > > +uint16_t reserved;
> > > +} DescState;
> > > +
> > > +typedef struct QueueRegion {
> > > +uint8_t valid;
>
> what's this?
>

We can use this to check whether this buffer is reset by qemu.

> > > +uint16_t desc_num;
>
> there's padding before this field. Pls make it explicit.
>

Will do it.

> > > +DescState desc[0];
> > > +} QueueRegion;
> > > +
> > > +The struct DescState is used to describe one head-descriptor's state. The
> > > +fields have following meanings:
> > > +
> > > +inuse: Indicate whether the descriptor is inuse or not.
>
> inuse by what?
>

Maybe inflight is better?

> > > +
> > > +version: Indicate whether we have an atomic update to used ring and
> > > +inflight buffer when slave crash at that point. This field should be
> > > +increased by one before and after this two updates. An odd version
> > > +indicates an in-progress update.
>
> I'm not sure I understand what does the above say. Also does this
> require two atomics? Seems pretty expensive. And why is it called
> version?
>
> > > +
> > > +used_idx: Store old index of used ring before we update used ring and
> > > +inflight buffer so that slave can know whether an odd version 
> > > inflight
> > > +head-descriptor in inflight buffer is processed or not.
>
> Here too.
>

Sorry, the above description may be not clear. This two fields are
used to indicate whether we have an in-progress update to used ring
and inflight buffer. If slave crash before the update to used_ring and
after the update to inflight buffer, the version should be odd and
used_idx should be equal to used_ring.idx. Then we need to roll back
the update to inflight buffer. As for the name of the version filed,
actually I didn't find a good one, so I just copy it from struct
kvm_steal_time...

> > > +
> > > +avail_idx: Used to preserve the descriptor's order in avail ring so 
> > > that
> > > +slave can resubmit descriptors in order.
>
> Why would that be necessary?
>

Maybe some devices will be able to use it to preserve order after
reconnecting in future?

> >
> > Will a completely new "packed vring" inflight shm layout be necessary to
> > support the packed vring layout in VIRTIO 1.1?
> >
> > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
>
> Probably.
>

How about supporting packed virtqueue in guest driver?

Thanks,
Yongji



Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-28 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 12:11:55PM +0800, Stefan Hajnoczi wrote:
> On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> > +typedef struct DescState {
> > +uint8_t inuse;
> > +uint8_t version;
> > +uint16_t used_idx;
> > +uint16_t avail_idx;
> > +uint16_t reserved;
> > +} DescState;
> > +
> > +typedef struct QueueRegion {
> > +uint8_t valid;

what's this?

> > +uint16_t desc_num;

there's padding before this field. Pls make it explicit.

> > +DescState desc[0];
> > +} QueueRegion;
> > +
> > +The struct DescState is used to describe one head-descriptor's state. The
> > +fields have following meanings:
> > +
> > +inuse: Indicate whether the descriptor is inuse or not.

inuse by what?

> > +
> > +version: Indicate whether we have an atomic update to used ring and
> > +inflight buffer when slave crash at that point. This field should be
> > +increased by one before and after this two updates. An odd version
> > +indicates an in-progress update.

I'm not sure I understand what does the above say. Also does this
require two atomics? Seems pretty expensive. And why is it called
version?

> > +
> > +used_idx: Store old index of used ring before we update used ring and
> > +inflight buffer so that slave can know whether an odd version inflight
> > +head-descriptor in inflight buffer is processed or not.

Here too.

> > +
> > +avail_idx: Used to preserve the descriptor's order in avail ring so 
> > that
> > +slave can resubmit descriptors in order.

Why would that be necessary?

> 
> Will a completely new "packed vring" inflight shm layout be necessary to
> support the packed vring layout in VIRTIO 1.1?
> 
> https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007

Probably.

-- 
MST



Re: [Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-28 Thread Stefan Hajnoczi
On Tue, Jan 22, 2019 at 04:31:47PM +0800, elohi...@gmail.com wrote:
> +typedef struct DescState {
> +uint8_t inuse;
> +uint8_t version;
> +uint16_t used_idx;
> +uint16_t avail_idx;
> +uint16_t reserved;
> +} DescState;
> +
> +typedef struct QueueRegion {
> +uint8_t valid;
> +uint16_t desc_num;
> +DescState desc[0];
> +} QueueRegion;
> +
> +The struct DescState is used to describe one head-descriptor's state. The
> +fields have following meanings:
> +
> +inuse: Indicate whether the descriptor is inuse or not.
> +
> +version: Indicate whether we have an atomic update to used ring and
> +inflight buffer when slave crash at that point. This field should be
> +increased by one before and after this two updates. An odd version
> +indicates an in-progress update.
> +
> +used_idx: Store old index of used ring before we update used ring and
> +inflight buffer so that slave can know whether an odd version inflight
> +head-descriptor in inflight buffer is processed or not.
> +
> +avail_idx: Used to preserve the descriptor's order in avail ring so that
> +slave can resubmit descriptors in order.

Will a completely new "packed vring" inflight shm layout be necessary to
support the packed vring layout in VIRTIO 1.1?

https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007


signature.asc
Description: PGP signature


[Qemu-devel] [PATCH v5 1/6] vhost-user: Support transferring inflight buffer between qemu and backend

2019-01-22 Thread elohimes
From: Xie Yongji 

This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.

Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the
shared buffer from backend. Then qemu should send it back
through VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.

This shared buffer is used to track inflight I/O by backend.
Qemu should clear it when vm reset.

Signed-off-by: Xie Yongji 
Signed-off-by: Chai Wen 
Signed-off-by: Zhang Yu 
---
 docs/interop/vhost-user.txt   | 101 +++
 hw/virtio/vhost-user.c| 110 ++
 hw/virtio/vhost.c | 105 
 include/hw/virtio/vhost-backend.h |  10 +++
 include/hw/virtio/vhost.h |  19 ++
 5 files changed, 345 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index c2194711d9..fa470a8fe2 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -142,6 +142,18 @@ Depending on the request type, payload can be:
Offset: a 64-bit offset of this area from the start of the
supplied file descriptor
 
+ * Inflight description
+   ---
+   | mmap size | mmap offset | version | num queues | queue size |
+   ---
+
+   mmap size: a 64-bit size of area to track inflight I/O
+   mmap offset: a 64-bit offset of this area from the start
+of the supplied file descriptor
+   version: a 16-bit version of this area
+   num queues: a 16-bit number of virtqueues
+   queue size: a 16-bit size of virtqueues
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -157,6 +169,7 @@ typedef struct VhostUserMsg {
 struct vhost_iotlb_msg iotlb;
 VhostUserConfig config;
 VhostUserVringArea area;
+VhostUserInflight inflight;
 };
 } QEMU_PACKED VhostUserMsg;
 
@@ -175,6 +188,7 @@ the ones that do:
  * VHOST_USER_GET_PROTOCOL_FEATURES
  * VHOST_USER_GET_VRING_BASE
  * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
+ * VHOST_USER_GET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)
 
 [ Also see the section on REPLY_ACK protocol extension. ]
 
@@ -188,6 +202,7 @@ in the ancillary data:
  * VHOST_USER_SET_VRING_CALL
  * VHOST_USER_SET_VRING_ERR
  * VHOST_USER_SET_SLAVE_REQ_FD
+ * VHOST_USER_SET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)
 
 If Master is unable to send the full message or receives a wrong reply it will
 close the connection. An optional reconnection mechanism can be implemented.
@@ -382,6 +397,71 @@ If VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD protocol feature is 
negotiated,
 slave can send file descriptors (at most 8 descriptors in each message)
 to master via ancillary data using this fd communication channel.
 
+Inflight I/O tracking
+-
+
+To support reconnecting after restart or crash, slave may need to get
+and resubmit inflight I/O. If virtqueue is processed in order, we can
+easily achieve that by getting the inflight head-descriptor from avail ring.
+However, it can't work when we process descriptors out-of-order because
+some entries which store inflight head-descriptor in avail ring might be
+overrided by new entries. To solve this problem, slave need to allocate a
+buffer to track inflight head-descriptor and share it with master for
+persistent. VHOST_USER_GET_INFLIGHT_FD and VHOST_USER_SET_INFLIGHT_FD are
+used to transfer this buffer between master and slave. And the format of
+this buffer is described below:
+
+---
+| queue0 region | queue1 region | ... | queueN region |
+---
+
+N is the number of available virtqueues. Slave could get it from num queues
+field of VhostUserInflight.
+
+Queue region can be implemented as:
+
+typedef struct DescState {
+uint8_t inuse;
+uint8_t version;
+uint16_t used_idx;
+uint16_t avail_idx;
+uint16_t reserved;
+} DescState;
+
+typedef struct QueueRegion {
+uint8_t valid;
+uint16_t desc_num;
+DescState desc[0];
+} QueueRegion;
+
+The struct DescState is used to describe one head-descriptor's state. The
+fields have following meanings:
+
+inuse: Indicate whether the descriptor is inuse or not.
+
+version: Indicate whether we have an atomic update to used ring and
+inflight buffer when slave crash at that point. This field should be
+increased by one before and after this two updates. An odd version
+indicates an in-progress update.
+
+used_idx: Store old index of used ring before we update used ring and
+inflight buffer so that slave can know whether an odd version inflight
+head-descriptor in inflight buffer is processed or not.
+
+