[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Linhaifeng



>>
>> Can you mmap the region if gpa is 0? When i run VM with two numa node (qemu
>> will create two hugepage file) found that always failed to mmap with the 
>> region
>> which gpa is 0.
>>
>> BTW can we ensure the memory regions cover with all the memory of hugepage
>> for VM?
>>
> We had discussed this. Seemed a qemu bug.
> Do you have update on this?
> 

Yes,because qemu not send all hugepage file fd to vhost-user.I have send the 
patch bug no body reply.


> 
> Thanks Haifeng for your comments.
> Next time, could you help cut the code not commented?
> They are too long. It is really hard for me to find your comment.
> 
> 
> .
> 

OK:)

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Linhaifeng


On 2015/1/23 11:40, Xie, Huawei wrote:
> 
> 
>> -Original Message-
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Thursday, December 11, 2014 1:36 PM
>> To: Xie, Huawei; dev at dpdk.org
>> Cc: haifeng.lin at intel.com
>> Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user
>> support
>>
>>
>>
>> On 2014/12/11 5:37, Huawei Xie wrote:
>>> vhost-user support
>>>
>>>
>>> Signed-off-by: Huawei Xie 
>>> ---
>>>  lib/librte_vhost/Makefile |   5 +-
>>> +   case VHOST_USER_SET_LOG_FD:
>>
>> should close fd for fd leak when receive VHOST_USER_SET_LOG_FD msg?
> Ok, would at least close the fd though we don't support.
> Do you know how to test this?

reboot VM many times then watch the number of fd with command 'ls /proc/pidof 
vhost/fd |wc -l'

>>
>>> +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
>>> +
>>> +   case VHOST_USER_SET_VRING_ERR:
>>
>> should close fd for fd leak?
> Would do.
>>
>>> +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
>>> +   break;
>>> +#endif
>>>
>>
>> --
>> Regards,
>> Haifeng
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Xie, Huawei


> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
> Sent: Thursday, December 11, 2014 1:36 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: haifeng.lin at intel.com
> Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user
> support
> 
> 
> 
> On 2014/12/11 5:37, Huawei Xie wrote:
> > vhost-user support
> >
> >
> > Signed-off-by: Huawei Xie 
> > ---
> >  lib/librte_vhost/Makefile |   5 +-
> > +   case VHOST_USER_SET_LOG_FD:
> 
> should close fd for fd leak when receive VHOST_USER_SET_LOG_FD msg?
Ok, would at least close the fd though we don't support.
Do you know how to test this?
> 
> > +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
> > +
> > +   case VHOST_USER_SET_VRING_ERR:
> 
> should close fd for fd leak?
Would do.
> 
> > +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
> > +   break;
> > +#endif
> >
> 
> --
> Regards,
> Haifeng



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Xie, Huawei


> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
> Sent: Thursday, December 11, 2014 2:04 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: haifeng.lin at intel.com
> Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user
> support
> 
> 
> 
> On 2014/12/11 5:37, Huawei Xie wrote:
> > vhost-user support
> >
> >
> > Signed-off-by: Huawei Xie 
> > ---
> >  lib/librte_vhost/Makefile |   5 +-
> >  lib/librte_vhost/vhost-net.h  |   4 +
> >  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
> >  lib/librte_vhost/vhost_user/vhost-net-user.c  | 422
> ++
> > +
> > +static struct vhost_server *g_vhost_server;
> 
> Only support one vhost-user port ?

The newer patch supports multiple vhost-user port.
Will send next week.
> 
> > +
> > +static const char *vhost_message_str[VHOST_USER_MAX] = {

> 
> why not use vserver_new_vq_conn(int fd, void* dat) ?
> 
>
Since currently we only pass memory address and 32bit number, will change to 
void *.
 > > +{
> > +   struct vhost_server *vserver = (void *)(uintptr_t)dat;
> regions[VHOST_MEMORY_MAX_NREGIONS];
> > +   uint64_t mapped_address, base_address = 0;
> > +
> > +   for (idx = 0; idx < memory.nregions; idx++) {
> > +   if (memory.regions[idx].guest_phys_addr == 0)
> > +   base_address = memory.regions[idx].userspace_addr;
> > +   }
> > +   if (base_address == 0) {
> 
> when will base_address is 0? how to test to run this code?

base_address with 0 is the requirement for vhost-cuse implementation.
Normally it is the region [0, 0xA).
In this version, we would keep this and not fix this. In future, will remove 
this.

> 
> > +   RTE_LOG(ERR, VHOST_CONFIG,
> > +   "couldn't find the mem region whose GPA is 0.\n");
> > +   return -1;
> > +   }
> > +
> > +   for (idx = 0; idx < memory.nregions; idx++) {
> > +   regions[idx].guest_phys_address =
> > +   memory.regions[idx].guest_phys_addr;
> > +   regions[idx].guest_phys_address_end =
> > +   memory.regions[idx].guest_phys_addr +
> > +   memory.regions[idx].memory_size;
> > +   regions[idx].memory_size = memory.regions[idx].memory_size;
> > +   regions[idx].userspace_address =
> > +   memory.regions[idx].userspace_addr;
> > +
> > +   /* This is ugly */
> > +   mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
> > +   regions[idx].memory_size +
> > +   memory.regions[idx].mmap_offset,
> > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > +   pmsg->fds[idx],
> > +   0);
> 
> Can you mmap the region if gpa is 0? When i run VM with two numa node (qemu
> will create two hugepage file) found that always failed to mmap with the 
> region
> which gpa is 0.
> 
> BTW can we ensure the memory regions cover with all the memory of hugepage
> for VM?
> 
We had discussed this. Seemed a qemu bug.
Do you have update on this?

> > +   RTE_LOG(INFO, VHOST_CONFIG,
> > +   "mapped region %d to %p\n",
> > +struct virtio_net_device_ops const *notify_ops;
> > +struct virtio_net * get_device(struct vhost_device_ctx ctx);
> > +
> > +#endif
> >
> 
> --
> Regards,
> Haifeng

Thanks Haifeng for your comments.
Next time, could you help cut the code not commented?
They are too long. It is really hard for me to find your comment.



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-05 Thread Xie, Huawei
> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
> Sent: Wednesday, December 10, 2014 10:36 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: haifeng.lin at intel.com
> Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user
> support
> 
> 
> 
> On 2014/12/11 5:37, Huawei Xie wrote:
> > vhost-user support
> >
> >
> 
> should close fd for fd leak when receive VHOST_USER_SET_LOG_FD msg?
> 
> > +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
> > +   case VHOST_USER_SET_VRING_ERR:
> 
> should close fd for fd leak?
> 
Thanks for careful review.
> > +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
> > +   break;




[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-04 Thread Xie, Huawei
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Wednesday, December 24, 2014 12:21 AM
> To: Xie, Huawei; dev at dpdk.org
> Cc: haifeng.lin at intel.com
> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
> 
> (2014/12/11 6:37), Huawei Xie wrote:
> > vhost-user support
> >
> >
> 
> I guess 'feature' should be filled as payload like below.
> msg.payload.u64 = features;
>

Applied. Thanks.

> Thanks,
> Tetsuya
> 


[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-24 Thread Tetsuya Mukawa
(2014/12/11 6:37), Huawei Xie wrote:
> vhost-user support
>
>
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/vhost-net.h  |   4 +
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 422 
> ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 108 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 199 
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  48 +++
>  lib/librte_vhost/virtio-net.c |  16 +-
>  lib/librte_vhost/virtio-net.h |  43 +++
>  9 files changed, 842 insertions(+), 12 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e0d0ef6..b2f14a0 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -34,10 +34,11 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_vhost.a
>  
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -I vhost_user -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_user/vhost-net-user.c 
> vhost_user/virtio-net-user.c vhost_user/fd_man.c virtio-net.c vhost_rxtx.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
> diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> index f7e96fd..f9ec40b 100644
> --- a/lib/librte_vhost/vhost-net.h
> +++ b/lib/librte_vhost/vhost-net.h
> @@ -41,8 +41,12 @@
>  
>  #include 
>  
> +#include "rte_virtio_net.h"
> +
>  #define VHOST_MEMORY_MAX_NREGIONS 8
>  
> +extern struct vhost_net_device_ops const *ops;
> +
>  /* Macros for printing using RTE_LOG */
>  #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
>  #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
> diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
> b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> index edcbc10..8ac3360 100644
> --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> @@ -268,6 +268,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
>   ((uint64_t)(uintptr_t)mem_regions_addr + size);
>   uint64_t base_address = 0, mapped_address, mapped_size;
> + struct virtio_dev *dev;
>  
>   for (idx = 0; idx < nregions; idx++) {
>   regions[idx].guest_phys_address =
> @@ -335,6 +336,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   regions[idx].guest_phys_address;
>   }
>  
> + dev = get_device(ctx);
> + if (dev && dev->mem && dev->mmaped_address) {
> + munmap((void *)(uintptr_t)dev->mmaped_address,
> + (size_t)dev->mmaped_size);
> + free(dev->mem);
> + dev->mem = NULL;
> + }
> +
>   ops->set_mem_table(ctx, ®ions[0], valid_regions);
>   return 0;
>  }
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> new file mode 100644
> index 000..841d7e6
> --- /dev/null
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -0,0 +1,422 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY E

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-19 Thread Tetsuya Mukawa
(2014/12/18 2:31), Xie, Huawei wrote:
>
>> -Original Message-
>> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
>> Sent: Tuesday, December 16, 2014 9:22 PM
>> To: Xie, Huawei; dev at dpdk.org
>> Cc: Linhaifeng (haifeng.lin at huawei.com)
>> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
>>
>> (2014/12/17 12:31), Tetsuya Mukawa wrote:
>>> (2014/12/17 10:06), Xie, Huawei wrote:
>> +{
>> +struct virtio_net *dev = get_device(ctx);
>> +
>> +/* We have to stop the queue (virtio) if it is running. */
>> +if (dev->flags & VIRTIO_DEV_RUNNING)
>> +notify_ops->destroy_device(dev);
> I have an one concern about finalization of vrings.
> Can vhost-backend stop accessing RX/TX to the vring before replying to
> this message?
>
> QEMU sends this message when virtio-net device is finalized by
> virtio-net driver on the guest.
> After finalization, memories used by the vring will be freed by
> virtio-net driver, because these memories are allocated by virtio-net
> driver.
> Because of this, I guess vhost-backend must stop accessing to vring
> before replying to this message.
>
> I am not sure what is a good way to stop accessing.
> One idea is adding a condition checking when rte_vhost_dequeue_burst()
> and rte_vhost_enqueue_burst() is called.
> Anyway we probably need to wait for stopping access before replying.
>
> Thanks,
> Tetsuya
>
 I think we have discussed the similar question.
>>> Sorry, probably I might not be able to got your point correctly at the
>>> former email.
>>>
 It is actually the same issue whether the virtio APP in guest is crashed, 
 or is
>> finalized.
>>> I guess when the APP is finalized correctly, we can have a solution.
>>> Could you please read comment I wrote later?
>>>
 The virtio APP will only write to the STATUS register without 
 waiting/syncing
>> to vhost backend.
>>> Yes, virtio APP only write to the STATUS register. I agree with it.
>>>
>>> When the register is written by guest, KVM will catch it, and the
>>> context will be change to QEMU. And QEMU will works like below.
>>> (Also while doing following steps, guest is waiting because the context
>>> is in QEMU)
>>>
>>> Could you please see below with latest QEMU code?
>>> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
>>> wait for replying of this function.
>>> 2. virtio_set_status() [hw/virtio/virtio.c]
>>> 3. virtio_net_set_status() [hw/net/virtio-net.c]
>>> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
>>> 5. vhost_net_stop() [hw/net/vhost_net.c]
>>> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
>>> 7. vhost_dev_stop() [hw/virtio/vhost.c]
>>> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
>>> 9. vhost_user_call() [virtio/vhost-user.c]
>>> 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
>>> for backend reply.
>>>
>>> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
>>> APP is stopped.
>>> Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
>>> is synchronous message.
>>> Because of above, I guess virtio APP can wait for vhost-backend
>>> finalization.
>>>
 After that, not only the guest memory pointed by vring entry but also the
>> vring itself isn't usable any more.
 The memory for vring or pointed by vring entry might be used by other APPs.
>>> I agree.
>>>
 This will crash guest(rather than the vhost, do you agree?).
>>> I guess we cannot assume how the freed memory is used.
>>> In some cases, a new APP still works, but vhost backend can access
>>> inconsistency vring structure.
>>> In the case vhost backend could receive illegal packets.
>>> For example, avail_idx value might be changed to be 0x by a new APP.
>>> (I am not sure RX/TX functions can handle such a value correctly)
> Yes, I fully understand your concern and this is a very good point.
> my point is, in theory, a well written vhost backend is either able to detect 
> the error,
> or will crash the guest VM(virtio APP or even guest OS)  if it couldn't.
> For example, if avail_idx is set to 0Xf, if vhost_backend accepts this 
> value blindly, it will 
> 1. for tx ring, receive illegal packets. This is ok.
> 2. for rx ring, dma to guest memory before the availd_idx, which will crash 
> guest virtual machine.
> In current implementation, if there is chance our vhost backend aborts itself 
> in error handling, that is incorrect. We
> need to check our code if there is such case.

I agree. I will also check the code.

>
>>> Anyway, my point is if we can finalize vhost backend correctly, we only
>>> need to take care of crashing case.
>>> (If so, it's very nice :))
>>> So let's make sure whether virtio APP can wait for finalization, or not.
>>> I am thinking how to do it now.
> Sorry for the confuse. 
> The STATUS write must be synchronized with qemu.
> The vcpu thread for 

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-17 Thread Xie, Huawei


> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, December 16, 2014 9:22 PM
> To: Xie, Huawei; dev at dpdk.org
> Cc: Linhaifeng (haifeng.lin at huawei.com)
> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
> 
> (2014/12/17 12:31), Tetsuya Mukawa wrote:
> > (2014/12/17 10:06), Xie, Huawei wrote:
>  +{
>  +struct virtio_net *dev = get_device(ctx);
>  +
>  +/* We have to stop the queue (virtio) if it is running. */
>  +if (dev->flags & VIRTIO_DEV_RUNNING)
>  +notify_ops->destroy_device(dev);
> >>> I have an one concern about finalization of vrings.
> >>> Can vhost-backend stop accessing RX/TX to the vring before replying to
> >>> this message?
> >>>
> >>> QEMU sends this message when virtio-net device is finalized by
> >>> virtio-net driver on the guest.
> >>> After finalization, memories used by the vring will be freed by
> >>> virtio-net driver, because these memories are allocated by virtio-net
> >>> driver.
> >>> Because of this, I guess vhost-backend must stop accessing to vring
> >>> before replying to this message.
> >>>
> >>> I am not sure what is a good way to stop accessing.
> >>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
> >>> and rte_vhost_enqueue_burst() is called.
> >>> Anyway we probably need to wait for stopping access before replying.
> >>>
> >>> Thanks,
> >>> Tetsuya
> >>>
> >> I think we have discussed the similar question.
> > Sorry, probably I might not be able to got your point correctly at the
> > former email.
> >
> >> It is actually the same issue whether the virtio APP in guest is crashed, 
> >> or is
> finalized.
> > I guess when the APP is finalized correctly, we can have a solution.
> > Could you please read comment I wrote later?
> >
> >> The virtio APP will only write to the STATUS register without 
> >> waiting/syncing
> to vhost backend.
> > Yes, virtio APP only write to the STATUS register. I agree with it.
> >
> > When the register is written by guest, KVM will catch it, and the
> > context will be change to QEMU. And QEMU will works like below.
> > (Also while doing following steps, guest is waiting because the context
> > is in QEMU)
> >
> > Could you please see below with latest QEMU code?
> > 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
> > wait for replying of this function.
> > 2. virtio_set_status() [hw/virtio/virtio.c]
> > 3. virtio_net_set_status() [hw/net/virtio-net.c]
> > 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> > 5. vhost_net_stop() [hw/net/vhost_net.c]
> > 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> > 7. vhost_dev_stop() [hw/virtio/vhost.c]
> > 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> > 9. vhost_user_call() [virtio/vhost-user.c]
> > 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
> > for backend reply.
> >
> > When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> > APP is stopped.
> > Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
> > is synchronous message.
> > Because of above, I guess virtio APP can wait for vhost-backend
> > finalization.
> >
> >> After that, not only the guest memory pointed by vring entry but also the
> vring itself isn't usable any more.
> >> The memory for vring or pointed by vring entry might be used by other APPs.
> > I agree.
> >
> >> This will crash guest(rather than the vhost, do you agree?).
> > I guess we cannot assume how the freed memory is used.
> > In some cases, a new APP still works, but vhost backend can access
> > inconsistency vring structure.
> > In the case vhost backend could receive illegal packets.
> > For example, avail_idx value might be changed to be 0x by a new APP.
> > (I am not sure RX/TX functions can handle such a value correctly)

Yes, I fully understand your concern and this is a very good point.
my point is, in theory, a well written vhost backend is either able to detect 
the error,
or will crash the guest VM(virtio APP or even guest OS)  if it couldn't.
For example, if avail_idx is set to 0Xf, if vhost_backend accepts this 
value blindly, it will 
1. for tx ring, receive illegal packets. This is ok.
2. for rx ring, dma to guest memory before the availd_idx, which will crash 
guest virtual machine.
In current implementation, if there is chance our vhost backend aborts itself 
in error handling, that is incorrect. We
need to check our code if there is such case.

> >
> > Anyway, my point is if we can finalize vhost backend correctly, we only
> > need to take care of crashing case.
> > (If so, it's very nice :))
> > So let's make sure whether virtio APP can wait for finalization, or not.
> > I am thinking how to do it now.

Sorry for the confuse. 
The STATUS write must be synchronized with qemu.
The vcpu thread for the virtio APP willn't continue until qemu  finishes the 
simulation
and resumes the vcpu thread.

In the RFC pa

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-17 Thread Tetsuya Mukawa
(2014/12/17 12:31), Tetsuya Mukawa wrote:
> (2014/12/17 10:06), Xie, Huawei wrote:
 +{
 +  struct virtio_net *dev = get_device(ctx);
 +
 +  /* We have to stop the queue (virtio) if it is running. */
 +  if (dev->flags & VIRTIO_DEV_RUNNING)
 +  notify_ops->destroy_device(dev);
>>> I have an one concern about finalization of vrings.
>>> Can vhost-backend stop accessing RX/TX to the vring before replying to
>>> this message?
>>>
>>> QEMU sends this message when virtio-net device is finalized by
>>> virtio-net driver on the guest.
>>> After finalization, memories used by the vring will be freed by
>>> virtio-net driver, because these memories are allocated by virtio-net
>>> driver.
>>> Because of this, I guess vhost-backend must stop accessing to vring
>>> before replying to this message.
>>>
>>> I am not sure what is a good way to stop accessing.
>>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
>>> and rte_vhost_enqueue_burst() is called.
>>> Anyway we probably need to wait for stopping access before replying.
>>>
>>> Thanks,
>>> Tetsuya
>>>
>> I think we have discussed the similar question.
> Sorry, probably I might not be able to got your point correctly at the
> former email.
>
>> It is actually the same issue whether the virtio APP in guest is crashed, or 
>> is finalized.
> I guess when the APP is finalized correctly, we can have a solution.
> Could you please read comment I wrote later?
>
>> The virtio APP will only write to the STATUS register without 
>> waiting/syncing to vhost backend.
> Yes, virtio APP only write to the STATUS register. I agree with it.
>
> When the register is written by guest, KVM will catch it, and the
> context will be change to QEMU. And QEMU will works like below.
> (Also while doing following steps, guest is waiting because the context
> is in QEMU)
>
> Could you please see below with latest QEMU code?
> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
> wait for replying of this function.
> 2. virtio_set_status() [hw/virtio/virtio.c]
> 3. virtio_net_set_status() [hw/net/virtio-net.c]
> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> 5. vhost_net_stop() [hw/net/vhost_net.c]
> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> 7. vhost_dev_stop() [hw/virtio/vhost.c]
> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> 9. vhost_user_call() [virtio/vhost-user.c]
> 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
> for backend reply.
>
> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> APP is stopped.
> Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
> is synchronous message.
> Because of above, I guess virtio APP can wait for vhost-backend
> finalization.
>
>> After that, not only the guest memory pointed by vring entry but also the 
>> vring itself isn't usable any more.
>> The memory for vring or pointed by vring entry might be used by other APPs.
> I agree.
>
>> This will crash guest(rather than the vhost, do you agree?).
> I guess we cannot assume how the freed memory is used.
> In some cases, a new APP still works, but vhost backend can access
> inconsistency vring structure.
> In the case vhost backend could receive illegal packets.
> For example, avail_idx value might be changed to be 0x by a new APP.
> (I am not sure RX/TX functions can handle such a value correctly)
>
> Anyway, my point is if we can finalize vhost backend correctly, we only
> need to take care of crashing case.
> (If so, it's very nice :))
> So let's make sure whether virtio APP can wait for finalization, or not.
> I am thinking how to do it now.
>

I added sleep() like below.

--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque,
uint32_t addr, uint32_t val)
 virtio_pci_stop_ioeventfd(proxy);
 }

 virtio_set_status(vdev, val & 0xFF);
+if (val == 0)
+sleep(10);

 if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
 virtio_pci_start_ioeventfd(proxy);

When I type 'dpdk_nic_bind.py' to cause GET_VRING_BASE, this command
takes 10 seconds to be finished.
So we can assume that virtio APP is able to wait for finalization of
vhost backend.

Thanks,
Tetsuya

>> If you mean this issue, I think we have no solution but one walk around: 
>> keep the huge page files of crashed app, and 
>> bind virtio to igb_uio and then delete the huge page files.
> Yes I agree.
> If the virtio APP is crashed, this will be a solution.
>
> Thanks,
> Tetsuya
>
>> In our implementation, when vhost sends the message,  we will call the 
>> destroy_device provided by the vSwitch to ask the
>> vSwitch to stop processing the vring, but this willn't solve the issue I 
>> mention above, because the virtio APP in guest will n't 
>> wait us.
>>
>> Could you explain a bit more? Is it the same issue?
>>
>>
>> -huawei
>>
>>
>>
>




[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-17 Thread Tetsuya Mukawa
(2014/12/17 10:06), Xie, Huawei wrote:
>>> +{
>>> +   struct virtio_net *dev = get_device(ctx);
>>> +
>>> +   /* We have to stop the queue (virtio) if it is running. */
>>> +   if (dev->flags & VIRTIO_DEV_RUNNING)
>>> +   notify_ops->destroy_device(dev);
>> I have an one concern about finalization of vrings.
>> Can vhost-backend stop accessing RX/TX to the vring before replying to
>> this message?
>>
>> QEMU sends this message when virtio-net device is finalized by
>> virtio-net driver on the guest.
>> After finalization, memories used by the vring will be freed by
>> virtio-net driver, because these memories are allocated by virtio-net
>> driver.
>> Because of this, I guess vhost-backend must stop accessing to vring
>> before replying to this message.
>>
>> I am not sure what is a good way to stop accessing.
>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
>> and rte_vhost_enqueue_burst() is called.
>> Anyway we probably need to wait for stopping access before replying.
>>
>> Thanks,
>> Tetsuya
>>
> I think we have discussed the similar question.

Sorry, probably I might not be able to got your point correctly at the
former email.

> It is actually the same issue whether the virtio APP in guest is crashed, or 
> is finalized.

I guess when the APP is finalized correctly, we can have a solution.
Could you please read comment I wrote later?

> The virtio APP will only write to the STATUS register without waiting/syncing 
> to vhost backend.

Yes, virtio APP only write to the STATUS register. I agree with it.

When the register is written by guest, KVM will catch it, and the
context will be change to QEMU. And QEMU will works like below.
(Also while doing following steps, guest is waiting because the context
is in QEMU)

Could you please see below with latest QEMU code?
1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
wait for replying of this function.
2. virtio_set_status() [hw/virtio/virtio.c]
3. virtio_net_set_status() [hw/net/virtio-net.c]
4. virtio_net_vhost_status() [hw/net/virtio-net.c]
5. vhost_net_stop() [hw/net/vhost_net.c]
6. vhost_net_stop_one() [hw/net/vhost_net.c]
7. vhost_dev_stop() [hw/virtio/vhost.c]
8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
9. vhost_user_call() [virtio/vhost-user.c]
10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
for backend reply.

When the vhost-user backend receives GET_VRING_BASE, I guess the guest
APP is stopped.
Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
is synchronous message.
Because of above, I guess virtio APP can wait for vhost-backend
finalization.

> After that, not only the guest memory pointed by vring entry but also the 
> vring itself isn't usable any more.
> The memory for vring or pointed by vring entry might be used by other APPs.

I agree.

> This will crash guest(rather than the vhost, do you agree?).

I guess we cannot assume how the freed memory is used.
In some cases, a new APP still works, but vhost backend can access
inconsistency vring structure.
In the case vhost backend could receive illegal packets.
For example, avail_idx value might be changed to be 0x by a new APP.
(I am not sure RX/TX functions can handle such a value correctly)

Anyway, my point is if we can finalize vhost backend correctly, we only
need to take care of crashing case.
(If so, it's very nice :))
So let's make sure whether virtio APP can wait for finalization, or not.
I am thinking how to do it now.


> If you mean this issue, I think we have no solution but one walk around: keep 
> the huge page files of crashed app, and 
> bind virtio to igb_uio and then delete the huge page files.

Yes I agree.
If the virtio APP is crashed, this will be a solution.

Thanks,
Tetsuya

>
> In our implementation, when vhost sends the message,  we will call the 
> destroy_device provided by the vSwitch to ask the
> vSwitch to stop processing the vring, but this willn't solve the issue I 
> mention above, because the virtio APP in guest will n't 
> wait us.
>
> Could you explain a bit more? Is it the same issue?
>
>
> -huawei
>
>
>




[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-17 Thread Xie, Huawei
> > +{
> > +   struct virtio_net *dev = get_device(ctx);
> > +
> > +   /* We have to stop the queue (virtio) if it is running. */
> > +   if (dev->flags & VIRTIO_DEV_RUNNING)
> > +   notify_ops->destroy_device(dev);
> 
> I have an one concern about finalization of vrings.
> Can vhost-backend stop accessing RX/TX to the vring before replying to
> this message?
> 
> QEMU sends this message when virtio-net device is finalized by
> virtio-net driver on the guest.
> After finalization, memories used by the vring will be freed by
> virtio-net driver, because these memories are allocated by virtio-net
> driver.
> Because of this, I guess vhost-backend must stop accessing to vring
> before replying to this message.
> 
> I am not sure what is a good way to stop accessing.
> One idea is adding a condition checking when rte_vhost_dequeue_burst()
> and rte_vhost_enqueue_burst() is called.
> Anyway we probably need to wait for stopping access before replying.
> 
> Thanks,
> Tetsuya
> 
I think we have discussed the similar question.
It is actually the same issue whether the virtio APP in guest is crashed, or is 
finalized.
The virtio APP will only write to the STATUS register without waiting/syncing 
to vhost backend.
After that, not only the guest memory pointed by vring entry but also the vring 
itself isn't usable any more.
The memory for vring or pointed by vring entry might be used by other APPs.
This will crash guest(rather than the vhost, do you agree?).

If you mean this issue, I think we have no solution but one walk around: keep 
the huge page files of crashed app, and 
bind virtio to igb_uio and then delete the huge page files.

In our implementation, when vhost sends the message,  we will call the 
destroy_device provided by the vSwitch to ask the
vSwitch to stop processing the vring, but this willn't solve the issue I 
mention above, because the virtio APP in guest will n't 
wait us.

Could you explain a bit more? Is it the same issue?


-huawei





[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-16 Thread Tetsuya Mukawa
(2014/12/11 6:37), Huawei Xie wrote:
> vhost-user support
>
>
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/vhost-net.h  |   4 +
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 422 
> ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 108 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 199 
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  48 +++
>  lib/librte_vhost/virtio-net.c |  16 +-
>  lib/librte_vhost/virtio-net.h |  43 +++
>  9 files changed, 842 insertions(+), 12 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e0d0ef6..b2f14a0 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -34,10 +34,11 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_vhost.a
>  
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -I vhost_user -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_user/vhost-net-user.c 
> vhost_user/virtio-net-user.c vhost_user/fd_man.c virtio-net.c vhost_rxtx.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
> diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> index f7e96fd..f9ec40b 100644
> --- a/lib/librte_vhost/vhost-net.h
> +++ b/lib/librte_vhost/vhost-net.h
> @@ -41,8 +41,12 @@
>  
>  #include 
>  
> +#include "rte_virtio_net.h"
> +
>  #define VHOST_MEMORY_MAX_NREGIONS 8
>  
> +extern struct vhost_net_device_ops const *ops;
> +
>  /* Macros for printing using RTE_LOG */
>  #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
>  #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
> diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
> b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> index edcbc10..8ac3360 100644
> --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> @@ -268,6 +268,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
>   ((uint64_t)(uintptr_t)mem_regions_addr + size);
>   uint64_t base_address = 0, mapped_address, mapped_size;
> + struct virtio_dev *dev;
>  
>   for (idx = 0; idx < nregions; idx++) {
>   regions[idx].guest_phys_address =
> @@ -335,6 +336,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   regions[idx].guest_phys_address;
>   }
>  
> + dev = get_device(ctx);
> + if (dev && dev->mem && dev->mmaped_address) {
> + munmap((void *)(uintptr_t)dev->mmaped_address,
> + (size_t)dev->mmaped_size);
> + free(dev->mem);
> + dev->mem = NULL;
> + }
> +
>   ops->set_mem_table(ctx, ®ions[0], valid_regions);
>   return 0;
>  }
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> new file mode 100644
> index 000..841d7e6
> --- /dev/null
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -0,0 +1,422 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY E

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-12 Thread Linhaifeng


On 2014/12/12 1:13, Xie, Huawei wrote:
>>
>> Only support one vhost-user port ?
> 
> Do you mean vhost server by "port"?
> If that is the case, yes, now only one vhost server is supported for multiple 
> virtio devices.
> As stated in the cover letter, we have requirement and plan for multiple 
> server support,
> though I am not sure if it is absolutely necessary.
> 
I think if you also want to suport client mode (qemu as server) is necessary to 
implement multiple server.

>>
>> Can you mmap the region if gpa is 0? When i run VM with two numa node (qemu
>> will create two hugepage file) found that always failed to mmap with the 
>> region
>> which gpa is 0.
> 
> Current implementation doesn't assume there is only one huge page file to 
> back the guest memory.
> It maps every region using the fd of that region. 
> Could you please paste your guest VM command line here?
> 
>>
>> BTW can we ensure the memory regions cover with all the memory of hugepage
>> for VM?
> 
> I think so, because virtio devices could use any normal guest memory, but we 
> needn't ensure that.
> We only need to map the region passed to us from qemu vhost, which should be 
> enough to translate
> the GPA in vring from virtio in guest, otherwise it is the bug of qemu vhost.
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-11 Thread Xie, Huawei


> -Original Message-
> From: Xie, Huawei
> Sent: Thursday, December 11, 2014 10:13 AM
> To: 'Linhaifeng'; dev at dpdk.org
> Cc: haifeng.lin at intel.com
> Subject: RE: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user
> support
> 
> >
> > Only support one vhost-user port ?
> 
> Do you mean vhost server by "port"?
> If that is the case, yes, now only one vhost server is supported for multiple 
> virtio
> devices.
> As stated in the cover letter, we have requirement and plan for multiple 
> server
> support,
> though I am not sure if it is absolutely necessary.
> 
> >
> > Can you mmap the region if gpa is 0? When i run VM with two numa node
> (qemu
> > will create two hugepage file) found that always failed to mmap with the
> region
> > which gpa is 0.
> 
> Current implementation doesn't assume there is only one huge page file to back
> the guest memory.
> It maps every region using the fd of that region.
> Could you please paste your guest VM command line here?
> 
> >
> > BTW can we ensure the memory regions cover with all the memory of
> hugepage
> > for VM?
> 
> I think so, because virtio devices could use any normal guest memory, but we
> needn't ensure that.
> We only need to map the region passed to us from qemu vhost, which should be
> enough to translate
> the GPA in vring from virtio in guest, otherwise it is the bug of qemu vhost.

I see your post to qemu mailing list. Would you mind I paste here?
Qemu use two 1GB huge pages files to map the guest 2GB memory, and indeed we 
get "2GB" memory region.

The problem is the all the 2GB memory maps to the first huge page file 
node0.MvcPyi.
Seems a bug.


qemu command:
-m 2048 -smp 2,sockets=2,cores=1,threads=1
-object 
memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=on,size=1024M,id=ram-node0
 -numa node,nodeid=0,cpus=0,memdev=ram-node0
-object 
memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=on,size=1024M,id=ram-node1
 -numa node,nodeid=1,cpus=1,memdev=ram-node1


memory regions:
gpa = 0xC
size = 2146697216
ua = 0x2acc
offset = 786432

gpa = 0x0
size = 655360
ua = 0x2ac0
offset = 0

hugepage:
cat /proc/pidof qemu/maps
2ac0-2aaaeac0 rw-s  00:18 10357788   
/dev/hugepages/libvirt/qemu/qemu_back_mem._objects_ram-node0.MvcPyi (deleted)
2aaaeac0-2aab2ac0 rw-s  00:18 10357789   
/dev/hugepages/libvirt/qemu/qemu_back_mem._objects_ram-node1.tjAVin (deleted)

The memory size of each region is not match to the size of each hugepage 
file,is this ok?How does vhost-user to mmap all the hugepage?


[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-11 Thread Xie, Huawei
> 
> Only support one vhost-user port ?

Do you mean vhost server by "port"?
If that is the case, yes, now only one vhost server is supported for multiple 
virtio devices.
As stated in the cover letter, we have requirement and plan for multiple server 
support,
though I am not sure if it is absolutely necessary.

> 
> Can you mmap the region if gpa is 0? When i run VM with two numa node (qemu
> will create two hugepage file) found that always failed to mmap with the 
> region
> which gpa is 0.

Current implementation doesn't assume there is only one huge page file to back 
the guest memory.
It maps every region using the fd of that region. 
Could you please paste your guest VM command line here?

> 
> BTW can we ensure the memory regions cover with all the memory of hugepage
> for VM?

I think so, because virtio devices could use any normal guest memory, but we 
needn't ensure that.
We only need to map the region passed to us from qemu vhost, which should be 
enough to translate
the GPA in vring from virtio in guest, otherwise it is the bug of qemu vhost.


[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-11 Thread Linhaifeng


On 2014/12/11 5:37, Huawei Xie wrote:
> vhost-user support
> 
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/vhost-net.h  |   4 +
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 422 
> ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 108 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 199 
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  48 +++
>  lib/librte_vhost/virtio-net.c |  16 +-
>  lib/librte_vhost/virtio-net.h |  43 +++
>  9 files changed, 842 insertions(+), 12 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
> 
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e0d0ef6..b2f14a0 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -34,10 +34,11 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_vhost.a
>  
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -I vhost_user -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_user/vhost-net-user.c 
> vhost_user/virtio-net-user.c vhost_user/fd_man.c virtio-net.c vhost_rxtx.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
> diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> index f7e96fd..f9ec40b 100644
> --- a/lib/librte_vhost/vhost-net.h
> +++ b/lib/librte_vhost/vhost-net.h
> @@ -41,8 +41,12 @@
>  
>  #include 
>  
> +#include "rte_virtio_net.h"
> +
>  #define VHOST_MEMORY_MAX_NREGIONS 8
>  
> +extern struct vhost_net_device_ops const *ops;
> +
>  /* Macros for printing using RTE_LOG */
>  #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
>  #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
> diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
> b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> index edcbc10..8ac3360 100644
> --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> @@ -268,6 +268,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
>   ((uint64_t)(uintptr_t)mem_regions_addr + size);
>   uint64_t base_address = 0, mapped_address, mapped_size;
> + struct virtio_dev *dev;
>  
>   for (idx = 0; idx < nregions; idx++) {
>   regions[idx].guest_phys_address =
> @@ -335,6 +336,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   regions[idx].guest_phys_address;
>   }
>  
> + dev = get_device(ctx);
> + if (dev && dev->mem && dev->mmaped_address) {
> + munmap((void *)(uintptr_t)dev->mmaped_address,
> + (size_t)dev->mmaped_size);
> + free(dev->mem);
> + dev->mem = NULL;
> + }
> +
>   ops->set_mem_table(ctx, ®ions[0], valid_regions);
>   return 0;
>  }
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> new file mode 100644
> index 000..841d7e6
> --- /dev/null
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -0,0 +1,422 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-11 Thread Linhaifeng


On 2014/12/11 5:37, Huawei Xie wrote:
> vhost-user support
> 
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/vhost-net.h  |   4 +
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 422 
> ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 108 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 199 
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  48 +++
>  lib/librte_vhost/virtio-net.c |  16 +-
>  lib/librte_vhost/virtio-net.h |  43 +++
>  9 files changed, 842 insertions(+), 12 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
> 
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e0d0ef6..b2f14a0 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -34,10 +34,11 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_vhost.a
>  
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -I vhost_user -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_user/vhost-net-user.c 
> vhost_user/virtio-net-user.c vhost_user/fd_man.c virtio-net.c vhost_rxtx.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
> diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> index f7e96fd..f9ec40b 100644
> --- a/lib/librte_vhost/vhost-net.h
> +++ b/lib/librte_vhost/vhost-net.h
> @@ -41,8 +41,12 @@
>  
>  #include 
>  
> +#include "rte_virtio_net.h"
> +
>  #define VHOST_MEMORY_MAX_NREGIONS 8
>  
> +extern struct vhost_net_device_ops const *ops;
> +
>  /* Macros for printing using RTE_LOG */
>  #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
>  #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
> diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
> b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> index edcbc10..8ac3360 100644
> --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> @@ -268,6 +268,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
>   ((uint64_t)(uintptr_t)mem_regions_addr + size);
>   uint64_t base_address = 0, mapped_address, mapped_size;
> + struct virtio_dev *dev;
>  
>   for (idx = 0; idx < nregions; idx++) {
>   regions[idx].guest_phys_address =
> @@ -335,6 +336,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   regions[idx].guest_phys_address;
>   }
>  
> + dev = get_device(ctx);
> + if (dev && dev->mem && dev->mmaped_address) {
> + munmap((void *)(uintptr_t)dev->mmaped_address,
> + (size_t)dev->mmaped_size);
> + free(dev->mem);
> + dev->mem = NULL;
> + }
> +
>   ops->set_mem_table(ctx, ®ions[0], valid_regions);
>   return 0;
>  }
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> new file mode 100644
> index 000..841d7e6
> --- /dev/null
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -0,0 +1,422 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-11 Thread Huawei Xie
vhost-user support


Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/Makefile |   5 +-
 lib/librte_vhost/vhost-net.h  |   4 +
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 422 ++
 lib/librte_vhost/vhost_user/vhost-net-user.h  | 108 +++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 199 
 lib/librte_vhost/vhost_user/virtio-net-user.h |  48 +++
 lib/librte_vhost/virtio-net.c |  16 +-
 lib/librte_vhost/virtio-net.h |  43 +++
 9 files changed, 842 insertions(+), 12 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
 create mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index e0d0ef6..b2f14a0 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -34,10 +34,11 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_vhost.a

-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 
-lfuse
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -I vhost_user -O3 
-D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
+#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_user/vhost-net-user.c 
vhost_user/virtio-net-user.c vhost_user/fd_man.c virtio-net.c vhost_rxtx.c

 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index f7e96fd..f9ec40b 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -41,8 +41,12 @@

 #include 

+#include "rte_virtio_net.h"
+
 #define VHOST_MEMORY_MAX_NREGIONS 8

+extern struct vhost_net_device_ops const *ops;
+
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index edcbc10..8ac3360 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -268,6 +268,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
((uint64_t)(uintptr_t)mem_regions_addr + size);
uint64_t base_address = 0, mapped_address, mapped_size;
+   struct virtio_dev *dev;

for (idx = 0; idx < nregions; idx++) {
regions[idx].guest_phys_address =
@@ -335,6 +336,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
regions[idx].guest_phys_address;
}

+   dev = get_device(ctx);
+   if (dev && dev->mem && dev->mmaped_address) {
+   munmap((void *)(uintptr_t)dev->mmaped_address,
+   (size_t)dev->mmaped_size);
+   free(dev->mem);
+   dev->mem = NULL;
+   }
+
ops->set_mem_table(ctx, ®ions[0], valid_regions);
return 0;
 }
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
new file mode 100644
index 000..841d7e6
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -0,0 +1,422 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY