Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-12-01 Thread Paolo Bonzini

> What do you think about virtio-nvme+vhost-nvme?

What would be the advantage over virtio-blk?  Multiqueue is not supported
by QEMU but it's already supported by Linux (commit 6a27b656fc).

To me, the advantage of nvme is that it provides more than decent performance on
unmodified Windows guests, and thanks to your vendor extension can be used
on Linux as well with speeds comparable to virtio-blk.  So it's potentially
a very good choice for a cloud provider that wants to support Windows guests
(together with e.g. a fast SAS emulated controller to replace virtio-scsi,
and emulated igb or ixgbe to replace virtio-net).

Which features are supported by NVMe and not virtio-blk?

Paolo

> I also have a patch for virtio-nvme:
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-split/virtio
> 
> Just need to change vhost-nvme to work with it.
> 
> > 
> > Paolo
> > 
> > > Still tuning.
> 
> 
> 


Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-12-01 Thread Ming Lin
On Tue, 2015-12-01 at 11:59 -0500, Paolo Bonzini wrote:
> > What do you think about virtio-nvme+vhost-nvme?
> 
> What would be the advantage over virtio-blk?  Multiqueue is not supported
> by QEMU but it's already supported by Linux (commit 6a27b656fc).

I expect performance would be better.

It seems Google Cloud VMs use both NVMe and virtio-scsi. I'm not sure
whether virtio-blk is also used.
https://cloud.google.com/compute/docs/disks/local-ssd#runscript

> 
> To me, the advantage of nvme is that it provides more than decent performance on
> unmodified Windows guests, and thanks to your vendor extension can be used
> on Linux as well with speeds comparable to virtio-blk.  So it's potentially
> a very good choice for a cloud provider that wants to support Windows guests
> (together with e.g. a fast SAS emulated controller to replace virtio-scsi,
> and emulated igb or ixgbe to replace virtio-net).

The vhost-nvme patches borrow from rts-megasas, which could possibly serve
as a fast emulated SAS controller.
https://github.com/Datera/rts-megasas

> 
> Which features are supported by NVMe and not virtio-blk?

Rob (CCed),

Would you share whether Google uses any NVMe-specific features?

Thanks.



Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 00:20, Ming Lin wrote:
> qemu-nvme: 148MB/s
> vhost-nvme + google-ext: 230MB/s
> qemu-nvme + google-ext + eventfd: 294MB/s
> virtio-scsi: 296MB/s
> virtio-blk: 344MB/s
> 
> "vhost-nvme + google-ext" didn't get good enough performance.

I'd expect it to be on par with qemu-nvme with ioeventfd, but the question
is: why should it be better?  For vhost-net, the answer is that more
zerocopy can be done if you put the data path in the kernel.

But qemu-nvme is already using io_submit for the data path, perhaps
there's not much to gain from vhost-nvme...
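
For reference, "using io_submit" here means the Linux AIO interface that
QEMU's aio=native backend (block/linux-aio.c) is built on. A minimal
stand-alone sketch of that interface, not QEMU code ("disk.img" and the
4 KiB sizes are only illustrative; build with gcc -laio):

/* Submit one read with Linux AIO and reap its completion. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf;

    int fd = open("disk.img", O_RDONLY | O_DIRECT);
    if (fd < 0 || io_setup(128, &ctx) < 0) {
        return 1;
    }
    if (posix_memalign(&buf, 4096, 4096)) {  /* O_DIRECT wants aligned buffers */
        return 1;
    }

    io_prep_pread(&cb, fd, buf, 4096, 0);    /* read 4 KiB at offset 0 */
    if (io_submit(ctx, 1, cbs) != 1) {       /* hand the request to the kernel */
        return 1;
    }
    if (io_getevents(ctx, 1, 1, &ev, NULL) == 1) {  /* wait for the completion */
        printf("read returned %ld\n", (long)ev.res);
    }

    io_destroy(ctx);
    free(buf);
    close(fd);
    return 0;
}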

Paolo

> Still tuning.


Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-12-01 Thread Ming Lin
On Tue, 2015-12-01 at 17:02 +0100, Paolo Bonzini wrote:
> 
> On 01/12/2015 00:20, Ming Lin wrote:
> > qemu-nvme: 148MB/s
> > vhost-nvme + google-ext: 230MB/s
> > qemu-nvme + google-ext + eventfd: 294MB/s
> > virtio-scsi: 296MB/s
> > virtio-blk: 344MB/s
> > 
> > "vhost-nvme + google-ext" didn't get good enough performance.
> 
> I'd expect it to be on par with qemu-nvme with ioeventfd, but the question
> is: why should it be better?  For vhost-net, the answer is that more
> zerocopy can be done if you put the data path in the kernel.
> 
> But qemu-nvme is already using io_submit for the data path, perhaps
> there's not much to gain from vhost-nvme...

What do you think about virtio-nvme+vhost-nvme?
I also have a patch for virtio-nvme:
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-split/virtio

Just need to change vhost-nvme to work with it.

> 
> Paolo
> 
> > Still tuning.




Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-30 Thread Ming Lin
On Mon, 2015-11-23 at 15:14 +0100, Paolo Bonzini wrote:
> 
> On 23/11/2015 09:17, Ming Lin wrote:
> > On Sat, 2015-11-21 at 14:11 +0100, Paolo Bonzini wrote:
> >>
> >> On 20/11/2015 01:20, Ming Lin wrote:
> >>> One improvement could be to use Google's NVMe vendor extension that
> >>> I sent in another thread, also here:
> >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
> >>>
> >>> Qemu side:
> >>> http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
> >>> Kernel side also here:
> >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0
> >>
> >> How much do you get with vhost-nvme plus vendor extension, compared to
> >> 190 MB/s for QEMU?
> > 
> > There is still some bug. I'll update.
> 
> Sure.

Fixed it after Thanksgiving holiday :)
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0-ext

Combined with previous test results (lowest to highest):

qemu-nvme: 148MB/s
vhost-nvme + google-ext: 230MB/s
qemu-nvme + google-ext + eventfd: 294MB/s
virtio-scsi: 296MB/s
virtio-blk: 344MB/s

"vhost-nvme + google-ext" didn't get good enough performance.
Still tuning.




Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 19:51, Ming Lin wrote:
> > Do you still have a blk_set_aio_context somewhere?  I'm losing track of
> > the changes.
> 
> No.

You'll need it.  That's what causes your error.
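
A minimal sketch of the missing piece being referred to; the helper name is
made up, and n->ctx / n->conf.blk follow the fields used in the patches
elsewhere in this thread:

/* Sketch only (not code from the thread): once the internal iothread
 * exists, move the drive's BlockBackend into that iothread's AioContext
 * so that submission and completion run in the same context. */
static void nvme_attach_blk_to_iothread(NvmeCtrl *n)
{
    aio_context_acquire(n->ctx);
    blk_set_aio_context(n->conf.blk, n->ctx);   /* 2.5-era two-argument form */
    aio_context_release(n->ctx);
}

This is roughly what virtio_blk_data_plane_start() does before it registers
its notifier on the iothread.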

> BTW, I'm not sure about qemu upstream policy.
> Do I need to first make the kernel side patch upstream?
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext

No problem, there's time.  QEMU is in freeze right now, and the 2.6
cycle will last until March or so.

Paolo


Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-25 Thread Ming Lin
On Wed, 2015-11-25 at 12:27 +0100, Paolo Bonzini wrote:
> Do you still have a blk_set_aio_context somewhere?  I'm losing track of
> the changes.

No.

> 
> In any case, I think using a separate I/O thread is a bit premature,
> except for benchmarking.  In the meanwhile I think the best option is to
> post a two-patch series with the vendor extension and the ioeventfd
> respectively.

I'll post.

BTW, I'm not sure about qemu upstream policy.
Do I need to first make the kernel side patch upstream?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext

I think the kernel-side patch will take a bit longer to get upstream.

> 
> Paolo




Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-25 Thread Paolo Bonzini


On 24/11/2015 20:25, Ming Lin wrote:
> On Tue, 2015-11-24 at 11:51 +0100, Paolo Bonzini wrote:
>>
>> On 24/11/2015 08:27, Ming Lin wrote:
>>> handle_notify (qemu/hw/block/dataplane/virtio-blk.c:126)
>>> aio_dispatch (qemu/aio-posix.c:329)
>>> aio_poll (qemu/aio-posix.c:474)
>>> iothread_run (qemu/iothread.c:45)
>>> start_thread (pthread_create.c:312)
>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)
>>>
>>> I think I'll have a "nvme_dev_notify" similar to "handle_notify"
>>>
>>> static void nvme_dev_notify(EventNotifier *e)
>>> {
>>> 
>>> }
>>>
>>> But then how can I know this notify is for cq or sq?
>>
>> virtio-blk has a single queue, so it has a single EventNotifier.  Your
>> code using multiple EventNotifiers is fine, you can use
>> aio_set_event_notifier multiple times on the same iothread.
> 
> I feel the patch below is close to being right.
> But I got "Co-routine re-entered recursively".
> 
> Program received signal SIGABRT, Aborted.
> 0x74a45cc9 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x74a45cc9 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x74a490d8 in __GI_abort () at abort.c:89
> #2  0x55b910d1 in qemu_coroutine_enter (co=0x577a9ce0, 
> opaque=0x0) at /home/mlin/qemu/util/qemu-coroutine.c:111
> #3  0x55b282e0 in bdrv_co_io_em_complete (opaque=0x7fffd94e4a30, 
> ret=0) at /home/mlin/qemu/block/io.c:2282
> #4  0x55ab8ba4 in thread_pool_completion_bh (opaque=0x57d6d440) 
> at /home/mlin/qemu/thread-pool.c:187
> #5  0x55ab7ab2 in aio_bh_call (bh=0x57648110) at 
> /home/mlin/qemu/async.c:64
> #6  0x55ab7b88 in aio_bh_poll (ctx=0x565b28f0) at 
> /home/mlin/qemu/async.c:92
> #7  0x55acb3b6 in aio_dispatch (ctx=0x565b28f0) at 
> /home/mlin/qemu/aio-posix.c:305
> #8  0x55ab8013 in aio_ctx_dispatch (source=0x565b28f0, 
> callback=0x0, user_data=0x0) at /home/mlin/qemu/async.c:231
> #9  0x7575ee04 in g_main_context_dispatch () from 
> /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #10 0x55ac8b35 in glib_pollfds_poll () at 
> /home/mlin/qemu/main-loop.c:211
> #11 0x55ac8c36 in os_host_main_loop_wait (timeout=0) at 
> /home/mlin/qemu/main-loop.c:256
> #12 0x55ac8cff in main_loop_wait (nonblocking=0) at 
> /home/mlin/qemu/main-loop.c:504
> #13 0x558a41d4 in main_loop () at /home/mlin/qemu/vl.c:1920
> #14 0x558ac28a in main (argc=21, argv=0x7fffe4a8, 
> envp=0x7fffe558) at /home/mlin/qemu/vl.c:4681
> (gdb)
> 
> Would you help to take a look?
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f27fd35..f542740 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -28,6 +28,8 @@
>  #include "sysemu/sysemu.h"
>  #include "qapi/visitor.h"
>  #include "sysemu/block-backend.h"
> +#include "sysemu/iothread.h"
> +#include "qom/object_interfaces.h"
>  
>  #include "nvme.h"
>  
> @@ -549,7 +551,10 @@ static void nvme_cq_notifier(EventNotifier *e)
>  container_of(e, NvmeCQueue, notifier);
>  
>  event_notifier_test_and_clear(&cq->notifier);
> +
> +aio_context_acquire(cq->ctrl->ctx);
>  nvme_post_cqes(cq);
> +aio_context_release(cq->ctrl->ctx);

This is not yet needed in upstream.

>  }
>  
>  static void nvme_init_cq_eventfd(NvmeCQueue *cq)
> @@ -558,9 +563,12 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
>  uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
>  
>  event_notifier_init(&cq->notifier, 0);
> -event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
>  memory_region_add_eventfd(&n->iomem,
>  0x1000 + offset, 4, false, 0, &cq->notifier);
> +aio_context_acquire(n->ctx);
> +aio_set_event_notifier(n->ctx, &cq->notifier, false,

This should be true, but shouldn't affect your bug.

> +   nvme_cq_notifier);
> +aio_context_release(n->ctx);
>  }
>  
>  static void nvme_sq_notifier(EventNotifier *e)
> @@ -569,7 +577,9 @@ static void nvme_sq_notifier(EventNotifier *e)
>  container_of(e, NvmeSQueue, notifier);
>  
>  event_notifier_test_and_clear(&sq->notifier);
> +aio_context_acquire(sq->ctrl->ctx);
>  nvme_process_sq(sq);
> +aio_context_release(sq->ctrl->ctx);

Same as above.

>  }
>  
>  static void nvme_init_sq_eventfd(NvmeSQueue *sq)
> @@ -578,9 +588,22 @@ static void nvme_init_sq_eventfd(NvmeSQueue *sq)
>  uint16_t offset = sq->sqid * 2 * (4 << NVME_CAP_DSTRD(n->bar.cap));
>  
>  event_notifier_init(&sq->notifier, 0);
> -event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
>  memory_region_add_eventfd(&n->iomem,
>  0x1000 + offset, 4, false, 0, &sq->notifier);
> +aio_context_acquire(n->ctx);
> +aio_set_event_notifier(n->ctx, &sq->notifier, false,

Also true.

> +   nvme_sq_notifier);
> +aio_context_release(n->ctx);
> +}

Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-24 Thread Ming Lin
On Tue, 2015-11-24 at 11:51 +0100, Paolo Bonzini wrote:
> 
> On 24/11/2015 08:27, Ming Lin wrote:
> > handle_notify (qemu/hw/block/dataplane/virtio-blk.c:126)
> > aio_dispatch (qemu/aio-posix.c:329)
> > aio_poll (qemu/aio-posix.c:474)
> > iothread_run (qemu/iothread.c:45)
> > start_thread (pthread_create.c:312)
> > /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)
> > 
> > I think I'll have a "nvme_dev_notify" similar to "handle_notify"
> > 
> > static void nvme_dev_notify(EventNotifier *e)
> > {
> > 
> > }
> > 
> > But then how can I know this notify is for cq or sq?
> 
> virtio-blk has a single queue, so it has a single EventNotifier.  Your
> code using multiple EventNotifiers is fine, you can use
> aio_set_event_notifier multiple times on the same iothread.

I feel the patch below is close to being right.
But I got "Co-routine re-entered recursively".

Program received signal SIGABRT, Aborted.
0x74a45cc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x74a45cc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x74a490d8 in __GI_abort () at abort.c:89
#2  0x55b910d1 in qemu_coroutine_enter (co=0x577a9ce0, opaque=0x0) 
at /home/mlin/qemu/util/qemu-coroutine.c:111
#3  0x55b282e0 in bdrv_co_io_em_complete (opaque=0x7fffd94e4a30, ret=0) 
at /home/mlin/qemu/block/io.c:2282
#4  0x55ab8ba4 in thread_pool_completion_bh (opaque=0x57d6d440) at 
/home/mlin/qemu/thread-pool.c:187
#5  0x55ab7ab2 in aio_bh_call (bh=0x57648110) at 
/home/mlin/qemu/async.c:64
#6  0x55ab7b88 in aio_bh_poll (ctx=0x565b28f0) at 
/home/mlin/qemu/async.c:92
#7  0x55acb3b6 in aio_dispatch (ctx=0x565b28f0) at 
/home/mlin/qemu/aio-posix.c:305
#8  0x55ab8013 in aio_ctx_dispatch (source=0x565b28f0, 
callback=0x0, user_data=0x0) at /home/mlin/qemu/async.c:231
#9  0x7575ee04 in g_main_context_dispatch () from 
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x55ac8b35 in glib_pollfds_poll () at 
/home/mlin/qemu/main-loop.c:211
#11 0x55ac8c36 in os_host_main_loop_wait (timeout=0) at 
/home/mlin/qemu/main-loop.c:256
#12 0x55ac8cff in main_loop_wait (nonblocking=0) at 
/home/mlin/qemu/main-loop.c:504
#13 0x558a41d4 in main_loop () at /home/mlin/qemu/vl.c:1920
#14 0x558ac28a in main (argc=21, argv=0x7fffe4a8, 
envp=0x7fffe558) at /home/mlin/qemu/vl.c:4681
(gdb)

Would you help to take a look?

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f27fd35..f542740 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -28,6 +28,8 @@
 #include "sysemu/sysemu.h"
 #include "qapi/visitor.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/iothread.h"
+#include "qom/object_interfaces.h"
 
 #include "nvme.h"
 
@@ -549,7 +551,10 @@ static void nvme_cq_notifier(EventNotifier *e)
 container_of(e, NvmeCQueue, notifier);
 
 event_notifier_test_and_clear(&cq->notifier);
+
+aio_context_acquire(cq->ctrl->ctx);
 nvme_post_cqes(cq);
+aio_context_release(cq->ctrl->ctx);
 }
 
 static void nvme_init_cq_eventfd(NvmeCQueue *cq)
@@ -558,9 +563,12 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
 uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
 
 event_notifier_init(&cq->notifier, 0);
-event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
 memory_region_add_eventfd(&n->iomem,
 0x1000 + offset, 4, false, 0, &cq->notifier);
+aio_context_acquire(n->ctx);
+aio_set_event_notifier(n->ctx, &cq->notifier, false,
+   nvme_cq_notifier);
+aio_context_release(n->ctx);
 }
 
 static void nvme_sq_notifier(EventNotifier *e)
@@ -569,7 +577,9 @@ static void nvme_sq_notifier(EventNotifier *e)
 container_of(e, NvmeSQueue, notifier);
 
 event_notifier_test_and_clear(&sq->notifier);
+aio_context_acquire(sq->ctrl->ctx);
 nvme_process_sq(sq);
+aio_context_release(sq->ctrl->ctx);
 }
 
 static void nvme_init_sq_eventfd(NvmeSQueue *sq)
@@ -578,9 +588,22 @@ static void nvme_init_sq_eventfd(NvmeSQueue *sq)
 uint16_t offset = sq->sqid * 2 * (4 << NVME_CAP_DSTRD(n->bar.cap));
 
 event_notifier_init(&sq->notifier, 0);
-event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
 memory_region_add_eventfd(&n->iomem,
 0x1000 + offset, 4, false, 0, &sq->notifier);
+aio_context_acquire(n->ctx);
+aio_set_event_notifier(n->ctx, &sq->notifier, false,
+   nvme_sq_notifier);
+aio_context_release(n->ctx);
+}
+
+static void nvme_init_iothread(NvmeCtrl *n)
+{
+object_initialize(&n->internal_iothread_obj,
+  sizeof(n->internal_iothread_obj),
+  TYPE_IOTHREAD);
+user_creatable_complete(OBJECT(&n->internal_iothread_obj), &error_abort);
+n->iothread = &n->internal_iothread_obj;
+n->ctx = iothread_get_aio_context(n->iothread);
 }
 
 

Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-24 Thread Ming Lin
On Mon, 2015-11-23 at 23:27 -0800, Ming Lin wrote:
> On Mon, 2015-11-23 at 15:14 +0100, Paolo Bonzini wrote:
> > 
> > On 23/11/2015 09:17, Ming Lin wrote:
> > > On Sat, 2015-11-21 at 14:11 +0100, Paolo Bonzini wrote:
> > >>
> > >> On 20/11/2015 01:20, Ming Lin wrote:
> > >>> One improvement could be to use Google's NVMe vendor extension that
> > >>> I sent in another thread, also here:
> > >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
> > >>>
> > >>> Qemu side:
> > >>> http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
> > >>> Kernel side also here:
> > >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0
> > >>
> > >> How much do you get with vhost-nvme plus vendor extension, compared to
> > >> 190 MB/s for QEMU?
> > > 
> > > There is still some bug. I'll update.
> > 
> > Sure.
> > 
> > >> Note that in all likelihood, QEMU can actually do better than 190 MB/s,
> > >> and gain more parallelism too, by moving the processing of the
> > >> ioeventfds to a separate thread.  This is similar to
> > >> hw/block/dataplane/virtio-blk.c.
> > >>
> > >> It's actually pretty easy to do.  Even though
> > >> hw/block/dataplane/virtio-blk.c is still using some old APIs, all memory
> > >> access in QEMU is now thread-safe.  I have pending patches for 2.6 that
> > >> cut that file down to a mere 200 lines of code, NVMe would probably be
> > >> about the same.
> > > 
> > > Is there a git tree for your patches?
> > 
> > No, not yet.  I'll post them today or tomorrow, will make sure to Cc you.
> > 
> > > Did you mean some pseudocode as below?
> > > 1. need an iothread for each cq/sq?
> > > 2. need an AioContext for each cq/sq?
> > > 
> > >  hw/block/nvme.c | 32 ++--
> > >  hw/block/nvme.h |  8 
> > >  2 files changed, 38 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index f27fd35..fed4827 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -28,6 +28,8 @@
> > >  #include "sysemu/sysemu.h"
> > >  #include "qapi/visitor.h"
> > >  #include "sysemu/block-backend.h"
> > > +#include "sysemu/iothread.h"
> > > +#include "qom/object_interfaces.h"
> > >  
> > >  #include "nvme.h"
> > >  
> > > @@ -558,9 +560,22 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
> > >  uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
> > >  
> > >  event_notifier_init(&cq->notifier, 0);
> > > -event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
> > >  memory_region_add_eventfd(&n->iomem,
> > >  0x1000 + offset, 4, false, 0, &cq->notifier);
> > > +
> > > +object_initialize(&cq->internal_iothread_obj,
> > > +  sizeof(cq->internal_iothread_obj),
> > > +  TYPE_IOTHREAD);
> > > +user_creatable_complete(OBJECT(&cq->internal_iothread_obj),
> > > &error_abort);
> > 
> > For now, you have to use one iothread for all cq/sq of a single NVMe
> > device; multiqueue block layer is planned for 2.7 or 2.8.  Otherwise
> > yes, it's very close to just these changes.
> 
> Here is the call stack of iothread for virtio-blk-dataplane.
> 
> handle_notify (qemu/hw/block/dataplane/virtio-blk.c:126)
> aio_dispatch (qemu/aio-posix.c:329)
> aio_poll (qemu/aio-posix.c:474)
> iothread_run (qemu/iothread.c:45)
> start_thread (pthread_create.c:312)
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)
> 
> I think I'll have a "nvme_dev_notify" similar to "handle_notify"
> 
> static void nvme_dev_notify(EventNotifier *e)
> {
> 
> }
> 
> But then how can I know this notify is for cq or sq?

Did you mean a patch like the one below?
But it doesn't work yet.

 hw/block/nvme.c | 64 +++--
 hw/block/nvme.h |  5 +
 2 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f27fd35..a8c9914 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -28,6 +28,8 @@
 #include "sysemu/sysemu.h"
 #include "qapi/visitor.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/iothread.h"
+#include "qom/object_interfaces.h"
 
 #include "nvme.h"
 
@@ -543,42 +545,22 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
 return NVME_SUCCESS;
 }
 
-static void nvme_cq_notifier(EventNotifier *e)
-{
-NvmeCQueue *cq =
-container_of(e, NvmeCQueue, notifier);
-
-event_notifier_test_and_clear(&cq->notifier);
-nvme_post_cqes(cq);
-}
-
 static void nvme_init_cq_eventfd(NvmeCQueue *cq)
 {
 NvmeCtrl *n = cq->ctrl;
 uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
 
 event_notifier_init(&cq->notifier, 0);
-event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
 memory_region_add_eventfd(&n->iomem,
 0x1000 + offset, 4, false, 0, &cq->notifier);
 }
 
-static void nvme_sq_notifier(EventNotifier *e)
-{
-NvmeSQueue *sq =
-container_of(e, NvmeSQueue, notifier);
-
-

Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-23 Thread Ming Lin
On Sat, 2015-11-21 at 14:11 +0100, Paolo Bonzini wrote:
> 
> On 20/11/2015 01:20, Ming Lin wrote:
> > One improvement could be to use Google's NVMe vendor extension that
> > I sent in another thread, also here:
> > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
> > 
> > Qemu side:
> > http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
> > Kernel side also here:
> > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0
> 
> How much do you get with vhost-nvme plus vendor extension, compared to
> 190 MB/s for QEMU?

There is still some bug. I'll update.

> 
> Note that in all likelihood, QEMU can actually do better than 190 MB/s,
> and gain more parallelism too, by moving the processing of the
> ioeventfds to a separate thread.  This is similar to
> hw/block/dataplane/virtio-blk.c.
> 
> It's actually pretty easy to do.  Even though
> hw/block/dataplane/virtio-blk.c is still using some old APIs, all memory
> access in QEMU is now thread-safe.  I have pending patches for 2.6 that
> cut that file down to a mere 200 lines of code, NVMe would probably be
> about the same.

Is there a git tree for your patches?

Did you mean some pseudocode as below?
1. need an iothread for each cq/sq?
2. need an AioContext for each cq/sq?

 hw/block/nvme.c | 32 ++--
 hw/block/nvme.h |  8 
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f27fd35..fed4827 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -28,6 +28,8 @@
 #include "sysemu/sysemu.h"
 #include "qapi/visitor.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/iothread.h"
+#include "qom/object_interfaces.h"
 
 #include "nvme.h"
 
@@ -558,9 +560,22 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
 uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
 
 event_notifier_init(&cq->notifier, 0);
-event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
 memory_region_add_eventfd(&n->iomem,
 0x1000 + offset, 4, false, 0, &cq->notifier);
+
+object_initialize(&cq->internal_iothread_obj,
+  sizeof(cq->internal_iothread_obj),
+  TYPE_IOTHREAD);
+user_creatable_complete(OBJECT(&cq->internal_iothread_obj), &error_abort);
+cq->iothread = &cq->internal_iothread_obj;
+cq->ctx = iothread_get_aio_context(cq->iothread);
+//Question: Need a conf.blk for each cq/sq???
+//blk_set_aio_context(cq->conf->conf.blk, cq->ctx);
+
+aio_context_acquire(cq->ctx);
+aio_set_event_notifier(cq->ctx, &cq->notifier, true,
+   nvme_cq_notifier);
+aio_context_release(cq->ctx);
 }
 
 static void nvme_sq_notifier(EventNotifier *e)
@@ -578,9 +593,22 @@ static void nvme_init_sq_eventfd(NvmeSQueue *sq)
 uint16_t offset = sq->sqid * 2 * (4 << NVME_CAP_DSTRD(n->bar.cap));
 
 event_notifier_init(&sq->notifier, 0);
-event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
 memory_region_add_eventfd(&n->iomem,
 0x1000 + offset, 4, false, 0, &sq->notifier);
+
+object_initialize(&sq->internal_iothread_obj,
+  sizeof(sq->internal_iothread_obj),
+  TYPE_IOTHREAD);
+user_creatable_complete(OBJECT(&sq->internal_iothread_obj), &error_abort);
+sq->iothread = &sq->internal_iothread_obj;
+sq->ctx = iothread_get_aio_context(sq->iothread);
+//Question: Need a conf.blk for each cq/sq???
+//blk_set_aio_context(sq->conf->conf.blk, sq->ctx);
+
+aio_context_acquire(sq->ctx);
+aio_set_event_notifier(sq->ctx, &sq->notifier, true,
+   nvme_sq_notifier);
+aio_context_release(sq->ctx);
 }
 
 static uint16_t nvme_set_db_memory(NvmeCtrl *n, const NvmeCmd *cmd)
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 608f202..171ee0b 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -667,6 +667,10 @@ typedef struct NvmeSQueue {
  * do not go over this value will not result in MMIO writes (but will
  * still write the tail pointer to the "db_addr" location above). */
uint64_t    eventidx_addr;
+
+IOThread *iothread;
+IOThread internal_iothread_obj;
+AioContext *ctx;
 EventNotifier notifier;
 } NvmeSQueue;
 
@@ -690,6 +694,10 @@ typedef struct NvmeCQueue {
  * do not go over this value will not result in MMIO writes (but will
  * still write the head pointer to the "db_addr" location above). */
uint64_t    eventidx_addr;
+
+IOThread *iothread;
+IOThread internal_iothread_obj;
+AioContext *ctx;
 EventNotifier notifier;
 } NvmeCQueue;
 

> 
> Paolo




Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-23 Thread Paolo Bonzini


On 23/11/2015 09:17, Ming Lin wrote:
> On Sat, 2015-11-21 at 14:11 +0100, Paolo Bonzini wrote:
>>
>> On 20/11/2015 01:20, Ming Lin wrote:
>>> One improvement could be to use Google's NVMe vendor extension that
>>> I sent in another thread, also here:
>>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
>>>
>>> Qemu side:
>>> http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
>>> Kernel side also here:
>>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0
>>
>> How much do you get with vhost-nvme plus vendor extension, compared to
>> 190 MB/s for QEMU?
> 
> There is still some bug. I'll update.

Sure.

>> Note that in all likelihood, QEMU can actually do better than 190 MB/s,
>> and gain more parallelism too, by moving the processing of the
>> ioeventfds to a separate thread.  This is similar to
>> hw/block/dataplane/virtio-blk.c.
>>
>> It's actually pretty easy to do.  Even though
>> hw/block/dataplane/virtio-blk.c is still using some old APIs, all memory
>> access in QEMU is now thread-safe.  I have pending patches for 2.6 that
>> cut that file down to a mere 200 lines of code, NVMe would probably be
>> about the same.
> 
> Is there a git tree for your patches?

No, not yet.  I'll post them today or tomorrow, will make sure to Cc you.

> Did you mean some pseudocode as below?
> 1. need an iothread for each cq/sq?
> 2. need an AioContext for each cq/sq?
> 
>  hw/block/nvme.c | 32 ++--
>  hw/block/nvme.h |  8 
>  2 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f27fd35..fed4827 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -28,6 +28,8 @@
>  #include "sysemu/sysemu.h"
>  #include "qapi/visitor.h"
>  #include "sysemu/block-backend.h"
> +#include "sysemu/iothread.h"
> +#include "qom/object_interfaces.h"
>  
>  #include "nvme.h"
>  
> @@ -558,9 +560,22 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
>  uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
>  
>  event_notifier_init(&cq->notifier, 0);
> -event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
>  memory_region_add_eventfd(&n->iomem,
>  0x1000 + offset, 4, false, 0, &cq->notifier);
> +
> +object_initialize(&cq->internal_iothread_obj,
> +  sizeof(cq->internal_iothread_obj),
> +  TYPE_IOTHREAD);
> +user_creatable_complete(OBJECT(&cq->internal_iothread_obj),
> &error_abort);

For now, you have to use one iothread for all cq/sq of a single NVMe
device; multiqueue block layer is planned for 2.7 or 2.8.  Otherwise
yes, it's very close to just these changes.

If you use "-object iothread,id=NN" and a iothread property, you can
also use an N:M model with multiple disks attached to the same iothread.
 Defining the iothread property is like

object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
(Object **)&s->conf.iothread,
qdev_prop_allow_set_link_before_realize,
OBJ_PROP_LINK_UNREF_ON_RELEASE, NULL);
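
To make that concrete, a hedged sketch of how such a linked "iothread"
property could be consumed at realize time; the helper and the
n->conf.iothread / n->ctx fields are hypothetical, only
iothread_get_aio_context() and qemu_get_aio_context() are existing
interfaces:

/* Sketch (assumption, not from the thread): resolve the "iothread" link set
 * with -device nvme,...,iothread=<id>, falling back to the main loop when
 * no iothread object was given. */
static void nvme_resolve_iothread(NvmeCtrl *n)
{
    if (n->conf.iothread) {
        object_ref(OBJECT(n->conf.iothread));     /* keep the iothread alive */
        n->ctx = iothread_get_aio_context(n->conf.iothread);
    } else {
        n->ctx = qemu_get_aio_context();          /* main-loop AioContext */
    }
}

Several devices can then share one "-object iothread,id=io0", the same way
virtio-blk-pci already accepts an iothread=io0 property.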

Thanks,

Paolo

> +cq->iothread = &cq->internal_iothread_obj;
> +cq->ctx = iothread_get_aio_context(cq->iothread);
> +//Question: Need a conf.blk for each cq/sq???
> +//blk_set_aio_context(cq->conf->conf.blk, cq->ctx);
> +aio_context_acquire(cq->ctx);
> +aio_set_event_notifier(cq->ctx, &cq->notifier, true,
> +   nvme_cq_notifier);
> +aio_context_release(cq->ctx);
>  }
>  
>  static void nvme_sq_notifier(EventNotifier *e)
> @@ -578,9 +593,22 @@ static void nvme_init_sq_eventfd(NvmeSQueue *sq)
>  uint16_t offset = sq->sqid * 2 * (4 << NVME_CAP_DSTRD(n->bar.cap));
>  
>  event_notifier_init(&sq->notifier, 0);
> -event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
>  memory_region_add_eventfd(&n->iomem,
>  0x1000 + offset, 4, false, 0, &sq->notifier);
> +
> +object_initialize(&sq->internal_iothread_obj,
> +  sizeof(sq->internal_iothread_obj),
> +  TYPE_IOTHREAD);
> +user_creatable_complete(OBJECT(&sq->internal_iothread_obj),
> &error_abort);
> +sq->iothread = &sq->internal_iothread_obj;
> +sq->ctx = iothread_get_aio_context(sq->iothread);
> +//Question: Need a conf.blk for each cq/sq???
> +//blk_set_aio_context(sq->conf->conf.blk, sq->ctx);
> +
> +aio_context_acquire(sq->ctx);
> +aio_set_event_notifier(sq->ctx, &sq->notifier, true,
> +   nvme_sq_notifier);
> +aio_context_release(sq->ctx);
>  }
>  
>  static uint16_t nvme_set_db_memory(NvmeCtrl *n, const NvmeCmd *cmd)
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 608f202..171ee0b 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -667,6 +667,10 @@ typedef struct NvmeSQueue {
>   * do not go over this value will not result in MMIO writes (but will
>   * still write the tail pointer to the 

Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-23 Thread Ming Lin
On Mon, 2015-11-23 at 15:14 +0100, Paolo Bonzini wrote:
> 
> On 23/11/2015 09:17, Ming Lin wrote:
> > On Sat, 2015-11-21 at 14:11 +0100, Paolo Bonzini wrote:
> >>
> >> On 20/11/2015 01:20, Ming Lin wrote:
> >>> One improvement could be to use Google's NVMe vendor extension that
> >>> I sent in another thread, also here:
> >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
> >>>
> >>> Qemu side:
> >>> http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
> >>> Kernel side also here:
> >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0
> >>
> >> How much do you get with vhost-nvme plus vendor extension, compared to
> >> 190 MB/s for QEMU?
> > 
> > There is still some bug. I'll update.
> 
> Sure.
> 
> >> Note that in all likelihood, QEMU can actually do better than 190 MB/s,
> >> and gain more parallelism too, by moving the processing of the
> >> ioeventfds to a separate thread.  This is similar to
> >> hw/block/dataplane/virtio-blk.c.
> >>
> >> It's actually pretty easy to do.  Even though
> >> hw/block/dataplane/virtio-blk.c is still using some old APIs, all memory
> >> access in QEMU is now thread-safe.  I have pending patches for 2.6 that
> >> cut that file down to a mere 200 lines of code, NVMe would probably be
> >> about the same.
> > 
> > Is there a git tree for your patches?
> 
> No, not yet.  I'll post them today or tomorrow, will make sure to Cc you.
> 
> > Did you mean some pseudocode as below?
> > 1. need an iothread for each cq/sq?
> > 2. need an AioContext for each cq/sq?
> > 
> >  hw/block/nvme.c | 32 ++--
> >  hw/block/nvme.h |  8 
> >  2 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index f27fd35..fed4827 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -28,6 +28,8 @@
> >  #include "sysemu/sysemu.h"
> >  #include "qapi/visitor.h"
> >  #include "sysemu/block-backend.h"
> > +#include "sysemu/iothread.h"
> > +#include "qom/object_interfaces.h"
> >  
> >  #include "nvme.h"
> >  
> > @@ -558,9 +560,22 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
> >  uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
> >  
> >  event_notifier_init(&cq->notifier, 0);
> > -event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
> >  memory_region_add_eventfd(&n->iomem,
> >  0x1000 + offset, 4, false, 0, &cq->notifier);
> > +
> > +object_initialize(&cq->internal_iothread_obj,
> > +  sizeof(cq->internal_iothread_obj),
> > +  TYPE_IOTHREAD);
> > +user_creatable_complete(OBJECT(&cq->internal_iothread_obj),
> > &error_abort);
> 
> For now, you have to use one iothread for all cq/sq of a single NVMe
> device; multiqueue block layer is planned for 2.7 or 2.8.  Otherwise
> yes, it's very close to just these changes.

Here is the call stack of iothread for virtio-blk-dataplane.

handle_notify (qemu/hw/block/dataplane/virtio-blk.c:126)
aio_dispatch (qemu/aio-posix.c:329)
aio_poll (qemu/aio-posix.c:474)
iothread_run (qemu/iothread.c:45)
start_thread (pthread_create.c:312)
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)

I think I'll have a "nvme_dev_notify" similar to "handle_notify"

static void nvme_dev_notify(EventNotifier *e)
{

}

But then how can I know this notify is for cq or sq?
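
The patches elsewhere in this thread answer this by keeping one
EventNotifier per queue, so each handler recovers its queue with
container_of() and both handlers can be attached to the same iothread's
AioContext. Condensed from the code posted in this thread:

static void nvme_cq_notifier(EventNotifier *e)
{
    NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);

    event_notifier_test_and_clear(&cq->notifier);
    nvme_post_cqes(cq);
}

static void nvme_sq_notifier(EventNotifier *e)
{
    NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);

    event_notifier_test_and_clear(&sq->notifier);
    nvme_process_sq(sq);
}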



Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-21 Thread Paolo Bonzini


On 20/11/2015 01:20, Ming Lin wrote:
> One improvement could be to use Google's NVMe vendor extension that
> I sent in another thread, also here:
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
> 
> Qemu side:
> http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
> Kernel side also here:
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0

How much do you get with vhost-nvme plus vendor extension, compared to
190 MB/s for QEMU?

Note that in all likelihood, QEMU can actually do better than 190 MB/s,
and gain more parallelism too, by moving the processing of the
ioeventfds to a separate thread.  This is similar to
hw/block/dataplane/virtio-blk.c.

It's actually pretty easy to do.  Even though
hw/block/dataplane/virtio-blk.c is still using some old APIs, all memory
access in QEMU is now thread-safe.  I have pending patches for 2.6 that
cut that file down to a mere 200 lines of code, NVMe would probably be
about the same.

Paolo


[RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-19 Thread Ming Lin
Hi,

This is the first attempt to add a new qemu nvme backend using the
in-kernel NVMe target.

Most of the code is ported from qemu-nvme, and some is borrowed from
Hannes Reinecke's rts-megasas.

It's similar to vhost-scsi, but doesn't use virtio.
The advantage is that the guest can run an unmodified NVMe driver,
so the guest can be any OS that has an NVMe driver.

The goal is to get performance as good as vhost-scsi's.
But for now, performance is poor.

MMIO is the bottleneck.

One improvement could be to use Google's NVMe vendor extension that
I sent in another thread, also here:
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
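
A rough sketch of the idea behind that extension, inferred from the
db_addr/eventidx_addr queue fields used elsewhere in this thread and
modeled on virtio's event-index heuristic; the names below are
illustrative, not the actual driver code:

/* Guest-side sketch: publish the new SQ tail in a shared "shadow doorbell"
 * location and only do the expensive MMIO doorbell write when the device's
 * event index says it is needed. */
static void nvme_ring_sq_doorbell(struct my_nvme_queue *q, u16 new_tail)
{
    u16 old_tail = q->sq_tail;

    q->sq_tail = new_tail;
    if (q->sq_db_shadow) {                       /* extension negotiated */
        u16 event_idx = READ_ONCE(*q->sq_eventidx);

        WRITE_ONCE(*q->sq_db_shadow, new_tail);  /* always publish the tail */
        /* MMIO only if event_idx lies in the (old_tail, new_tail] window */
        if ((u16)(new_tail - event_idx - 1) < (u16)(new_tail - old_tail)) {
            writel(new_tail, q->q_db);
        }
    } else {
        writel(new_tail, q->q_db);               /* baseline NVMe: MMIO each time */
    }
}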

Qemu side:
http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
Kernel side also here:
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0

Thanks for any comment,
Ming



Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-19 Thread Christoph Hellwig
Thanks Ming,

from a first quick view this looks great.  I'll look over it in a bit
more detail once I get a bit more time.


Re: [RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

2015-11-19 Thread Ming Lin
On Fri, 2015-11-20 at 06:16 +0100, Christoph Hellwig wrote:
> Thanks Ming,
> 
> from a first quick view this looks great.  I'll look over it in a bit
> more detail once I get a bit more time.

Thanks for CCing Nic :-)

Funny though, I double-checked my bash history and I actually did CC Nic.
I don't know why it got lost.

mlin@ssi:~$ history |grep "nab"
 1861  git send-email --from "Ming Lin " --to
"linux-n...@lists.infradead.org"  --cc "qemu-de...@nongnu.org" --cc
"virtualization@lists.linux-foundation.org" --cc "Christoph Hellwig
" --cc "Nicholas A. Bellinger "
--compose ~/patches/*.patch

