On 1/29/18, 8:29 AM, "Stefan Hajnoczi" <stefa...@gmail.com> wrote:
<trim>
> Each new feature has a cost in terms of maintenance, testing,
> documentation, and support.  Users need to be educated about the role
> of each available storage controller and how to choose between them.
>
> I'm not sure why QEMU should go in this direction, since it makes the
> landscape more complex and harder to support.  You've said the
> performance is comparable to vhost-user-blk, so what does NVMe offer
> that makes this worthwhile?
>
> A cool NVMe feature would be the ability to pass through individual
> queues to different guests without SR-IOV.  In other words, bind a
> queue to a namespace subset so that multiple guests can be isolated
> from each other.  That way the data path would not require vmexits.
> The control path and device initialization would still be emulated by
> QEMU, so the hardware does not need to provide the full resources and
> state needed for SR-IOV.
>
> I looked into this but came to the conclusion that it would require
> changes to the NVMe specification, because the namespace is a
> per-command field.

Correct - any command from any queue can access any namespace on the
controller.

Another reason this is not possible is that most (if not all?)
controllers have CAP.DSTRD (Doorbell Stride) = 0, meaning the doorbell
registers for all queues fall within the same page, so one queue's
doorbells cannot be mapped into a guest without also exposing every
other queue's doorbells.
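
To make the per-command namespace point concrete: every 64-byte
submission queue entry carries the target NSID in command dword 1, so
the controller has no per-queue namespace binding it could enforce.  A
rough sketch of the SQE layout (descriptive field names of my own, not
taken from any particular driver's headers):

    #include <stdint.h>

    /* 64-byte NVMe submission queue entry (NVMe 1.3, section 4.2). */
    struct nvme_sqe {
        uint32_t cdw0;   /* opcode, fuse bits, command identifier */
        uint32_t nsid;   /* namespace ID: selected per command,
                          * regardless of which queue it arrives on */
        uint64_t rsvd;
        uint64_t mptr;   /* metadata pointer */
        uint64_t prp1;   /* data pointer, PRP entry 1 (or SGL) */
        uint64_t prp2;   /* data pointer, PRP entry 2 */
        uint32_t cdw10;  /* command-specific dwords 10..15 */
        uint32_t cdw11;
        uint32_t cdw12;
        uint32_t cdw13;
        uint32_t cdw14;
        uint32_t cdw15;
    };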
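
And for the doorbell stride point, a minimal sketch of the offset math
from the spec's register map (the helper names are mine):

    #include <stdint.h>
    #include <stdio.h>

    /* Per the NVMe register map, for queue y:
     *   SQyTDBL offset = 0x1000 + (2*y)     * (4 << CAP.DSTRD)
     *   CQyHDBL offset = 0x1000 + (2*y + 1) * (4 << CAP.DSTRD)
     */
    static uint64_t sq_tail_db(uint32_t qid, uint32_t dstrd)
    {
        return 0x1000 + (uint64_t)(2 * qid) * (4u << dstrd);
    }

    static uint64_t cq_head_db(uint32_t qid, uint32_t dstrd)
    {
        return 0x1000 + (uint64_t)(2 * qid + 1) * (4u << dstrd);
    }

    int main(void)
    {
        /* DSTRD = 0: 4-byte stride, so queues 0..511 all share the
         * single 4 KiB page at BAR offset 0x1000. */
        for (uint32_t q = 0; q < 3; q++) {
            printf("DSTRD=0  q%u: SQ tail 0x%04llx, CQ head 0x%04llx\n",
                   q, (unsigned long long)sq_tail_db(q, 0),
                   (unsigned long long)cq_head_db(q, 0));
        }
        /* DSTRD = 10 (4 KiB stride) would put each doorbell on its own
         * page, which is what per-queue passthrough would need. */
        return 0;
    }

With 4 KiB pages and DSTRD = 0, queue pair y's doorbells sit at
0x1000 + 8*y and 0x1000 + 8*y + 4, i.e. 512 queue pairs share one page.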
<trim> Each new feature has a cost in terms of maintainance, testing, documentation, and support. Users need to be educated about the role of each available storage controller and how to choose between them. I'm not sure why QEMU should go in this direction since it makes the landscape more complex and harder to support. You've said the performance is comparable to vhost-user-blk. So what does NVMe offer that makes this worthwhile? A cool NVMe feature would be the ability to pass through invididual queues to different guests without SR-IOV. In other words, bind a queue to namespace subset so that multiple guests can be isolated from each other. That way the data path would not require vmexits. The control path and device initialization would still be emulated by QEMU so the hardware does not need to provide the full resources and state needed for SR-IOV. I looked into this but came to the conclusion that it would require changes to the NVMe specification because the namespace is a per-command field. Correct – any command from any queue can access any namespace on the controller. Another reason this is not possible is that most (if not all?) controllers have CAP.DSTRD (Doorbell Stride) = 0, meaning doorbell registers for all queues fall within the same page.