On Nov 30 12:33, Stefan Hajnoczi wrote: > On Fri, Nov 27, 2020 at 12:46:01AM +0100, Klaus Jensen wrote: > > From: Klaus Jensen <k.jen...@samsung.com> > > > > Signed-off-by: Klaus Jensen <k.jen...@samsung.com> > > --- > > docs/specs/nvme.txt | 15 +++ > > hw/block/nvme-ns.h | 16 ++++ > > hw/block/nvme-ns.c | 212 +++++++++++++++++++++++++++++++++++++++++- > > hw/block/nvme.c | 87 +++++++++++++++++ > > hw/block/trace-events | 2 + > > 5 files changed, 331 insertions(+), 1 deletion(-) > > > > diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt > > index 03bb4d9516b4..05d81c88ad4e 100644 > > --- a/docs/specs/nvme.txt > > +++ b/docs/specs/nvme.txt > > @@ -20,6 +20,21 @@ The nvme device (-device nvme) emulates an NVM Express > > Controller. > > `zns.mor`; Specifies the number of open resources available. This is a 0s > > based value. > > > > + `zns.pstate`; This parameter specifies another blockdev to be used for > > + storing zone state persistently. > > + > > + -drive id=zns-pstate,file=zns-pstate.img,format=raw > > + -device nvme-ns,zns.pstate=zns-pstate,... > > + > > + To reset (or initialize) state, the blockdev image should be of zero > > size: > > + > > + qemu-img create -f raw zns-pstate.img 0 > > + > > + The image will be intialized with a file format header and truncated > > to > > + the required size. If the pstate given is of non-zero size, it will be > > + assumed to already contain zone state information and the header will > > be > > + checked. > > In principle this makes sense but at first glance it looks like the code > is synchronous - it blocks the vCPU if zone metadata I/O is necessary. > That works for test/bring-up code but can't be used in production due to > the performance impact. > > Is the expectation that the QEMU NVMe device emulation will only be used > for test/bring-up now and in the future? >
Hi Stefan, Thanks for taking a look at this. I could see why someone would maybe use the core nvme emulation in production (but I'm not aware of anyone doing it), but the zoned emulation is *probably* not for production (and that is where the zone updates are needed). But someone could surprise me with a use case I guess. And yes, I know this is synchronous in this version. I have not extensively evaluated the performance impact, but crucially the blocking only happens when the zone changes state (i.e. every write does NOT trigger a blk_pwrite just to persist the updated write pointer). I know this can be done asynchronously (I have implemented it like so previously), but I wanted to get an opinion on the general stategry before adding that. The opposing strategy, is to use a some form of mmap/msync, but I, for one, pushed back against that because I'd like this to work on as many platforms as possible. Hence the RFC for this blockdev based approach. But if you think a blockdev approach like this is a reasonable QEMU-esce way of doing it, then I'll proceed to do a v2 with asynchronous updates.
signature.asc
Description: PGP signature