nvme: add persistence for zone info

Klaus Jensen Mon, 30 Nov 2020 05:00:02 -0800

On Nov 30 12:33, Stefan Hajnoczi wrote:
> On Fri, Nov 27, 2020 at 12:46:01AM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jen...@samsung.com>
> > 
> > Signed-off-by: Klaus Jensen <k.jen...@samsung.com>
> > ---
> >  docs/specs/nvme.txt   |  15 +++
> >  hw/block/nvme-ns.h    |  16 ++++
> >  hw/block/nvme-ns.c    | 212 +++++++++++++++++++++++++++++++++++++++++-
> >  hw/block/nvme.c       |  87 +++++++++++++++++
> >  hw/block/trace-events |   2 +
> >  5 files changed, 331 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
> > index 03bb4d9516b4..05d81c88ad4e 100644
> > --- a/docs/specs/nvme.txt
> > +++ b/docs/specs/nvme.txt
> > @@ -20,6 +20,21 @@ The nvme device (-device nvme) emulates an NVM Express 
> > Controller.
> >    `zns.mor`; Specifies the number of open resources available. This is a 0s
> >       based value.
> >  
> > +  `zns.pstate`; This parameter specifies another blockdev to be used for
> > +     storing zone state persistently.
> > +
> > +       -drive id=zns-pstate,file=zns-pstate.img,format=raw
> > +       -device nvme-ns,zns.pstate=zns-pstate,...
> > +
> > +     To reset (or initialize) state, the blockdev image should be of zero 
> > size:
> > +
> > +       qemu-img create -f raw zns-pstate.img 0
> > +
> > +     The image will be intialized with a file format header and truncated 
> > to
> > +     the required size. If the pstate given is of non-zero size, it will be
> > +     assumed to already contain zone state information and the header will 
> > be
> > +     checked.
> 
> In principle this makes sense but at first glance it looks like the code
> is synchronous - it blocks the vCPU if zone metadata I/O is necessary.
> That works for test/bring-up code but can't be used in production due to
> the performance impact.
> 
> Is the expectation that the QEMU NVMe device emulation will only be used
> for test/bring-up now and in the future?
>


Hi Stefan,

Thanks for taking a look at this.

I could see why someone would maybe use the core nvme emulation in
production (but I'm not aware of anyone doing it), but the zoned
emulation is *probably* not for production (and that is where the zone
updates are needed). But someone could surprise me with a use case I
guess.

And yes, I know this is synchronous in this version. I have not
extensively evaluated the performance impact, but crucially the blocking
only happens when the zone changes state (i.e. every write does NOT
trigger a blk_pwrite just to persist the updated write pointer).

I know this can be done asynchronously (I have implemented it like so
previously), but I wanted to get an opinion on the general stategry
before adding that. The opposing strategy, is to use a some form of
mmap/msync, but I, for one, pushed back against that because I'd like
this to work on as many platforms as possible. Hence the RFC for this
blockdev based approach.

But if you think a blockdev approach like this is a reasonable QEMU-esce
way of doing it, then I'll proceed to do a v2 with asynchronous updates.

signature.asc
Description: PGP signature

Re: [PATCH RFC v5 12/12] hw/block/nvme: add persistence for zone info

Reply via email to