nvme: fix zone write finalize

Klaus Jensen Thu, 14 Jan 2021 22:02:10 -0800

On Jan 14 15:03, Keith Busch wrote:
> On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <[email protected]>
> > 
> > The zone write pointer is unconditionally advanced, even for write
> > faults. Make sure that the zone is always transitioned to Full if the
> > write pointer reaches zone capacity.
> 
> Looks like some spec weirdness. It says we can transition to full:
> 
>   b) as a result of successful completion of a write operation that
>      writes one or more logical blocks that causes the zone to reach its
>      writeable zone capacity;
> 
> But a failed write also advances the write pointer as you've pointed
> out, so they might want to strike "successful".
>


Yes, its slightly inconsistent.

Empty/Closed to Opened is also kinda fuzzy in the spec. The wording is
slightly different when transitioning from Empty and Closed to
Implicitly Opened. ZSE->ZSIO just says "and a write operation writes on
or more logical blocks of that zone". ZSC->ZSIO says "and a write
operation that writes one or more logical blocks to the zone completes
succesfully". This has given me some headaches on if the transition
should occur when "preparing/submitting" the write or when the write
completes. But I guess this is pretty much an implementation detail of
the device.

I should mention this to my representatives ;)

> Looks fine to me.
> 
> Reviewed-by: Keith Busch <[email protected]>
> 
> > Signed-off-by: Klaus Jensen <[email protected]>
> > Cc: Dmitry Fomichev <[email protected]>
> > ---
> >  hw/block/nvme.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 0854ee307227..280b31b4459d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace 
> > *ns, NvmeRequest *req,
> >      nlb = le16_to_cpu(rw->nlb) + 1;
> >      zone = nvme_get_zone_by_slba(ns, slba);
> >  
> > +    zone->d.wp += nlb;
> > +
> >      if (failed) {
> >          res->slba = 0;
> > -        zone->d.wp += nlb;
> > -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> > +    }
> > +
> > +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
> 
> The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
> As far as I can tell, this difference will mean the zone won't finalize
> until the last write completes, where before it could finalize after the
> zone's last write is submitted. Either way looks okay, but I think these
> two values ought to always be in sync.
> 

Right - the zone will transition to Full on the last completed write. I
really not sure if this models devices better or not. But I would argue
that a real device would not relinquish resources until the last write
completes.

An issue is that for QD>1 (zone append), we can easily end up with
appends A and B completing in reversed order such that d.wp is advanced
by B_nlb, but we left a "hole" of A_nlb in the zone because that write
has not completed yet.

But - I think *some* inconsistency on the write pointer value is to be
expected when using Zone Append. The host can only ever depend on a
consistent value of the write pointer by quiescing I/O to the zone and
then getting a zone report - and d.wp and d.w_ptr will always
converge in that case.

signature.asc
Description: PGP signature

Re: [PATCH] hw/block/nvme: fix zone write finalize

Reply via email to