On Jan 14 15:03, Keith Busch wrote: > On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote: > > From: Klaus Jensen <k.jen...@samsung.com> > > > > The zone write pointer is unconditionally advanced, even for write > > faults. Make sure that the zone is always transitioned to Full if the > > write pointer reaches zone capacity. > > Looks like some spec weirdness. It says we can transition to full: > > b) as a result of successful completion of a write operation that > writes one or more logical blocks that causes the zone to reach its > writeable zone capacity; > > But a failed write also advances the write pointer as you've pointed > out, so they might want to strike "successful". >
Yes, its slightly inconsistent. Empty/Closed to Opened is also kinda fuzzy in the spec. The wording is slightly different when transitioning from Empty and Closed to Implicitly Opened. ZSE->ZSIO just says "and a write operation writes on or more logical blocks of that zone". ZSC->ZSIO says "and a write operation that writes one or more logical blocks to the zone completes succesfully". This has given me some headaches on if the transition should occur when "preparing/submitting" the write or when the write completes. But I guess this is pretty much an implementation detail of the device. I should mention this to my representatives ;) > Looks fine to me. > > Reviewed-by: Keith Busch <kbu...@kernel.org> > > > Signed-off-by: Klaus Jensen <k.jen...@samsung.com> > > Cc: Dmitry Fomichev <dmitry.fomic...@wdc.com> > > --- > > hw/block/nvme.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > > index 0854ee307227..280b31b4459d 100644 > > --- a/hw/block/nvme.c > > +++ b/hw/block/nvme.c > > @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace > > *ns, NvmeRequest *req, > > nlb = le16_to_cpu(rw->nlb) + 1; > > zone = nvme_get_zone_by_slba(ns, slba); > > > > + zone->d.wp += nlb; > > + > > if (failed) { > > res->slba = 0; > > - zone->d.wp += nlb; > > - } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) { > > + } > > + > > + if (zone->d.wp == nvme_zone_wr_boundary(zone)) { > > The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'. > As far as I can tell, this difference will mean the zone won't finalize > until the last write completes, where before it could finalize after the > zone's last write is submitted. Either way looks okay, but I think these > two values ought to always be in sync. > Right - the zone will transition to Full on the last completed write. I really not sure if this models devices better or not. But I would argue that a real device would not relinquish resources until the last write completes. An issue is that for QD>1 (zone append), we can easily end up with appends A and B completing in reversed order such that d.wp is advanced by B_nlb, but we left a "hole" of A_nlb in the zone because that write has not completed yet. But - I think *some* inconsistency on the write pointer value is to be expected when using Zone Append. The host can only ever depend on a consistent value of the write pointer by quiescing I/O to the zone and then getting a zone report - and d.wp and d.w_ptr will always converge in that case.
signature.asc
Description: PGP signature