On Dec 17 13:14, Keith Busch wrote: > On Thu, Dec 17, 2020 at 10:02:19PM +0100, Klaus Jensen wrote: > > From: Klaus Jensen <k.jen...@samsung.com> > > > > This series adds support for extended LBAs and end-to-end data > > protection. Marked RFC, since there are a bunch of issues that could use > > some discussion. > > > > Storing metadata bytes contiguously with the logical block data and > > creating a physically extended logical block basically breaks the DULBE > > and deallocation support I just added. Formatting a namespace with > > protection information requires the app- and reftags of deallocated or > > unwritten blocks to be 0xffff and 0xffffffff respectively; this could be > > used to reintroduce DULBE support in that case, albeit at a somewhat > > higher cost than the block status flag-based approach. > > > > There is basically three ways of storing metadata (and maybe a forth, > > but that is probably quite the endeavour): > > > > 1. Storing metadata as extended blocks directly on the blockdev. That > > is the approach used in this RFC. > > > > 2. Use a separate blockdev. Incidentially, this is also the easiest > > and most straightforward solution to support MPTR-based "separate > > metadata". This also allows DULBE and block deallocation to be > > supported using the existing approach. > > > > 3. A hybrid of 1 and 2 where the metadata is stored contiguously at > > the end of the nvme-ns blockdev. > > > > Option 1 obviously works well with DIF-based protection information and > > extended LBAs since it maps one to one. Option 2 works flawlessly with > > MPTR-based metadata, but extended LBAs can be "emulated" at the cost of > > a bunch of scatter/gather operations. > > Are there any actual users of extended metadata that we care about? I'm > aware of only a few niche places that can even access an extended > metadata format. There's not kernel support in any major OS that I know > of. >
Yes, there are definitely actual users in enterprise storage. But the main use case here is testing (using extended LBAs with SPDK for instance). > Option 2 sounds fine. > > If option 3 means that you're still using MPTR, but just sequester space > at the end of the backing block device for meta-data purposes, then that > is fine too. You can even resize it dynamically if you want to support > different metadata sizes. Heh, I tend to think that my English vocabulary is pretty decent but I had to look up 'sequester'. I just learned a new word today \o/ Yes. I actually also like option 3. Technically option 2 does not break image interoperability between devices (ide, virtio), but you would leave out a bunch of metadata that your application might depend on, so I don't see any way to not break interoperability really. And I would then be just fine with "emulating" extended LBAs at the cost of more I/O. Because I would like the device to support that mode of operation as well. We have not implemented this, but my gut feeling says that it can be done. > > > The 4th option is extending an existing image format (QCOW2) or create > > something on top of RAW to supports metadata bytes per block. But both > > approaches require full API support through the block layer. And > > probably a lot of other stuff that I did not think about. > > It definitely sounds appealing to push the feature to a lower level if > you're really willing to see that through. > Yes, its super appealing and I would like to have some input from the block layer guys on this. That is, if anyone has ever explored it? > In any case, calculating T10 CRCs is *really* slow unless you have > special hardware and software support for it. > Yeah. I know this is super slow. For for emulation and testing purposes I think it is a nice feature for the device to optionally offer.
signature.asc
Description: PGP signature