Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, 2015-05-05 at 02:06 +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 28, 2015 06:22:05 PM Dan Williams wrote:
> > On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki wrote:
> > > On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> > > > Changes since v1 [1]: Incorporates feedback received prior to April 24.
> > >
> > > [cut]
> > >
> > > I'm wondering what's wrong with CCing all of the series to linux-acpi?
> > >
> > > Is there anything in it that the people on that list should not see, by
> > > any chance?
> >
> > linux-acpi may not care about the dimm-metadata labeling patches that
> > are completely independent of ACPI, but might as well include
> > linux-acpi on the whole series at this point.
>
> I've gone through the ACPI-related patches in this series (other than [2/20],
> which I've commented on directly) and while I haven't found anything horrible
> in them, I don't quite feel confident enough to ACK them.
>
> What I'm really missing in this series is a design document describing all
> of that from a high-level perspective and making it clear where all of the
> pieces go and what their respective roles are. Also, reordering the series
> to introduce the nd subsystem to start with and then its users might help
> here.

Here you go; see also the "Supporting Documents" section if you need more
details, or just ask. This is the reworked document after pushing NFIT
specifics out of the core implementation. The core APIs are
nd_bus_register(), nd_dimm_create(), nd_pmem_region_create(), and
nd_blk_region_create().

---

LIBND: Non-volatile Devices
libnd - kernel / libndctl - userspace helper library
linux-nvd...@lists.01.org
v10

Contents:

  Glossary
  Overview
  Supporting Documents
  Git Trees
  LIBND PMEM and BLK
    Why BLK?
    PMEM vs BLK
    BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
  Example NVDIMM Platform
  LIBND Kernel Device Model and LIBNDCTL Userspace API
    LIBNDCTL: Context
      libndctl: instantiate a new library context example
    LIBND/LIBNDCTL: Bus
      libnd: control class device in /sys/class
      libnd: bus
      libndctl: bus enumeration example
    LIBND/LIBNDCTL: DIMM (NMEM)
      libnd: DIMM (NMEM)
      libndctl: DIMM enumeration example
    LIBND/LIBNDCTL: Region
      libnd: region
      libndctl: region enumeration example
      Why Not Encode the Region Type into the Region Name?
      How Do I Determine the Major Type of a Region?
    LIBND/LIBNDCTL: Namespace
      libnd: namespace
      libndctl: namespace enumeration example
      libndctl: namespace creation example
      Why the Term "namespace"?
    LIBND/LIBNDCTL: Block Translation Table "btt"
      libnd: btt layout
      libndctl: btt creation example
  Summary LIBNDCTL Diagram

Glossary

PMEM: A system physical address range where writes are persistent. A block
device composed of PMEM is capable of DAX. A PMEM address range may
span/interleave several DIMMs.

BLK: A set of one or more programmable memory-mapped apertures provided by a
DIMM to access its media. This indirection precludes the performance benefit
of interleaving, but enables DIMM-bounded failure modes.

DPA: DIMM Physical Address, a DIMM-relative offset. With one DIMM in the
system there would be a 1:1 system-physical-address:DPA association. Once
more DIMMs are added, a memory controller interleave must be decoded to
determine the DPA associated with a given system-physical-address. BLK
capacity always has a 1:1 relationship with a single DIMM's DPA range.

DAX: File system extensions to bypass the page cache and block layer to mmap
persistent memory, from a PMEM block device, directly into a process address
space.

BTT: Block Translation Table. Persistent memory is byte addressable, but
existing software may expect that the power-fail atomicity of writes is at
least one sector (512 bytes). The BTT is an indirection table with atomic
update semantics to front a PMEM/BLK block device driver and present
arbitrary atomic sector sizes.

LABEL: Metadata stored on a DIMM device that partitions and identifies
(persistently names) storage between PMEM and BLK. It also partitions BLK
storage to host BTTs with different parameters per BLK-partition. Note that
traditional partition tables, GPT/MBR, are layered on top of a BLK or PMEM
device.

Overview

The libnd subsystem provides support for three types of NVDIMMs: PMEM, BLK,
and NVDIMM platforms that can simultaneously support both PMEM and BLK mode
access on a given set of DIMMs. These three modes of operation are described
by the NVDIMM Firmware Interface Table (NFIT).
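The BTT indirection described in the glossary can be made concrete with a toy
sketch (purely illustrative, not the kernel's btt.c, which also manages flog
entries, per-entry flags, and an on-media arena layout): every sector write
goes to a free physical block, and the logical-to-physical map entry is
switched only after the data has landed, so a torn sector is never visible.

```python
class BTT:
    """Toy block translation table: maps logical sectors to physical
    blocks so a sector write is atomic even if power fails mid-write.
    Illustrative only; the real BTT adds flog entries, flags, and a
    persistent on-media layout."""

    def __init__(self, nsectors, sector_size=512):
        self.sector_size = sector_size
        # One spare physical block beyond the logical capacity.
        self.media = [bytes(sector_size) for _ in range(nsectors + 1)]
        self.map = list(range(nsectors))   # logical -> physical
        self.free = nsectors               # the spare block

    def read(self, lba):
        return self.media[self.map[lba]]

    def write(self, lba, data):
        assert len(data) == self.sector_size
        new, old = self.free, self.map[lba]
        self.media[new] = data   # 1) write data into the free block
        self.map[lba] = new      # 2) atomically repoint the map entry
        self.free = old          # 3) the old block becomes the free block
```

A crash between steps 1 and 2 leaves the old sector contents fully intact; a
crash after step 2 exposes the complete new contents. There is no window in
which a partially written sector can be read back.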
Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 03:15:54PM -0700, Dan Williams wrote:
> > lsblk's blkdev_scsi_type_to_name() considers 4 to mean
> > SCSI_TYPE_WORM (write once read many ... used for certain optical
> > and tape drives).
>
> Why is lsblk assuming these are scsi devices? I'll need to go check
> that out.

It's a very common assumption, unfortunately. I remember fixing it in
various in-house tools at customers and stumbled over it in targetcli
recently. Please use a prefix for your type attribute to avoid this problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tuesday, April 28, 2015 06:22:05 PM Dan Williams wrote:
> On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki wrote:
> > On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> > > Changes since v1 [1]: Incorporates feedback received prior to April 24.
> >
> > [cut]
> >
> > I'm wondering what's wrong with CCing all of the series to linux-acpi?
> >
> > Is there anything in it that the people on that list should not see, by
> > any chance?
>
> linux-acpi may not care about the dimm-metadata labeling patches that
> are completely independent of ACPI, but might as well include
> linux-acpi on the whole series at this point.

I've gone through the ACPI-related patches in this series (other than [2/20],
which I've commented on directly) and while I haven't found anything horrible
in them, I don't quite feel confident enough to ACK them.

What I'm really missing in this series is a design document describing all
of that from a high-level perspective and making it clear where all of the
pieces go and what their respective roles are. Also, reordering the series
to introduce the nd subsystem to start with and then its users might help
here.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, 2015-04-28 at 16:05 -0700, Andy Lutomirski wrote:
> On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams wrote:
> > On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski wrote:
> > > On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams wrote:
> > > > On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski wrote:
> > > > > On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams wrote:
> > > > >
> > > > > Mostly for my understanding: is there a name for "address relative
> > > > > to the address lines on the DIMM"? That is, a DIMM that exposes
> > > > > 8 GB of apparent physical memory, possibly interleaved, broken up,
> > > > > or weirdly remapped by the memory controller, would still have
> > > > > addresses between 0 and 8 GB. Some of those might be PMEM windows,
> > > > > some might be MMIO, some might be BLK apertures, etc.
> > > > >
> > > > > IIUC "DPA" refers to actual addressable storage, not this type of
> > > > > address?
> > > >
> > > > No, DPA is exactly as you describe above. You can't directly access
> > > > it except through a PMEM mapping (possibly interleaved with DPA from
> > > > other DIMMs) or a BLK aperture (mmio window into DPA).
> > >
> > > So the thing I'm describing has no name, then? Oh, well.
> >
> > What? The thing you are describing *is* DPA.
>
> I'm confused. Here are the two things I have in mind:
>
> 1. An address into on-DIMM storage. If I have a DIMM that is mapped
> to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK
> apertures, say), then this address runs from 0 to 64 GB.
>
> 2. An address into the DIMM's view of physical address space. If I
> have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable
> storage (accessed through BLK apertures, say), then this address runs
> from 0 to 8 GB. There's a one-to-one mapping between SPA and this
> type of address.
>
> Since you said "a dimm may provide both PMEM-mode and BLK-mode access
> to a range of DPA," I thought that DPA was #1.
>
> --Andy

I think that you've got the right definition, #1 above, for DPA.
The DPA is relative to the DIMM, knows nothing about interleaving or SPA or
anything else in the system, and is basically equivalent to the idea of an
LBA on a disk. A DIMM that has 64 GiB of storage could have a DPA space
ranging from 0 to 64 GiB.

The second concept is a little trickier - we've been talking about this by
using the term "N-way interleave set". Say you have your 64 GiB DIMM and
only the first 8 GiB are given to the OS in an SPA, and that DIMM isn't
interleaved with any other DIMMs. This would be a 1-way interleave set,
ranging from DPA 0 - 8 GiB on the DIMM.

If you have 2 DIMMs of size 64 GiB, and they each have an 8 GiB region given
to the SPA space, those two regions could be interleaved together. The OS
would then see a 16 GiB 2-way interleave set, made up of DPAs 0 -> 8 GiB on
each of the two DIMMs.

You can figure out exactly how all the interleaving works by looking at the
SPA tables, the Memory Device tables and the Interleave Tables. These are in
sections 5.2.25.1 - 5.2.25.3 in ACPI 6, and are in our code as struct
acpi_nfit_spa, struct acpi_nfit_memdev and struct acpi_nfit_idt.

- Ross
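The 2-way interleave set above can be sketched numerically. The function
below is illustrative only: it assumes a fixed line size and a simple
round-robin rotation across DIMMs, whereas a real decode follows the NFIT
SPA, memory device, and interleave tables, which permit far more elaborate
line orderings.

```python
def spa_to_dpa(offset, ways, line_size=256, dpa_base=0):
    """Map an offset within an interleave set to (dimm_index, dpa).

    Assumes a simple round-robin interleave: consecutive `line_size`
    chunks of the set rotate across `ways` DIMMs.  The real mapping is
    described by the platform's NFIT interleave tables.
    """
    line = offset // line_size       # which interleave line this byte is in
    dimm = line % ways               # which DIMM serves that line
    line_on_dimm = line // ways      # lines this DIMM contributed so far
    dpa = dpa_base + line_on_dimm * line_size + offset % line_size
    return dimm, dpa

# A 16 GiB 2-way interleave set built from 8 GiB of DPA on each of two
# DIMMs: the set's bytes alternate between the DIMMs line by line.
assert spa_to_dpa(0, ways=2) == (0, 0)      # first line -> DIMM 0, DPA 0
assert spa_to_dpa(256, ways=2) == (1, 0)    # second line -> DIMM 1, DPA 0
assert spa_to_dpa(512, ways=2) == (0, 256)  # third line -> DIMM 0 again
```

With `ways=1` this degenerates to the 1-way case: the set offset and the DPA
are identical, which is why a non-interleaved PMEM range has a 1:1 SPA:DPA
association.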
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki wrote:
> On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> > Changes since v1 [1]: Incorporates feedback received prior to April 24.
> >
> > 1/ Ingo said [2]:
> >
> >    "So why on earth is this whole concept and the naming itself
> >    ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
> >    revolving around a specific 'firmware' mindset and revolving
> >    around specific, weirdly named, overly complicated looking
> >    firmware interfaces that come with their own new weird
> >    glossary??"
> >
> >    Indeed, we of course consulted the NFIT specification to determine
> >    the shape of the sub-system, but then let its terms and data
> >    structures permeate too deep into the implementation. That is fixed
> >    now with all NFIT specifics factored out into acpi.c. The NFIT is no
> >    longer required reading to review libnd. Only three concepts are
> >    needed:
> >
> >      i/ PMEM - contiguous memory range where cpu stores are
> >         persistent once they are flushed through the memory
> >         controller.
> >
> >     ii/ BLK - mmio apertures (sliding windows) that can be
> >         programmed to access an aperture's-worth of persistent
> >         media at a time.
> >
> >    iii/ DPA - "dimm-physical-address", address space local to a
> >         dimm. A dimm may provide both PMEM-mode and BLK-mode
> >         access to a range of DPA. libnd manages allocation of DPA
> >         to either PMEM or BLK-namespaces to resolve this aliasing.
> >
> >    The v1..v2 diffstat below shows the migration of nfit-specifics to
> >    acpi.c and the new state of libnd being nfit-free. "nd" now only
> >    refers to "non-volatile devices". Note, reworked documentation will
> >    return once the review has settled.
> >
> >    Documentation/blockdev/nd.txt     |  867 -
> >    MAINTAINERS                       |   34 +-
> >    arch/ia64/kernel/efi.c            |    5 +-
> >    arch/x86/kernel/e820.c            |   11 +-
> >    arch/x86/kernel/pmem.c            |    2 +-
> >    drivers/block/Makefile            |    2 +-
> >    drivers/block/nd/Kconfig          |  135 ++--
> >    drivers/block/nd/Makefile         |   32 +-
> >    drivers/block/nd/acpi.c           | 1506 +++--
> >    drivers/block/nd/acpi_nfit.h      |  321
> >    drivers/block/nd/blk.c            |   27 +-
> >    drivers/block/nd/btt.c            |    6 +-
> >    drivers/block/nd/btt_devs.c       |    8 +-
> >    drivers/block/nd/bus.c            |  337 +
> >    drivers/block/nd/core.c           |  574 +-
> >    drivers/block/nd/dimm.c           |   11 -
> >    drivers/block/nd/dimm_devs.c      |  292 ++-
> >    drivers/block/nd/e820.c           |  100 +++
> >    drivers/block/nd/libnd.h          |  122 +++
> >    drivers/block/nd/namespace_devs.c |   10 +-
> >    drivers/block/nd/nd-private.h     |  107 +--
> >    drivers/block/nd/nd.h             |   91 +--
> >    drivers/block/nd/nfit.h           |  238 --
> >    drivers/block/nd/pmem.c           |   56 +-
> >    drivers/block/nd/region.c         |   78 +-
> >    drivers/block/nd/region_devs.c    |  783 +++
> >    drivers/block/nd/test/iomap.c     |   86 +--
> >    drivers/block/nd/test/nfit.c      | 1115 +++
> >    drivers/block/nd/test/nfit_test.h |   15 +-
> >    include/uapi/linux/ndctl.h        |  130 ++--
> >    30 files changed, 3166 insertions(+), 3935 deletions(-)
> >    delete mode 100644 Documentation/blockdev/nd.txt
> >    create mode 100644 drivers/block/nd/acpi_nfit.h
> >    create mode 100644 drivers/block/nd/e820.c
> >    create mode 100644 drivers/block/nd/libnd.h
> >    delete mode 100644 drivers/block/nd/nfit.h
> >
> >    [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
> >    [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
> >
> > 2/ Christoph asked that the pmem ida conversion be moved to its own
> >    patch (done), and that we consider leaving the current pmem.c in
> >    drivers/block/. Instead, I converted the e820-type-12 enabling to
> >    be the first non-ACPI-NFIT based consumer of libnd. The new nd_e820
> >    driver simply registers e820-type-12 ranges as libnd PMEM regions.
> >    Among other things this conversion enables BTT for these ranges.
> >    The alternative is to move drivers/block/nd/nd.h internals out to
> >    include/linux/, which I think is worse.
> >
> > 3/ Toshi reported that the NFIT parsing fails to handle the case of a
> >    PMEM range with a single-dimm (non-aliasing) interleave description.
> >    Support for this case was added and is tested by default by the
> >    nfit_test.1 configuration.
> >
> > 4/ Toshi reported that we should not be treating a missing _STA
> >    property as a "dimm disabled by firmware" case. (fixed)
> >
> > 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
> >    arch code. It is gone for now and we'll revisit when adding cached
> >    mappings back to the PMEM driver.
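The DPA aliasing named in concept iii/ above can be illustrated with a toy
allocator (purely hypothetical names, not the libnd API): each DIMM exposes
one DPA range, and because PMEM and BLK access alias the same media, any
extent of that range may be granted to at most one namespace type at a time.

```python
class DimmDpa:
    """Toy per-DIMM DPA allocator (hypothetical, not the libnd API).

    PMEM mappings and BLK apertures reach the same underlying DPA
    range, so an extent handed to a PMEM namespace must never also be
    handed to a BLK namespace, and vice versa.
    """

    def __init__(self, size):
        self.size = size
        self.extents = []  # list of (start, end, mode)

    def alloc(self, start, length, mode):
        assert mode in ("pmem", "blk")
        end = start + length
        if end > self.size:
            raise ValueError("beyond DPA range")
        for s, e, m in self.extents:
            if start < e and s < end:  # overlaps an existing extent
                raise ValueError("DPA extent already allocated to " + m)
        self.extents.append((start, end, mode))
        return (start, end, mode)

dimm = DimmDpa(64 << 30)               # 64 GiB of DPA on one DIMM
dimm.alloc(0, 8 << 30, "pmem")         # first 8 GiB to a PMEM namespace
dimm.alloc(8 << 30, 56 << 30, "blk")   # the remainder to BLK namespaces
# dimm.alloc(4 << 30, 1 << 30, "blk")  # would raise: aliases the PMEM extent
```

The point of the sketch is only the invariant: resolving PMEM/BLK aliasing
means arbitrating one DPA space between two access modes, which is the
bookkeeping the LABEL metadata persists across reboots.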
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams wrote:
> On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski wrote:
> > On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams wrote:
> > > On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski wrote:
> > > > On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams wrote:
> > > > > Changes since v1 [1]: Incorporates feedback received prior to April 24.
> > > > >
> > > > > 1/ Ingo said [2]:
> > > > >
> > > > >    "So why on earth is this whole concept and the naming itself
> > > > >    ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
> > > > >    revolving around a specific 'firmware' mindset and revolving
> > > > >    around specific, weirdly named, overly complicated looking
> > > > >    firmware interfaces that come with their own new weird
> > > > >    glossary??"
> > > > >
> > > > >    Indeed, we of course consulted the NFIT specification to
> > > > >    determine the shape of the sub-system, but then let its terms
> > > > >    and data structures permeate too deep into the implementation.
> > > > >    That is fixed now with all NFIT specifics factored out into
> > > > >    acpi.c. The NFIT is no longer required reading to review libnd.
> > > > >    Only three concepts are needed:
> > > > >
> > > > >      i/ PMEM - contiguous memory range where cpu stores are
> > > > >         persistent once they are flushed through the memory
> > > > >         controller.
> > > > >
> > > > >     ii/ BLK - mmio apertures (sliding windows) that can be
> > > > >         programmed to access an aperture's-worth of persistent
> > > > >         media at a time.
> > > > >
> > > > >    iii/ DPA - "dimm-physical-address", address space local to a
> > > > >         dimm. A dimm may provide both PMEM-mode and BLK-mode
> > > > >         access to a range of DPA. libnd manages allocation of DPA
> > > > >         to either PMEM or BLK-namespaces to resolve this aliasing.
> > > >
> > > > Mostly for my understanding: is there a name for "address relative to
> > > > the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of
> > > > apparent physical memory, possibly interleaved, broken up, or weirdly
> > > > remapped by the memory controller, would still have addresses between
> > > > 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO,
> > > > some might be BLK apertures, etc.
> > > >
> > > > IIUC "DPA" refers to actual addressable storage, not this type of
> > > > address?
> > >
> > > No, DPA is exactly as you describe above. You can't directly access
> > > it except through a PMEM mapping (possibly interleaved with DPA from
> > > other DIMMs) or a BLK aperture (mmio window into DPA).
> >
> > So the thing I'm describing has no name, then? Oh, well.
>
> What? The thing you are describing *is* DPA.

I'm confused. Here are the two things I have in mind:

1. An address into on-DIMM storage. If I have a DIMM that is mapped to 8 GB
of SPA but has 64 GB of usable storage (accessed through BLK apertures,
say), then this address runs from 0 to 64 GB.

2. An address into the DIMM's view of physical address space. If I have a
DIMM that is mapped to 8 GB of SPA but has 64 GB of usable storage (accessed
through BLK apertures, say), then this address runs from 0 to 8 GB. There's
a one-to-one mapping between SPA and this type of address.

Since you said "a dimm may provide both PMEM-mode and BLK-mode access to a
range of DPA," I thought that DPA was #1.

--Andy
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski wrote: > On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams > wrote: >> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski wrote: >>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams >>> wrote: Changes since v1 [1]: Incorporates feedback received prior to April 24. 1/ Ingo said [2]: "So why on earth is this whole concept and the naming itself ('drivers/block/nd/' stands for 'NFIT Defined', apparently) revolving around a specific 'firmware' mindset and revolving around specific, weirdly named, overly complicated looking firmware interfaces that come with their own new weird glossary??" Indeed, we of course consulted the NFIT specification to determine the shape of the sub-system, but then let its terms and data structures permeate too deep into the implementation. That is fixed now with all NFIT specifics factored out into acpi.c. The NFIT is no longer required reading to review libnd. Only three concepts are needed: i/ PMEM - contiguous memory range where cpu stores are persistent once they are flushed through the memory controller. ii/ BLK - mmio apertures (sliding windows) that can be programmed to access an aperture's-worth of persistent media at a time. iii/ DPA - "dimm-physical-address", address space local to a dimm. A dimm may provide both PMEM-mode and BLK-mode access to a range of DPA. libnd manages allocation of DPA to either PMEM or BLK-namespaces to resolve this aliasing. >>> >>> Mostly for my understanding: is there a name for "address relative to >>> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of >>> apparent physical memory, possibly interleaved, broken up, or weirdly >>> remapped by the memory controller, would still have addresses between >>> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO, >>> some might be BLK apertures, etc. >>> >>> IIUC "DPA" refers to actual addressable storage, not this type of address? >> >> No, DPA is exactly as you describe above. 
You can't directly access >> it except through a PMEM mapping (possibly interleaved with DPA from >> other DIMMs) or a BLK aperture (mmio window into DPA). > > So the thing I'm describing has no name, then? Oh, well. What? The thing you are describing *is* DPA.
Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage) wrote: >> -Original Message- >> From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of >> Dan Williams >> Sent: Tuesday, April 28, 2015 1:24 PM >> To: linux-nvd...@lists.01.org >> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J. >> Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe; >> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org; >> Andy Lutomirski; Andrew Morton; Linus Torvalds >> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device >> support >> >> Changes since v1 [1]: Incorporates feedback received prior to April 24. > > Here are some comments on the sysfs properties reported for a pmem device. > They are based on v1, but I don't think v2 changes anything. > > 1. This confuses lsblk (part of util-linux): > /sys/block/pmem0/device/type:4 > > lsblk shows: > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > pmem0 251:00 8G 0 worm > pmem1 251:16 0 8G 0 worm > pmem2 251:32 0 8G 0 worm > pmem3 251:48 0 8G 0 worm > pmem4 251:64 0 8G 0 worm > pmem5 251:80 0 8G 0 worm > pmem6 251:96 0 8G 0 worm > pmem7 251:112 0 8G 0 worm > > lsblk's blkdev_scsi_type_to_name() considers 4 to mean > SCSI_TYPE_WORM (write once read many ... used for certain optical > and tape drives). Why is lsblk assuming these are scsi devices? I'll need to go check that out. > I'm not sure what nd and pmem are doing to result in that value. That is their libnd specific device type number from include/uapi/ndctl.h. 4 == ND_DEVICE_NAMESPACE_IO. lsblk has no business interpreting this as something SCSI specific. > 2. To avoid confusing software trying to detect fast storage vs. > slow storage devices via sysfs, this value should be 0: > /sys/block/pmem0/queue/rotational:1 > > That can be done by adding this shortly after the blk_alloc_queue call: > queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue); Yeah, good catch. 
> 3. Is there any reason to have a 512 KiB limit on the transfer > length? > /sys/block/pmem0/queue/max_hw_sectors_kb:512 > > That is from: >blk_queue_max_hw_sectors(pmem->pmem_queue, 1024); I'd only change this from the default if performance testing showed it made a non-trivial difference. > 4. These are read-writeable, but IOs never reach a queue, so > the queue size is irrelevant and merging never happens: > /sys/block/pmem0/queue/nomerges:0 > /sys/block/pmem0/queue/nr_requests:128 > > Consider making them both read-only with: > * nomerges set to 2 (no merging happening) > * nr_requests as small as the block layer allows to avoid > wasting memory. > > 5. No scatter-gather lists are created by the driver, so these > read-only fields are meaningless: > /sys/block/pmem0/queue/max_segments:128 > /sys/block/pmem0/queue/max_segment_size:65536 > > Is there a better way to report them as irrelevant? Again it comes back to the question of whether these default settings are actively harmful. > > 6. There is no completion processing, so the read-writeable > cpu affinity is not used: > /sys/block/pmem0/queue/rq_affinity:0 > > Consider making it read-only and set to 2, meaning the > completions always run on the requesting CPU. There are no completions with pmem, the entire I/O path is synchronous. Ideally, this attribute would disappear for a pmem queue, not be set to 2. > 7. With mmap() allowing less than logical block sized accesses > to the device, this could be considered misleading: > /sys/block/pmem0/queue/physical_block_size:512 I don't see how it is misleading. If you access it as a block device the block size is 512. If the application is mmap() + DAX aware it knows that the physical_block_size is being bypassed. > > Perhaps that needs to be 1 byte or a cacheline size (64 bytes > on x86) to indicate that direct partial logical block accesses > are possible. No, because that breaks the definition of a block device. 
Through the bdev interface it's always accessed a block at a time. > The btt driver could report 512 as one indication > it is different. > > I wouldn't be surprised if smaller values than the logical block > size confused some software, though. Precisely why we shouldn't go there with pmem.
RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of
> Dan Williams
> Sent: Tuesday, April 28, 2015 1:24 PM
> To: linux-nvd...@lists.01.org
> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
> Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe;
> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
> Andy Lutomirski; Andrew Morton; Linus Torvalds
> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
> support
>
> Changes since v1 [1]: Incorporates feedback received prior to April 24.

Here are some comments on the sysfs properties reported for a pmem
device. They are based on v1, but I don't think v2 changes anything.

1. This confuses lsblk (part of util-linux):
   /sys/block/pmem0/device/type:4

   lsblk shows:
   NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
   pmem0 251:0    0   8G  0 worm
   pmem1 251:16   0   8G  0 worm
   pmem2 251:32   0   8G  0 worm
   pmem3 251:48   0   8G  0 worm
   pmem4 251:64   0   8G  0 worm
   pmem5 251:80   0   8G  0 worm
   pmem6 251:96   0   8G  0 worm
   pmem7 251:112  0   8G  0 worm

   lsblk's blkdev_scsi_type_to_name() considers 4 to mean
   SCSI_TYPE_WORM (write once read many ... used for certain optical
   and tape drives). I'm not sure what nd and pmem are doing to result
   in that value.

2. To avoid confusing software trying to detect fast storage vs.
   slow storage devices via sysfs, this value should be 0:
   /sys/block/pmem0/queue/rotational:1

   That can be done by adding this shortly after the blk_alloc_queue
   call:
   queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

3. Is there any reason to have a 512 KiB limit on the transfer
   length?
   /sys/block/pmem0/queue/max_hw_sectors_kb:512

   That is from:
   blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

4.
These are read-writeable, but IOs never reach a queue, so the queue
size is irrelevant and merging never happens:
   /sys/block/pmem0/queue/nomerges:0
   /sys/block/pmem0/queue/nr_requests:128

   Consider making them both read-only with:
   * nomerges set to 2 (no merging happening)
   * nr_requests as small as the block layer allows to avoid
     wasting memory.

5. No scatter-gather lists are created by the driver, so these
   read-only fields are meaningless:
   /sys/block/pmem0/queue/max_segments:128
   /sys/block/pmem0/queue/max_segment_size:65536

   Is there a better way to report them as irrelevant?

6. There is no completion processing, so the read-writeable cpu
   affinity is not used:
   /sys/block/pmem0/queue/rq_affinity:0

   Consider making it read-only and set to 2, meaning the completions
   always run on the requesting CPU.

7. With mmap() allowing less than logical block sized accesses to the
   device, this could be considered misleading:
   /sys/block/pmem0/queue/physical_block_size:512

   Perhaps that needs to be 1 byte or a cacheline size (64 bytes on
   x86) to indicate that direct partial logical block accesses are
   possible. The btt driver could report 512 as one indication it is
   different.

   I wouldn't be surprised if smaller values than the logical block
   size confused some software, though.

---
Robert Elliott, HP Server Storage
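The type-code collision in point 1 can be sketched as follows. This is a toy model, not lsblk's actual code: the dictionaries are illustrative stand-ins for lsblk's blkdev_scsi_type_to_name() table and the libnd device-type enum (Dan's reply confirms 4 == ND_DEVICE_NAMESPACE_IO in the uapi header).

```python
# One sysfs attribute, two incompatible interpretations.
# SCSI peripheral-device-type 0x04 is "worm"; libnd's enum reuses the
# same attribute with its own meaning.  Tables are partial/illustrative.
SCSI_TYPE_NAMES = {0x0: "disk", 0x1: "tape", 0x4: "worm", 0x5: "rom"}
ND_DEVICE_NAMES = {4: "namespace_io"}  # ND_DEVICE_NAMESPACE_IO

sysfs_type = 4  # contents of /sys/block/pmem0/device/type

assert SCSI_TYPE_NAMES[sysfs_type] == "worm"          # what lsblk prints
assert ND_DEVICE_NAMES[sysfs_type] == "namespace_io"  # what libnd meant
```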
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams wrote: > On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski wrote: >> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams >> wrote: >>> Changes since v1 [1]: Incorporates feedback received prior to April 24. >>> >>> 1/ Ingo said [2]: >>> >>>"So why on earth is this whole concept and the naming itself >>>('drivers/block/nd/' stands for 'NFIT Defined', apparently) >>>revolving around a specific 'firmware' mindset and revolving >>>around specific, weirdly named, overly complicated looking >>>firmware interfaces that come with their own new weird >>>glossary??" >>> >>>Indeed, we of course consulted the NFIT specification to determine >>>the shape of the sub-system, but then let its terms and data >>>structures permeate too deep into the implementation. That is fixed >>>now with all NFIT specifics factored out into acpi.c. The NFIT is no >>>longer required reading to review libnd. Only three concepts are >>>needed: >>> >>> i/ PMEM - contiguous memory range where cpu stores are >>> persistent once they are flushed through the memory >>> controller. >>> >>> ii/ BLK - mmio apertures (sliding windows) that can be >>> programmed to access an aperture's-worth of persistent >>> media at a time. >>> >>> iii/ DPA - "dimm-physical-address", address space local to a >>> dimm. A dimm may provide both PMEM-mode and BLK-mode >>> access to a range of DPA. libnd manages allocation of DPA >>> to either PMEM or BLK-namespaces to resolve this aliasing. >> >> Mostly for my understanding: is there a name for "address relative to >> the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of >> apparent physical memory, possibly interleaved, broken up, or weirdly >> remapped by the memory controller, would still have addresses between >> 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO, >> some might be BLK apertures, etc. >> >> IIUC "DPA" refers to actual addressable storage, not this type of address? 
> > No, DPA is exactly as you describe above. You can't directly access > it except through a PMEM mapping (possibly interleaved with DPA from > other DIMMs) or a BLK aperture (mmio window into DPA). So the thing I'm describing has no name, then? Oh, well. --Andy
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski wrote: > On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams > wrote: >> Changes since v1 [1]: Incorporates feedback received prior to April 24. >> >> 1/ Ingo said [2]: >> >>"So why on earth is this whole concept and the naming itself >>('drivers/block/nd/' stands for 'NFIT Defined', apparently) >>revolving around a specific 'firmware' mindset and revolving >>around specific, weirdly named, overly complicated looking >>firmware interfaces that come with their own new weird >>glossary??" >> >>Indeed, we of course consulted the NFIT specification to determine >>the shape of the sub-system, but then let its terms and data >>structures permeate too deep into the implementation. That is fixed >>now with all NFIT specifics factored out into acpi.c. The NFIT is no >>longer required reading to review libnd. Only three concepts are >>needed: >> >> i/ PMEM - contiguous memory range where cpu stores are >> persistent once they are flushed through the memory >> controller. >> >> ii/ BLK - mmio apertures (sliding windows) that can be >> programmed to access an aperture's-worth of persistent >> media at a time. >> >> iii/ DPA - "dimm-physical-address", address space local to a >> dimm. A dimm may provide both PMEM-mode and BLK-mode >> access to a range of DPA. libnd manages allocation of DPA >> to either PMEM or BLK-namespaces to resolve this aliasing. > > Mostly for my understanding: is there a name for "address relative to > the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of > apparent physical memory, possibly interleaved, broken up, or weirdly > remapped by the memory controller, would still have addresses between > 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO, > some might be BLK apertures, etc. > > IIUC "DPA" refers to actual addressable storage, not this type of address? No, DPA is exactly as you describe above. 
You can't directly access it except through a PMEM mapping (possibly interleaved with DPA from other DIMMs) or a BLK aperture (mmio window into DPA).
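The "mmio window into DPA" can be sketched with a toy model (not kernel code, and the window size is an assumption): a BLK aperture exposes only an aperture's-worth of the DIMM's media at a time, and must be re-programmed to reach other DPA.

```python
APERTURE = 8192  # window size in bytes (assumed for illustration)

class BlkAperture:
    """Toy sliding-window view of a DIMM's DPA space."""
    def __init__(self, media_size):
        self.media = bytearray(media_size)  # the DIMM's DPA space
        self.base = 0                       # currently programmed DPA base

    def program(self, dpa_base):
        self.base = dpa_base                # slide the window

    def read(self, offset, length):
        # The CPU can only touch DPA the window currently exposes.
        assert offset + length <= APERTURE
        start = self.base + offset
        return bytes(self.media[start:start + length])

ap = BlkAperture(64 * 1024)
ap.media[APERTURE:APERTURE + 5] = b"hello"
ap.program(APERTURE)                 # expose DPA [8192, 16384)
assert ap.read(0, 5) == b"hello"
```

This is also why BLK capacity keeps a 1:1 relationship with a single DIMM's DPA: the window never spans an interleave.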
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams wrote: > Changes since v1 [1]: Incorporates feedback received prior to April 24. > > 1/ Ingo said [2]: > >"So why on earth is this whole concept and the naming itself >('drivers/block/nd/' stands for 'NFIT Defined', apparently) >revolving around a specific 'firmware' mindset and revolving >around specific, weirdly named, overly complicated looking >firmware interfaces that come with their own new weird >glossary??" > >Indeed, we of course consulted the NFIT specification to determine >the shape of the sub-system, but then let its terms and data >structures permeate too deep into the implementation. That is fixed >now with all NFIT specifics factored out into acpi.c. The NFIT is no >longer required reading to review libnd. Only three concepts are >needed: > > i/ PMEM - contiguous memory range where cpu stores are > persistent once they are flushed through the memory > controller. > > ii/ BLK - mmio apertures (sliding windows) that can be > programmed to access an aperture's-worth of persistent > media at a time. > > iii/ DPA - "dimm-physical-address", address space local to a > dimm. A dimm may provide both PMEM-mode and BLK-mode > access to a range of DPA. libnd manages allocation of DPA > to either PMEM or BLK-namespaces to resolve this aliasing. Mostly for my understanding: is there a name for "address relative to the address lines on the DIMM"? That is, a DIMM that exposes 8 GB of apparent physical memory, possibly interleaved, broken up, or weirdly remapped by the memory controller, would still have addresses between 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO, some might be BLK apertures, etc. IIUC "DPA" refers to actual addressable storage, not this type of address? 
--Andy
[PATCH v2 00/20] libnd: non-volatile memory device support
Changes since v1 [1]: Incorporates feedback received prior to April 24.

1/ Ingo said [2]:

   "So why on earth is this whole concept and the naming itself
   ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
   revolving around a specific 'firmware' mindset and revolving
   around specific, weirdly named, overly complicated looking
   firmware interfaces that come with their own new weird
   glossary??"

   Indeed, we of course consulted the NFIT specification to determine
   the shape of the sub-system, but then let its terms and data
   structures permeate too deep into the implementation. That is fixed
   now with all NFIT specifics factored out into acpi.c. The NFIT is
   no longer required reading to review libnd. Only three concepts are
   needed:

   i/ PMEM - contiguous memory range where cpu stores are persistent
      once they are flushed through the memory controller.

   ii/ BLK - mmio apertures (sliding windows) that can be programmed
       to access an aperture's-worth of persistent media at a time.

   iii/ DPA - "dimm-physical-address", address space local to a dimm.
        A dimm may provide both PMEM-mode and BLK-mode access to a
        range of DPA. libnd manages allocation of DPA to either PMEM
        or BLK-namespaces to resolve this aliasing.

   The v1..v2 diffstat below shows the migration of nfit-specifics to
   acpi.c and the new state of libnd being nfit-free. "nd" now only
   refers to "non-volatile devices". Note, reworked documentation will
   return once the review has settled.
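The aliasing in concept iii/ — one DPA pool feeding both PMEM- and BLK-namespaces — can be sketched with a toy allocator. This is an illustrative model, not the libnd implementation; the first-fit policy and sizes are assumptions.

```python
class DpaAllocator:
    """Toy model: a DIMM's DPA is one resource that PMEM and BLK
    namespaces both draw from, so ranges must never overlap."""
    def __init__(self, size):
        self.size = size
        self.allocs = []  # list of (start, end, mode)

    def alloc(self, length, mode):
        """First-fit allocation of DPA for a 'pmem' or 'blk' namespace."""
        start = 0
        for s, e, _ in sorted(self.allocs):
            if start + length <= s:
                break           # gap before this allocation fits
            start = e           # otherwise skip past it
        if start + length > self.size:
            return None         # no double-booking of DPA
        self.allocs.append((start, start + length, mode))
        return (start, start + length)

dimm = DpaAllocator(64 * 1024)                        # 64 KiB of DPA
assert dimm.alloc(16 * 1024, "pmem") == (0, 16 * 1024)
assert dimm.alloc(32 * 1024, "blk") == (16 * 1024, 48 * 1024)
assert dimm.alloc(32 * 1024, "pmem") is None          # only 16 KiB left
```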
 Documentation/blockdev/nd.txt     |  867 -
 MAINTAINERS                       |   34 +-
 arch/ia64/kernel/efi.c            |    5 +-
 arch/x86/kernel/e820.c            |   11 +-
 arch/x86/kernel/pmem.c            |    2 +-
 drivers/block/Makefile            |    2 +-
 drivers/block/nd/Kconfig          |  135 ++--
 drivers/block/nd/Makefile         |   32 +-
 drivers/block/nd/acpi.c           | 1506 +++--
 drivers/block/nd/acpi_nfit.h      |  321
 drivers/block/nd/blk.c            |   27 +-
 drivers/block/nd/btt.c            |    6 +-
 drivers/block/nd/btt_devs.c       |    8 +-
 drivers/block/nd/bus.c            |  337 +
 drivers/block/nd/core.c           |  574 +-
 drivers/block/nd/dimm.c           |   11 -
 drivers/block/nd/dimm_devs.c      |  292 ++-
 drivers/block/nd/e820.c           |  100 +++
 drivers/block/nd/libnd.h          |  122 +++
 drivers/block/nd/namespace_devs.c |   10 +-
 drivers/block/nd/nd-private.h     |  107 +--
 drivers/block/nd/nd.h             |   91 +--
 drivers/block/nd/nfit.h           |  238 --
 drivers/block/nd/pmem.c           |   56 +-
 drivers/block/nd/region.c         |   78 +-
 drivers/block/nd/region_devs.c    |  783 +++
 drivers/block/nd/test/iomap.c     |   86 +--
 drivers/block/nd/test/nfit.c      | 1115 +++
 drivers/block/nd/test/nfit_test.h |   15 +-
 include/uapi/linux/ndctl.h        |  130 ++--
 30 files changed, 3166 insertions(+), 3935 deletions(-)
 delete mode 100644 Documentation/blockdev/nd.txt
 create mode 100644 drivers/block/nd/acpi_nfit.h
 create mode 100644 drivers/block/nd/e820.c
 create mode 100644 drivers/block/nd/libnd.h
 delete mode 100644 drivers/block/nd/nfit.h

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html

2/ Christoph asked for the pmem ida conversion to be moved to its own
   patch (done), and to consider leaving the current pmem.c in
   drivers/block/. Instead, I converted the e820-type-12 enabling to
   be the first non-ACPI-NFIT based consumer of libnd. The new nd_e820
   driver simply registers e820-type-12 ranges as libnd PMEM regions.
   Among other things this conversion enables BTT for these ranges.
   The alternative is to move drivers/block/nd/nd.h internals out to
   include/linux/ which I think is worse.
3/ Toshi reported that the NFIT parsing fails to handle the case of a
   PMEM range with a single-dimm (non-aliasing) interleave
   description. Support for this case was added and is tested by
   default by the nfit_test.1 configuration.

4/ Toshi reported that we should not be treating a missing _STA
   property as a "dimm disabled by firmware" case. (fixed).

5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
   arch code. It is gone for now and we'll revisit when adding cached
   mappings back to the PMEM driver.

6/ Toshi mentioned that the presence of two different nd_bus_probe()
   functions was confusing. (cleaned up).

7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).

8/ Linda asked for nfit_test to honor dynamic cma reservations via the
   cma= command line (done). The cma requirements have also been
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams dan.j.willi...@intel.com wrote: On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski l...@amacapital.net wrote: On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams dan.j.willi...@intel.com wrote: On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski l...@amacapital.net wrote: On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams dan.j.willi...@intel.com wrote: Changes since v1 [1]: Incorporates feedback received prior to April 24. 1/ Ingo said [2]: So why on earth is this whole concept and the naming itself ('drivers/block/nd/' stands for 'NFIT Defined', apparently) revolving around a specific 'firmware' mindset and revolving around specific, weirdly named, overly complicated looking firmware interfaces that come with their own new weird glossary?? Indeed, we of course consulted the NFIT specification to determine the shape of the sub-system, but then let its terms and data structures permeate too deep into the implementation. That is fixed now with all NFIT specifics factored out into acpi.c. The NFIT is no longer required reading to review libnd. Only three concepts are needed: i/ PMEM - contiguous memory range where cpu stores are persistent once they are flushed through the memory controller. ii/ BLK - mmio apertures (sliding windows) that can be programmed to access an aperture's-worth of persistent media at a time. iii/ DPA - dimm-physical-address, address space local to a dimm. A dimm may provide both PMEM-mode and BLK-mode access to a range of DPA. libnd manages allocation of DPA to either PMEM or BLK-namespaces to resolve this aliasing. Mostly for my understanding: is there a name for address relative to the address lines on the DIMM? That is, a DIMM that exposes 8 GB of apparent physical memory, possibly interleaved, broken up, or weirdly remapped by the memory controller, would still have addresses between 0 and 8 GB. Some of those might be PMEM windows, some might be MMIO, some might be BLK apertures, etc. 
IIUC DPA refers to actual addressable storage, not this type of address? No, DPA is exactly as you describe above. You can't directly access it except through a PMEM mapping (possibly interleaved with DPA from other DIMMs) or a BLK aperture (mmio window into DPA). So the thing I'm describing has no name, then? Oh, well. What? The thing you are describing *is* DPA. I'm confused. Here are the two things I have in mind: 1. An address into on-DIMM storage. If I have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK apertures, say), then this address runs from 0 to 64 GB. 2. An address into the DIMM's view of physical address space. If I have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK apertures, say), then this address runs from 0 to 8 GB. There's a one-to-one mapping between SPA and this type of address. Since you said a dimm may provide both PMEM-mode and BLK-mode access to a range of DPA., I thought that DPA was #1. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote: Changes since v1 [1]: Incorporates feedback received prior to April 24. 1/ Ingo said [2]: So why on earth is this whole concept and the naming itself ('drivers/block/nd/' stands for 'NFIT Defined', apparently) revolving around a specific 'firmware' mindset and revolving around specific, weirdly named, overly complicated looking firmware interfaces that come with their own new weird glossary?? Indeed, we of course consulted the NFIT specification to determine the shape of the sub-system, but then let its terms and data structures permeate too deep into the implementation. That is fixed now with all NFIT specifics factored out into acpi.c. The NFIT is no longer required reading to review libnd. Only three concepts are needed: i/ PMEM - contiguous memory range where cpu stores are persistent once they are flushed through the memory controller. ii/ BLK - mmio apertures (sliding windows) that can be programmed to access an aperture's-worth of persistent media at a time. iii/ DPA - dimm-physical-address, address space local to a dimm. A dimm may provide both PMEM-mode and BLK-mode access to a range of DPA. libnd manages allocation of DPA to either PMEM or BLK-namespaces to resolve this aliasing. The v1..v2 diffstat below shows the migration of nfit-specifics to acpi.c and the new state of libnd being nfit-free. nd now only refers to non-volatile devices. Note, reworked documentation will return once the review has settled. 
Documentation/blockdev/nd.txt | 867 - MAINTAINERS | 34 +- arch/ia64/kernel/efi.c|5 +- arch/x86/kernel/e820.c| 11 +- arch/x86/kernel/pmem.c|2 +- drivers/block/Makefile|2 +- drivers/block/nd/Kconfig | 135 ++-- drivers/block/nd/Makefile | 32 +- drivers/block/nd/acpi.c | 1506 +++-- drivers/block/nd/acpi_nfit.h | 321 drivers/block/nd/blk.c| 27 +- drivers/block/nd/btt.c|6 +- drivers/block/nd/btt_devs.c |8 +- drivers/block/nd/bus.c| 337 + drivers/block/nd/core.c | 574 +- drivers/block/nd/dimm.c | 11 - drivers/block/nd/dimm_devs.c | 292 ++- drivers/block/nd/e820.c | 100 +++ drivers/block/nd/libnd.h | 122 +++ drivers/block/nd/namespace_devs.c | 10 +- drivers/block/nd/nd-private.h | 107 +-- drivers/block/nd/nd.h | 91 +-- drivers/block/nd/nfit.h | 238 -- drivers/block/nd/pmem.c | 56 +- drivers/block/nd/region.c | 78 +- drivers/block/nd/region_devs.c| 783 +++ drivers/block/nd/test/iomap.c | 86 +-- drivers/block/nd/test/nfit.c | 1115 +++ drivers/block/nd/test/nfit_test.h | 15 +- include/uapi/linux/ndctl.h| 130 ++-- 30 files changed, 3166 insertions(+), 3935 deletions(-) delete mode 100644 Documentation/blockdev/nd.txt create mode 100644 drivers/block/nd/acpi_nfit.h create mode 100644 drivers/block/nd/e820.c create mode 100644 drivers/block/nd/libnd.h delete mode 100644 drivers/block/nd/nfit.h [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html 2/ Christoph asked the pmem ida conversion to be moved to its own patch (done), and to consider leaving the current pmem.c in drivers/block/. Instead, I converted the e820-type-12 enabling to be the first non-ACPI-NFIT based consumer of libnd. The new nd_e820 driver simply registers e820-type-12 ranges as libnd PMEM regions. Among other things this conversion enables BTT for these ranges. The alternative is to move drivers/block/nd/nd.h internals out to include/linux/ which I think is worse. 
3/ Toshi reported that the NFIT parsing fails to handle the case of a PMEM range with a single-dimm (non-aliasing) interleave description. Support for this case was added and is tested by default by the nfit_test.1 configuration. 4/ Toshi reported that we should not be treating a missing _STA property as a dimm disabled by firmware case. (fixed). 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to arch code. It is gone for now and we'll revisit when adding cached mappings back to the PMEM driver. 6/ Toshi mentioned that the presence of two different nd_bus_probe() functions was confusing. (cleaned
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
> [ full cover letter trimmed; see the original posting below ]
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams <dan.j.willi...@intel.com> wrote:
> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.willi...@intel.com> wrote:
>>> [..]
>>> iii/ DPA - dimm-physical-address, address space local to a dimm.  A
>>> dimm may provide both PMEM-mode and BLK-mode access to a range of
>>> DPA.  libnd manages allocation of DPA to either PMEM or
>>> BLK-namespaces to resolve this aliasing.
>>
>> Mostly for my understanding: is there a name for an address relative
>> to the address lines on the DIMM?  That is, a DIMM that exposes 8 GB
>> of apparent physical memory, possibly interleaved, broken up, or
>> weirdly remapped by the memory controller, would still have addresses
>> between 0 and 8 GB.  Some of those might be PMEM windows, some might
>> be MMIO, some might be BLK apertures, etc.  IIUC DPA refers to actual
>> addressable storage, not this type of address?
>
> No, DPA is exactly as you describe above.  You can't directly access
> it except through a PMEM mapping (possibly interleaved with DPA from
> other DIMMs) or a BLK aperture (mmio window into DPA).

So the thing I'm describing has no name, then?  Oh, well.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.willi...@intel.com> wrote:
>> [..]
>> iii/ DPA - dimm-physical-address, address space local to a dimm.  A
>> dimm may provide both PMEM-mode and BLK-mode access to a range of
>> DPA.  libnd manages allocation of DPA to either PMEM or
>> BLK-namespaces to resolve this aliasing.
>
> Mostly for my understanding: is there a name for an address relative
> to the address lines on the DIMM?  That is, a DIMM that exposes 8 GB
> of apparent physical memory, possibly interleaved, broken up, or
> weirdly remapped by the memory controller, would still have addresses
> between 0 and 8 GB.  Some of those might be PMEM windows, some might
> be MMIO, some might be BLK apertures, etc.  IIUC DPA refers to actual
> addressable storage, not this type of address?

No, DPA is exactly as you describe above.  You can't directly access it
except through a PMEM mapping (possibly interleaved with DPA from other
DIMMs) or a BLK aperture (mmio window into DPA).
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams <dan.j.willi...@intel.com> wrote:
>> [..]
>> No, DPA is exactly as you describe above.  You can't directly access
>> it except through a PMEM mapping (possibly interleaved with DPA from
>> other DIMMs) or a BLK aperture (mmio window into DPA).
>
> So the thing I'm describing has no name, then?  Oh, well.

What?  The thing you are describing *is* DPA.
Re: [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams <dan.j.willi...@intel.com> wrote:
> [..]
> iii/ DPA - dimm-physical-address, address space local to a dimm.  A
> dimm may provide both PMEM-mode and BLK-mode access to a range of DPA.
> libnd manages allocation of DPA to either PMEM or BLK-namespaces to
> resolve this aliasing.

Mostly for my understanding: is there a name for an address relative to
the address lines on the DIMM?  That is, a DIMM that exposes 8 GB of
apparent physical memory, possibly interleaved, broken up, or weirdly
remapped by the memory controller, would still have addresses between
0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
some might be BLK apertures, etc.  IIUC DPA refers to actual addressable
storage, not this type of address?

--Andy
RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
-----Original Message-----
From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of Dan Williams
Sent: Tuesday, April 28, 2015 1:24 PM
To: linux-nvd...@lists.01.org
Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J. Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe; Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org; Andy Lutomirski; Andrew Morton; Linus Torvalds
Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

> Changes since v1 [1]: Incorporates feedback received prior to April 24.

Here are some comments on the sysfs properties reported for a pmem
device.  They are based on v1, but I don't think v2 changes anything.

1. This confuses lsblk (part of util-linux):

   /sys/block/pmem0/device/type:4

lsblk shows:

NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
pmem0 251:0    0   8G  0 worm
pmem1 251:16   0   8G  0 worm
pmem2 251:32   0   8G  0 worm
pmem3 251:48   0   8G  0 worm
pmem4 251:64   0   8G  0 worm
pmem5 251:80   0   8G  0 worm
pmem6 251:96   0   8G  0 worm
pmem7 251:112  0   8G  0 worm

lsblk's blkdev_scsi_type_to_name() considers 4 to mean SCSI_TYPE_WORM
(write once read many ... used for certain optical and tape drives).
I'm not sure what nd and pmem are doing to result in that value.

2. To avoid confusing software trying to detect fast storage vs. slow
storage devices via sysfs, this value should be 0:

   /sys/block/pmem0/queue/rotational:1

That can be done by adding this shortly after the blk_alloc_queue call:

   queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

3. Is there any reason to have a 512 KiB limit on the transfer length?

   /sys/block/pmem0/queue/max_hw_sectors_kb:512

That is from:

   blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

4. These are read-writable, but IOs never reach a queue, so the queue
size is irrelevant and merging never happens:

   /sys/block/pmem0/queue/nomerges:0
   /sys/block/pmem0/queue/nr_requests:128

Consider making them both read-only with:
* nomerges set to 2 (no merging happening)
* nr_requests as small as the block layer allows to avoid wasting
  memory.

5. No scatter-gather lists are created by the driver, so these
read-only fields are meaningless:

   /sys/block/pmem0/queue/max_segments:128
   /sys/block/pmem0/queue/max_segment_size:65536

Is there a better way to report them as irrelevant?

6. There is no completion processing, so the read-writable cpu
affinity is not used:

   /sys/block/pmem0/queue/rq_affinity:0

Consider making it read-only and set to 2, meaning the completions
always run on the requesting CPU.

7. With mmap() allowing less than logical block sized accesses to the
device, this could be considered misleading:

   /sys/block/pmem0/queue/physical_block_size:512

Perhaps that needs to be 1 byte or a cacheline size (64 bytes on x86)
to indicate that direct partial logical block accesses are possible.
The btt driver could report 512 as one indication it is different.
I wouldn't be surprised if smaller values than the logical block size
confused some software, though.

---
Robert Elliott, HP Server Storage
Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage) <elli...@hp.com> wrote:
> [..]
> Here are some comments on the sysfs properties reported for a pmem
> device.  They are based on v1, but I don't think v2 changes anything.
>
> 1. This confuses lsblk (part of util-linux):
>
>    /sys/block/pmem0/device/type:4
> [..]
> lsblk's blkdev_scsi_type_to_name() considers 4 to mean SCSI_TYPE_WORM
> (write once read many ... used for certain optical and tape drives).

Why is lsblk assuming these are scsi devices?  I'll need to go check
that out.

> I'm not sure what nd and pmem are doing to result in that value.

That is their libnd specific device type number from
include/uapi/linux/ndctl.h.  4 == ND_DEVICE_NAMESPACE_IO.  lsblk has no
business interpreting this as something SCSI specific.

> 2. To avoid confusing software trying to detect fast storage vs. slow
> storage devices via sysfs, this value should be 0:
>
>    /sys/block/pmem0/queue/rotational:1
>
> That can be done by adding this shortly after the blk_alloc_queue call:
>
>    queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

Yeah, good catch.

> 3. Is there any reason to have a 512 KiB limit on the transfer length?
>
>    /sys/block/pmem0/queue/max_hw_sectors_kb:512
>
> That is from:
>
>    blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

I'd only change this from the default if performance testing showed it
made a non-trivial difference.

> 4. These are read-writable, but IOs never reach a queue, so the queue
> size is irrelevant and merging never happens:
>
>    /sys/block/pmem0/queue/nomerges:0
>    /sys/block/pmem0/queue/nr_requests:128
>
> Consider making them both read-only with:
> * nomerges set to 2 (no merging happening)
> * nr_requests as small as the block layer allows to avoid wasting
>   memory.
>
> 5. No scatter-gather lists are created by the driver, so these
> read-only fields are meaningless:
>
>    /sys/block/pmem0/queue/max_segments:128
>    /sys/block/pmem0/queue/max_segment_size:65536
>
> Is there a better way to report them as irrelevant?

Again it comes back to the question of whether these default settings
are actively harmful.

> 6. There is no completion processing, so the read-writable cpu
> affinity is not used:
>
>    /sys/block/pmem0/queue/rq_affinity:0
>
> Consider making it read-only and set to 2, meaning the completions
> always run on the requesting CPU.

There are no completions with pmem, the entire I/O path is synchronous.
Ideally, this attribute would disappear for a pmem queue, not be set
to 2.

> 7. With mmap() allowing less than logical block sized accesses to the
> device, this could be considered misleading:
>
>    /sys/block/pmem0/queue/physical_block_size:512

I don't see how it is misleading.  If you access it as a block device
the block size is 512.  If the application is mmap() + DAX aware it
knows that the physical_block_size is being bypassed.

> Perhaps that needs to be 1 byte or a cacheline size (64 bytes on x86)
> to indicate that direct partial logical block accesses are possible.

No, because that breaks the definition of a block device.  Through the
bdev interface it's always accessed a block at a time.

> The btt driver could report 512 as one indication it is different.
> I wouldn't be surprised if smaller values than the logical block size
> confused some software, though.

Precisely why we shouldn't go there with pmem.
[PATCH v2 00/20] libnd: non-volatile memory device support
Changes since v1 [1]: Incorporates feedback received prior to April 24.

1/ Ingo said [2]:

   > So why on earth is this whole concept and the naming itself
   > ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
   > revolving around a specific 'firmware' mindset and revolving
   > around specific, weirdly named, overly complicated looking
   > firmware interfaces that come with their own new weird glossary??

Indeed, we of course consulted the NFIT specification to determine the
shape of the sub-system, but then let its terms and data structures
permeate too deeply into the implementation.  That is fixed now with
all NFIT specifics factored out into acpi.c.  The NFIT is no longer
required reading to review libnd.  Only three concepts are needed:

i/ PMEM - a contiguous memory range where cpu stores are persistent
   once they are flushed through the memory controller.

ii/ BLK - mmio apertures (sliding windows) that can be programmed to
   access an aperture's-worth of persistent media at a time.

iii/ DPA - dimm-physical-address, address space local to a dimm.  A
   dimm may provide both PMEM-mode and BLK-mode access to a range of
   DPA.  libnd manages allocation of DPA to either PMEM- or
   BLK-namespaces to resolve this aliasing.

The v1..v2 diffstat below shows the migration of NFIT specifics to
acpi.c and the new state of libnd being nfit-free.  "nd" now only
refers to non-volatile devices.  Note, reworked documentation will
return once the review has settled.
 Documentation/blockdev/nd.txt     |  867 -
 MAINTAINERS                       |   34 +-
 arch/ia64/kernel/efi.c            |    5 +-
 arch/x86/kernel/e820.c            |   11 +-
 arch/x86/kernel/pmem.c            |    2 +-
 drivers/block/Makefile            |    2 +-
 drivers/block/nd/Kconfig          |  135 ++--
 drivers/block/nd/Makefile         |   32 +-
 drivers/block/nd/acpi.c           | 1506 +++--
 drivers/block/nd/acpi_nfit.h      |  321
 drivers/block/nd/blk.c            |   27 +-
 drivers/block/nd/btt.c            |    6 +-
 drivers/block/nd/btt_devs.c       |    8 +-
 drivers/block/nd/bus.c            |  337 +
 drivers/block/nd/core.c           |  574 +-
 drivers/block/nd/dimm.c           |   11 -
 drivers/block/nd/dimm_devs.c      |  292 ++-
 drivers/block/nd/e820.c           |  100 +++
 drivers/block/nd/libnd.h          |  122 +++
 drivers/block/nd/namespace_devs.c |   10 +-
 drivers/block/nd/nd-private.h     |  107 +--
 drivers/block/nd/nd.h             |   91 +--
 drivers/block/nd/nfit.h           |  238 --
 drivers/block/nd/pmem.c           |   56 +-
 drivers/block/nd/region.c         |   78 +-
 drivers/block/nd/region_devs.c    |  783 +++
 drivers/block/nd/test/iomap.c     |   86 +--
 drivers/block/nd/test/nfit.c      | 1115 +++
 drivers/block/nd/test/nfit_test.h |   15 +-
 include/uapi/linux/ndctl.h        |  130 ++--
 30 files changed, 3166 insertions(+), 3935 deletions(-)
 delete mode 100644 Documentation/blockdev/nd.txt
 create mode 100644 drivers/block/nd/acpi_nfit.h
 create mode 100644 drivers/block/nd/e820.c
 create mode 100644 drivers/block/nd/libnd.h
 delete mode 100644 drivers/block/nd/nfit.h

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html

2/ Christoph asked for the pmem ida conversion to be moved to its own
patch (done), and to consider leaving the current pmem.c in
drivers/block/.  Instead, I converted the e820-type-12 enabling to be
the first non-ACPI-NFIT based consumer of libnd.  The new nd_e820
driver simply registers e820-type-12 ranges as libnd PMEM regions.
Among other things this conversion enables BTT for these ranges.  The
alternative is to move drivers/block/nd/nd.h internals out to
include/linux/, which I think is worse.
3/ Toshi reported that the NFIT parsing fails to handle the case of a
PMEM range with a single-dimm (non-aliasing) interleave description.
Support for this case was added and is tested by default by the
nfit_test.1 configuration.

4/ Toshi reported that we should not be treating a missing _STA
property as a "dimm disabled by firmware" case.  (fixed)

5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
arch code.  It is gone for now and we'll revisit when adding cached
mappings back to the PMEM driver.

6/ Toshi mentioned that the presence of two different nd_bus_probe()
functions was confusing.  (cleaned up)

7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).

8/ Linda asked for nfit_test to honor dynamic cma reservations via the
cma= command line (done).  The cma requirements have also been reduced