Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-05-08 Thread Williams, Dan J
On Tue, 2015-05-05 at 02:06 +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 28, 2015 06:22:05 PM Dan Williams wrote:
> > On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki  
> > wrote:
> > > On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> > >> Changes since v1 [1]: Incorporates feedback received prior to April 24.
> > >>
> 
> [cut]
> 
> > >
> > > I'm wondering what's wrong with CCing all of the series to linux-acpi?
> > >
> > > Is there anything in it that the people on that list should not see, by 
> > > any
> > > chance?
> > 
> > linux-acpi may not care about the dimm-metadata labeling patches that
> > are completely independent of ACPI, but might as well include
> > linux-acpi on the whole series at this point.
> 
> I've gone through the ACPI-related patches in this series (other than [2/20]
> that I've commented directly) and while I haven't found anything horrible in
> them, I don't quite feel confident enough to ACK them.
> 
> What I'm really missing in this series is a design document describing all 
> that
> from a high-level perspective and making it clear where all of the pieces go
> and what their respective roles are.  Also reordering the series to introduce
> the nd subsystem to start with and then its users might help here.

Here you go, and also see the "Supporting Documents" section if you need
more details, or just ask.  This is the reworked document after pushing
NFIT specifics out of the core implementation.  The core APIs are
nd_bus_register(), nd_dimm_create(), nd_pmem_region_create(), and
nd_blk_region_create().
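
For orientation, here is a sketch of how a bus provider (the ACPI/NFIT
driver, or the new e820 driver) is expected to use those entry points.
The prototypes and arguments below are simplified placeholders, not the
actual declarations from drivers/block/nd/libnd.h, so read it as the
shape of the flow rather than the API itself:

    /*
     * Illustrative sketch only: argument lists are placeholders, the
     * real prototypes live in drivers/block/nd/libnd.h.
     */
    static int example_nd_provider_probe(struct device *parent)
    {
            struct nd_region_desc ndr_desc;
            struct nd_bus *nd_bus;

            /* 1/ register a bus to anchor all nd devices for this provider */
            nd_bus = nd_bus_register(parent, NULL /* provider descriptor */);
            if (!nd_bus)
                    return -ENOMEM;

            /* 2/ register each DIMM the provider discovered (NFIT, e820, ...) */
            nd_dimm_create(nd_bus, NULL /* per-dimm provider data */);

            /* 3/ describe persistent ranges and hand them to the core */
            memset(&ndr_desc, 0, sizeof(ndr_desc));
            /* ...fill in the SPA range and the DIMM mappings behind it... */
            nd_pmem_region_create(nd_bus, &ndr_desc);  /* PMEM-mode region */
            nd_blk_region_create(nd_bus, &ndr_desc);   /* BLK-mode region */

            return 0;
    }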

---

  LIBND: Non-volatile Devices
  libnd - kernel / libndctl - userspace helper library
   linux-nvd...@lists.01.org
  v10


Glossary
Overview
Supporting Documents
Git Trees
LIBND PMEM and BLK
Why BLK?
PMEM vs BLK
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
Example NVDIMM Platform
LIBND Kernel Device Model and LIBNDCTL Userspace API
LIBNDCTL: Context
libndctl: instantiate a new library context example
LIBND/LIBNDCTL: Bus
libnd: control class device in /sys/class
libnd: bus
libndctl: bus enumeration example
LIBND/LIBNDCTL: DIMM (NMEM)
libnd: DIMM (NMEM)
libndctl: DIMM enumeration example
LIBND/LIBNDCTL: Region
libnd: region
libndctl: region enumeration example
Why Not Encode the Region Type into the Region Name?
How Do I Determine the Major Type of a Region?
LIBND/LIBNDCTL: Namespace
libnd: namespace
libndctl: namespace enumeration example
libndctl: namespace creation example
Why the Term "namespace"?
LIBND/LIBNDCTL: Block Translation Table "btt"
libnd: btt layout
libndctl: btt creation example
Summary LIBNDCTL Diagram


Glossary


PMEM: A system physical address range where writes are persistent.  A
block device composed of PMEM is capable of DAX.  A PMEM address range
may span/interleave several DIMMs.

BLK: A set of one or more programmable memory mapped apertures provided
by a DIMM to access its media.  This indirection precludes the
performance benefit of interleaving, but enables DIMM-bounded failure
modes.

DPA: DIMM Physical Address, a DIMM-relative offset.  With one DIMM in
the system there would be a 1:1 system-physical-address:DPA association.
Once more DIMMs are added, a memory controller interleave must be
decoded to determine the DPA associated with a given
system-physical-address.  BLK capacity always has a 1:1 relationship
with a single DIMM's DPA range.

DAX: File system extensions to bypass the page cache and block layer to
mmap persistent memory, from a PMEM block device, directly into a
process address space.
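
For a sense of what that buys an application, the toy program below maps
a PMEM-backed device and updates it with ordinary stores.  The /dev/pmem0
path is only an example, and real code also has to care about CPU-cache
flushing for persistence; msync() is used here as the portable fallback:

    /* Toy example: map persistent memory and write to it directly. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/dev/pmem0", O_RDWR);
            char *p;

            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* With DAX the mapping targets the media directly, bypassing
             * the page cache and the block layer. */
            p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            strcpy(p, "hello, persistent world");
            msync(p, 4096, MS_SYNC);

            munmap(p, 4096);
            close(fd);
            return 0;
    }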

BTT: Block Translation Table: Persistent memory is byte addressable.
Existing software may have an expectation that the power-fail-atomicity
of writes is at least one sector, 512 bytes.  The BTT is an indirection
table with atomic update semantics to front a PMEM/BLK block device
driver and present arbitrary atomic sector sizes.
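
A minimal sketch of the idea (not the on-media BTT format from this
series): writes land in a free block first and the logical-to-physical
map entry is then switched with a single atomic store, so readers and
power failures only ever see the old or the new sector, never a torn
one.  A real implementation must also flush the new data to media before
updating the map:

    /* Conceptual "write elsewhere, then atomically repoint the map" sketch. */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <string.h>

    #define TOY_SECTOR 512
    #define TOY_BLOCKS 1024

    struct toy_btt {
            _Atomic uint32_t map[TOY_BLOCKS];          /* lba -> physical block */
            uint8_t media[TOY_BLOCKS + 1][TOY_SECTOR]; /* data blocks + one spare */
            uint32_t free_block;                       /* trivial allocator */
    };

    static void toy_btt_write(struct toy_btt *btt, uint32_t lba, const void *buf)
    {
            uint32_t old = atomic_load(&btt->map[lba]);
            uint32_t new = btt->free_block;

            memcpy(btt->media[new], buf, TOY_SECTOR);  /* write the data first */
            /* (a real BTT flushes the data to media at this point) */
            atomic_store(&btt->map[lba], new);         /* atomic publish */
            btt->free_block = old;                     /* recycle the old block */
    }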

LABEL: Metadata stored on a DIMM device that partitions and identifies
(persistently names) storage between PMEM and BLK.  It also partitions
BLK storage to host BTTs with different parameters per BLK-partition.
Note that traditional partition tables, GPT/MBR, are layered on top of a
BLK or PMEM device.


Overview


The libnd subsystem provides support for three types of NVDIMMs: PMEM,
BLK, and NVDIMM platforms that can simultaneously support both PMEM- and
BLK-mode access on a given set of DIMMs.  These three modes of
operation are described by the NVDIMM Firmware Interface Table (NFIT)
in ACPI 6.


Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

2015-05-07 Thread Christoph Hellwig
On Tue, Apr 28, 2015 at 03:15:54PM -0700, Dan Williams wrote:
> > lsblk's blkdev_scsi_type_to_name() considers 4 to mean
> > SCSI_TYPE_WORM (write once read many ... used for certain optical
> > and tape drives).
> 
> Why is lsblk assuming these are scsi devices?  I'll need to go check that out.

It's a very common assumption, unfortunately.  I remember fixing it in
various in-house tools at customers and stumbled over it in targetcli
recently.

Please use a prefix for your type attribute to avoid this problem.


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-05-04 Thread Rafael J. Wysocki
On Tuesday, April 28, 2015 06:22:05 PM Dan Williams wrote:
> On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki  wrote:
> > On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> >> Changes since v1 [1]: Incorporates feedback received prior to April 24.
> >>

[cut]

> >
> > I'm wondering what's wrong with CCing all of the series to linux-acpi?
> >
> > Is there anything in it that the people on that list should not see, by any
> > chance?
> 
> linux-acpi may not care about the dimm-metadata labeling patches that
> are completely independent of ACPI, but might as well include
> linux-acpi on the whole series at this point.

I've gone through the ACPI-related patches in this series (other than [2/20]
that I've commented directly) and while I haven't found anything horrible in
them, I don't quite feel confident enough to ACK them.

What I'm really missing in this series is a design document describing all that
from a high-level perspective and making it clear where all of the pieces go
and what their respective roles are.  Also reordering the series to introduce
the nd subsystem to start with and then its users might help here.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-30 Thread Ross Zwisler
On Tue, 2015-04-28 at 16:05 -0700, Andy Lutomirski wrote:
> On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams  
> wrote:
> > On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski  
> > wrote:
> >> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams  
> >> wrote:
> >>> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski  
> >>> wrote:
>  On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams 
>   wrote:

>  Mostly for my understanding: is there a name for "address relative to
>  the address lines on the DIMM"?  That is, a DIMM that exposes 8 GB of
>  apparent physical memory, possibly interleaved, broken up, or weirdly
>  remapped by the memory controller, would still have addresses between
>  0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
>  some might be BLK apertures, etc.
> 
>  IIUC "DPA" refers to actual addressable storage, not this type of 
>  address?
> >>>
> >>> No, DPA is exactly as you describe above.  You can't directly access
> >>> it except through a PMEM mapping (possibly interleaved with DPA from
> >>> other DIMMs) or a BLK aperture (mmio window into DPA).
> >>
> >> So the thing I'm describing has no name, then?  Oh, well.
> >
> > What?  The thing you are describing *is* DPA.
> 
> I'm confused.  Here are the two things I have in mind:
> 
> 1. An address into on-DIMM storage.  If I have a DIMM that is mapped
> to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK
> apertures, say), then this address runs from 0 to 64 GB.
> 
> 2. An address into the DIMM's view of physical address space.  If I
> have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable
> storage (accessed through BLK apertures, say), then this address runs
> from 0 to 8 GB.  There's a one-to-one mapping between SPA and this
> type of address.
> 
> Since you said "a dimm may provide both PMEM-mode and BLK-mode access
> to a range of DPA.," I thought that DPA was #1.
> 
> --Andy

I think that you've got the right definition, #1 above, for DPA.  The DPA is
relative to the DIMM, knows nothing about interleaving or SPA or anything else
in the system, and is basically equivalent to the idea of an LBA on a disk.  A
DIMM that has 64 GiB of storage could have a DPA space ranging from 0 to 64
GiB.

The second concept is a little trickier - we've been talking about this by
using the term "N-way interleave set".  Say you have your 64 GiB DIMM and only
the first 8 GiB are given to the OS in an SPA, and that DIMM isn't interleaved
with any other DIMMs.  This would be a 1-way interleave set, ranging from DPA
0 - 8 GiB on the DIMM.

If you have 2 DIMMs of size 64 GiB, and they each have an 8 GiB region given to
the SPA space, those two regions could be interleaved together.  The OS would
then see a 16 GiB 2-way interleave set, made up of DPAs 0 -> 8 GiB on each of
the two DIMMs.
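
To make the address math concrete, the decode for a simple fixed-granularity
N-way interleave could look like the sketch below; the real interleave tables
allow more general line sizes and orderings, so this only illustrates the
SPA-offset to (DIMM, DPA) relationship:

    /* Illustrative decode of an offset into a PMEM interleave set back to
     * (DIMM index, DPA).  Assumes a fixed line size rotated round-robin
     * across the DIMMs, which is simpler than what NFIT actually permits. */
    #include <stdint.h>

    struct simple_set {
            uint32_t num_dimms;  /* 2 for the example above */
            uint32_t line_size;  /* bytes per interleave line, e.g. 4096 */
            uint64_t dpa_base;   /* starting DPA of this region on each DIMM */
    };

    static void spa_offset_to_dpa(const struct simple_set *set, uint64_t spa_off,
                                  uint32_t *dimm, uint64_t *dpa)
    {
            uint64_t line = spa_off / set->line_size;
            uint64_t off  = spa_off % set->line_size;

            *dimm = line % set->num_dimms;                  /* which DIMM */
            *dpa  = set->dpa_base +
                    (line / set->num_dimms) * set->line_size + off;
    }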

You can figure out exactly how all the interleaving works by looking at the
SPA tables, the Memory Device tables and the Interleave Tables.

These are in sections 5.2.25.1 - 5.2.25.3 in ACPI 6, and are in our code as
struct acpi_nfit_spa, struct acpi_nfit_memdev and struct acpi_nfit_idt.

- Ross


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki  wrote:
> On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>
>> 1/ Ingo said [2]:
>>
>>"So why on earth is this whole concept and the naming itself
>>('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>>revolving around a specific 'firmware' mindset and revolving
>>around specific, weirdly named, overly complicated looking
>>firmware interfaces that come with their own new weird
>>glossary??"
>>
>>Indeed, we of course consulted the NFIT specification to determine
>>the shape of the sub-system, but then let its terms and data
>>structures permeate too deep into the implementation.  That is fixed
>>now with all NFIT specifics factored out into acpi.c.  The NFIT is no
>>longer required reading to review libnd.  Only three concepts are
>>needed:
>>
>>   i/ PMEM - contiguous memory range where cpu stores are
>>persistent once they are flushed through the memory
>>  controller.
>>
>>  ii/ BLK - mmio apertures (sliding windows) that can be
>>programmed to access an aperture's-worth of persistent
>>  media at a time.
>>
>> iii/ DPA - "dimm-physical-address", address space local to a
>>dimm.  A dimm may provide both PMEM-mode and BLK-mode
>>access to a range of DPA.  libnd manages allocation of DPA
>>  to either PMEM or BLK-namespaces to resolve this aliasing.
>>
>>The v1..v2 diffstat below shows the migration of nfit-specifics to
>>acpi.c and the new state of libnd being nfit-free.  "nd" now only
>>refers to "non-volatile devices".  Note, reworked documentation will
>>return once the review has settled.
>>
>>Documentation/blockdev/nd.txt |  867 -
>>MAINTAINERS   |   34 +-
>>arch/ia64/kernel/efi.c|5 +-
>>arch/x86/kernel/e820.c|   11 +-
>>arch/x86/kernel/pmem.c|2 +-
>>drivers/block/Makefile|2 +-
>>drivers/block/nd/Kconfig  |  135 ++--
>>drivers/block/nd/Makefile |   32 +-
>>drivers/block/nd/acpi.c   | 1506 
>> +++--
>>drivers/block/nd/acpi_nfit.h  |  321 
>>drivers/block/nd/blk.c|   27 +-
>>drivers/block/nd/btt.c|6 +-
>>drivers/block/nd/btt_devs.c   |8 +-
>>drivers/block/nd/bus.c|  337 +
>>drivers/block/nd/core.c   |  574 +-
>>drivers/block/nd/dimm.c   |   11 -
>>drivers/block/nd/dimm_devs.c  |  292 ++-
>>drivers/block/nd/e820.c   |  100 +++
>>drivers/block/nd/libnd.h  |  122 +++
>>drivers/block/nd/namespace_devs.c |   10 +-
>>drivers/block/nd/nd-private.h |  107 +--
>>drivers/block/nd/nd.h |   91 +--
>>drivers/block/nd/nfit.h   |  238 --
>>drivers/block/nd/pmem.c   |   56 +-
>>drivers/block/nd/region.c |   78 +-
>>drivers/block/nd/region_devs.c|  783 +++
>>drivers/block/nd/test/iomap.c |   86 +--
>>drivers/block/nd/test/nfit.c  | 1115 +++
>>drivers/block/nd/test/nfit_test.h |   15 +-
>>include/uapi/linux/ndctl.h|  130 ++--
>>30 files changed, 3166 insertions(+), 3935 deletions(-)
>>delete mode 100644 Documentation/blockdev/nd.txt
>>create mode 100644 drivers/block/nd/acpi_nfit.h
>>create mode 100644 drivers/block/nd/e820.c
>>create mode 100644 drivers/block/nd/libnd.h
>>delete mode 100644 drivers/block/nd/nfit.h
>>
>>[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
>>[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
>>
>> 2/ Christoph asked the pmem ida conversion to be moved to its own patch
>>(done), and to consider leaving the current pmem.c in drivers/block/.
>>Instead, I converted the e820-type-12 enabling to be the first
>>non-ACPI-NFIT based consumer of libnd.  The new nd_e820 driver simply
>>registers e820-type-12 ranges as libnd PMEM regions.  Among other
>>things this conversion enables BTT for these ranges.  The alternative
>>is to move drivers/block/nd/nd.h internals out to include/linux/
>>which I think is worse.
>>
>> 3/ Toshi reported that the NFIT parsing fails to handle the case of a
>>PMEM range with a single-dimm (non-aliasing) interleave description.
>>Support for this case was added and is tested by default by the
>>nfit_test.1 configuration.
>>
>> 4/ Toshi reported that we should not be treating a missing _STA property
>>as a "dimm disabled by firmware" case.  (fixed).
>>
>> 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
>>arch code.  It is gone for now 

Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Rafael J. Wysocki
On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
> 
> 1/ Ingo said [2]:
> 
>"So why on earth is this whole concept and the naming itself
>('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>revolving around a specific 'firmware' mindset and revolving
>around specific, weirdly named, overly complicated looking
>firmware interfaces that come with their own new weird
>glossary??"
> 
>Indeed, we of course consulted the NFIT specification to determine
>the shape of the sub-system, but then let its terms and data
>structures permeate too deep into the implementation.  That is fixed
>now with all NFIT specifics factored out into acpi.c.  The NFIT is no
>longer required reading to review libnd.  Only three concepts are
>needed:
> 
>   i/ PMEM - contiguous memory range where cpu stores are
>persistent once they are flushed through the memory
>  controller.
> 
>  ii/ BLK - mmio apertures (sliding windows) that can be
>programmed to access an aperture's-worth of persistent
>  media at a time.
> 
> iii/ DPA - "dimm-physical-address", address space local to a
>dimm.  A dimm may provide both PMEM-mode and BLK-mode
>access to a range of DPA.  libnd manages allocation of DPA
>  to either PMEM or BLK-namespaces to resolve this aliasing. 
> 
>The v1..v2 diffstat below shows the migration of nfit-specifics to
>acpi.c and the new state of libnd being nfit-free.  "nd" now only
>refers to "non-volatile devices".  Note, reworked documentation will
>return once the review has settled.
> 
>Documentation/blockdev/nd.txt |  867 -
>MAINTAINERS   |   34 +-
>arch/ia64/kernel/efi.c|5 +-
>arch/x86/kernel/e820.c|   11 +-
>arch/x86/kernel/pmem.c|2 +-
>drivers/block/Makefile|2 +-
>drivers/block/nd/Kconfig  |  135 ++--
>drivers/block/nd/Makefile |   32 +-
>drivers/block/nd/acpi.c   | 1506 
> +++--
>drivers/block/nd/acpi_nfit.h  |  321 
>drivers/block/nd/blk.c|   27 +-
>drivers/block/nd/btt.c|6 +-
>drivers/block/nd/btt_devs.c   |8 +-
>drivers/block/nd/bus.c|  337 +
>drivers/block/nd/core.c   |  574 +-
>drivers/block/nd/dimm.c   |   11 -
>drivers/block/nd/dimm_devs.c  |  292 ++-
>drivers/block/nd/e820.c   |  100 +++
>drivers/block/nd/libnd.h  |  122 +++
>drivers/block/nd/namespace_devs.c |   10 +-
>drivers/block/nd/nd-private.h |  107 +--
>drivers/block/nd/nd.h |   91 +--
>drivers/block/nd/nfit.h   |  238 --
>drivers/block/nd/pmem.c   |   56 +-
>drivers/block/nd/region.c |   78 +-
>drivers/block/nd/region_devs.c|  783 +++
>drivers/block/nd/test/iomap.c |   86 +--
>drivers/block/nd/test/nfit.c  | 1115 +++
>drivers/block/nd/test/nfit_test.h |   15 +-
>include/uapi/linux/ndctl.h|  130 ++--
>30 files changed, 3166 insertions(+), 3935 deletions(-)
>delete mode 100644 Documentation/blockdev/nd.txt
>create mode 100644 drivers/block/nd/acpi_nfit.h
>create mode 100644 drivers/block/nd/e820.c
>create mode 100644 drivers/block/nd/libnd.h
>delete mode 100644 drivers/block/nd/nfit.h
> 
>[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
>[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
> 
> 2/ Christoph asked the pmem ida conversion to be moved to its own patch
>(done), and to consider leaving the current pmem.c in drivers/block/.
>Instead, I converted the e820-type-12 enabling to be the first
>non-ACPI-NFIT based consumer of libnd.  The new nd_e820 driver simply
>registers e820-type-12 ranges as libnd PMEM regions.  Among other
>things this conversion enables BTT for these ranges.  The alternative
>is to move drivers/block/nd/nd.h internals out to include/linux/
>which I think is worse.
> 
> 3/ Toshi reported that the NFIT parsing fails to handle the case of a
>PMEM range with a single-dimm (non-aliasing) interleave description.
>Support for this case was added and is tested by default by the
>nfit_test.1 configuration.
> 
> 4/ Toshi reported that we should not be treating a missing _STA property
>as a "dimm disabled by firmware" case.  (fixed).
> 
> 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
>arch code.  It is gone for now and we'll revisit when adding cached
>mappings back to the PMEM driver.
> 
> 6/ Toshi mentioned that the presence of two different 

Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Andy Lutomirski
On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams  wrote:
> On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski  wrote:
>> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams  
>> wrote:
>>> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski  
>>> wrote:
 On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams  
 wrote:
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>
> 1/ Ingo said [2]:
>
>"So why on earth is this whole concept and the naming itself
>('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>revolving around a specific 'firmware' mindset and revolving
>around specific, weirdly named, overly complicated looking
>firmware interfaces that come with their own new weird
>glossary??"
>
>Indeed, we of course consulted the NFIT specification to determine
>the shape of the sub-system, but then let its terms and data
>structures permeate too deep into the implementation.  That is fixed
>now with all NFIT specifics factored out into acpi.c.  The NFIT is no
>longer required reading to review libnd.  Only three concepts are
>needed:
>
>   i/ PMEM - contiguous memory range where cpu stores are
>  persistent once they are flushed through the memory
>  controller.
>
>  ii/ BLK - mmio apertures (sliding windows) that can be
>  programmed to access an aperture's-worth of persistent
>  media at a time.
>
> iii/ DPA - "dimm-physical-address", address space local to a
>  dimm.  A dimm may provide both PMEM-mode and BLK-mode
>  access to a range of DPA.  libnd manages allocation of DPA
>  to either PMEM or BLK-namespaces to resolve this aliasing.

 Mostly for my understanding: is there a name for "address relative to
 the address lines on the DIMM"?  That is, a DIMM that exposes 8 GB of
 apparent physical memory, possibly interleaved, broken up, or weirdly
 remapped by the memory controller, would still have addresses between
 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
 some might be BLK apertures, etc.

 IIUC "DPA" refers to actual addressable storage, not this type of address?
>>>
>>> No, DPA is exactly as you describe above.  You can't directly access
>>> it except through a PMEM mapping (possibly interleaved with DPA from
>>> other DIMMs) or a BLK aperture (mmio window into DPA).
>>
>> So the thing I'm describing has no name, then?  Oh, well.
>
> What?  The thing you are describing *is* DPA.

I'm confused.  Here are the two things I have in mind:

1. An address into on-DIMM storage.  If I have a DIMM that is mapped
to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK
apertures, say), then this address runs from 0 to 64 GB.

2. An address into the DIMM's view of physical address space.  If I
have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable
storage (accessed through BLK apertures, say), then this address runs
from 0 to 8 GB.  There's a one-to-one mapping between SPA and this
type of address.

Since you said "a dimm may provide both PMEM-mode and BLK-mode access
to a range of DPA.," I thought that DPA was #1.

--Andy


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski  wrote:
> On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams  
> wrote:
>> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski  wrote:
>>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams  
>>> wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

"So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??"

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
  persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
  programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - "dimm-physical-address", address space local to a
  dimm.  A dimm may provide both PMEM-mode and BLK-mode
  access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.
>>>
>>> Mostly for my understanding: is there a name for "address relative to
>>> the address lines on the DIMM"?  That is, a DIMM that exposes 8 GB of
>>> apparent physical memory, possibly interleaved, broken up, or weirdly
>>> remapped by the memory controller, would still have addresses between
>>> 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
>>> some might be BLK apertures, etc.
>>>
>>> IIUC "DPA" refers to actual addressable storage, not this type of address?
>>
>> No, DPA is exactly as you describe above.  You can't directly access
>> it except through a PMEM mapping (possibly interleaved with DPA from
>> other DIMMs) or a BLK aperture (mmio window into DPA).
>
> So the thing I'm describing has no name, then?  Oh, well.

What?  The thing you are describing *is* DPA.


Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage)
 wrote:
>> -Original Message-
>> From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of
>> Dan Williams
>> Sent: Tuesday, April 28, 2015 1:24 PM
>> To: linux-nvd...@lists.01.org
>> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
>> Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe;
>> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
>> Andy Lutomirski; Andrew Morton; Linus Torvalds
>> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
>> support
>>
>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>
> Here are some comments on the sysfs properties reported for a pmem device.
> They are based on v1, but I don't think v2 changes anything.
>
> 1. This confuses lsblk (part of util-linux):
> /sys/block/pmem0/device/type:4
>
> lsblk shows:
> NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> pmem0 251:0    0 8G  0 worm
> pmem1 251:16   0 8G  0 worm
> pmem2 251:32   0 8G  0 worm
> pmem3 251:48   0 8G  0 worm
> pmem4 251:64   0 8G  0 worm
> pmem5 251:80   0 8G  0 worm
> pmem6 251:96   0 8G  0 worm
> pmem7 251:112  0 8G  0 worm
>
> lsblk's blkdev_scsi_type_to_name() considers 4 to mean
> SCSI_TYPE_WORM (write once read many ... used for certain optical
> and tape drives).

Why is lsblk assuming these are scsi devices?  I'll need to go check that out.

> I'm not sure what nd and pmem are doing to result in that value.

That is their libnd-specific device type number from
include/uapi/linux/ndctl.h.  4 == ND_DEVICE_NAMESPACE_IO.  lsblk has no
business interpreting this as something SCSI specific.
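
For what it's worth, a tool can avoid the misinterpretation by checking
which bus the block device's parent actually sits on before decoding the
type attribute as a SCSI peripheral type.  A rough userspace sketch of
the idea (not a proposed lsblk patch):

    /* Only treat /sys/block/<name>/device/type as a SCSI peripheral type
     * when the parent device's subsystem link really points at "scsi". */
    #include <libgen.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int is_scsi_blockdev(const char *name)
    {
            char link[PATH_MAX], target[PATH_MAX];
            ssize_t n;

            snprintf(link, sizeof(link), "/sys/block/%s/device/subsystem", name);
            n = readlink(link, target, sizeof(target) - 1);
            if (n < 0)
                    return 0;        /* no subsystem link at all */
            target[n] = '\0';

            return strcmp(basename(target), "scsi") == 0;
    }

    int main(void)
    {
            printf("pmem0: %s\n", is_scsi_blockdev("pmem0") ?
                   "scsi type codes apply" : "not a scsi device");
            return 0;
    }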

> 2. To avoid confusing software trying to detect fast storage vs.
> slow storage devices via sysfs, this value should be 0:
> /sys/block/pmem0/queue/rotational:1
>
> That can be done by adding this shortly after the blk_alloc_queue call:
> queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

Yeah, good catch.
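
For reference, the suggested flag would sit right after the queue is
allocated in the pmem driver; the surrounding lines below are paraphrased
from the driver's setup path rather than quoted verbatim:

    pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);
    if (!pmem->pmem_queue)
            goto out_free_dev;                     /* error label illustrative */

    blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
    blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
    /* advertise non-rotational media so queue/rotational reads 0 */
    queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);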

> 3. Is there any reason to have a 512 KiB limit on the transfer
> length?
> /sys/block/pmem0/queue/max_hw_sectors_kb:512
>
> That is from:
>blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

I'd only change this from the default if performance testing showed it
made a non-trivial difference.

> 4. These are read-writeable, but IOs never reach a queue, so
> the queue size is irrelevant and merging never happens:
> /sys/block/pmem0/queue/nomerges:0
> /sys/block/pmem0/queue/nr_requests:128
>
> Consider making them both read-only with:
> * nomerges set to 2 (no merging happening)
> * nr_requests as small as the block layer allows to avoid
> wasting memory.
>
> 5. No scatter-gather lists are created by the driver, so these
> read-only fields are meaningless:
> /sys/block/pmem0/queue/max_segments:128
> /sys/block/pmem0/queue/max_segment_size:65536
>
> Is there a better way to report them as irrelevant?

Again it comes back to the question of whether these default settings
are actively harmful.

>
> 6. There is no completion processing, so the read-writeable
> cpu affinity is not used:
> /sys/block/pmem0/queue/rq_affinity:0
>
> Consider making it read-only and set to 2, meaning the
> completions always run on the requesting CPU.

There are no completions with pmem; the entire I/O path is
synchronous.  Ideally, this attribute would disappear for a pmem
queue, not be set to 2.

> 7. With mmap() allowing less than logical block sized accesses
> to the device, this could be considered misleading:
> /sys/block/pmem0/queue/physical_block_size:512

I don't see how it is misleading.  If you access it as a block device
the block size is 512.  If the application is mmap() + DAX aware it
knows that the physical_block_size is being bypassed.

>
> Perhaps that needs to be 1 byte or a cacheline size (64 bytes
> on x86) to indicate that direct partial logical block accesses
> are possible.

No, because that breaks the definition of a block device.  Through the
bdev interface it's always accessed a block at a time.

> The btt driver could report 512 as one indication
> it is different.
>
> I wouldn't be surprised if smaller values than the logical block
> size confused some software, though.

Precisely why we shouldn't go there with pmem.


RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Elliott, Robert (Server Storage)
> -Original Message-
> From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of
> Dan Williams
> Sent: Tuesday, April 28, 2015 1:24 PM
> To: linux-nvd...@lists.01.org
> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
> Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe;
> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
> Andy Lutomirski; Andrew Morton; Linus Torvalds
> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
> support
> 
> Changes since v1 [1]: Incorporates feedback received prior to April 24.

Here are some comments on the sysfs properties reported for a pmem device.
They are based on v1, but I don't think v2 changes anything.

1. This confuses lsblk (part of util-linux):
/sys/block/pmem0/device/type:4

lsblk shows:
NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
pmem0 251:0    0 8G  0 worm
pmem1 251:16   0 8G  0 worm
pmem2 251:32   0 8G  0 worm
pmem3 251:48   0 8G  0 worm
pmem4 251:64   0 8G  0 worm
pmem5 251:80   0 8G  0 worm
pmem6 251:96   0 8G  0 worm
pmem7 251:112  0 8G  0 worm

lsblk's blkdev_scsi_type_to_name() considers 4 to mean 
SCSI_TYPE_WORM (write once read many ... used for certain optical
and tape drives).

I'm not sure what nd and pmem are doing to result in that value.

2. To avoid confusing software trying to detect fast storage vs.
slow storage devices via sysfs, this value should be 0:
/sys/block/pmem0/queue/rotational:1

That can be done by adding this shortly after the blk_alloc_queue call:
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

3. Is there any reason to have a 512 KiB limit on the transfer
length?
/sys/block/pmem0/queue/max_hw_sectors_kb:512

That is from:
   blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

4. These are read-writeable, but IOs never reach a queue, so 
the queue size is irrelevant and merging never happens:
/sys/block/pmem0/queue/nomerges:0
/sys/block/pmem0/queue/nr_requests:128

Consider making them both read-only with: 
* nomerges set to 2 (no merging happening) 
* nr_requests as small as the block layer allows to avoid 
wasting memory.

5. No scatter-gather lists are created by the driver, so these
read-only fields are meaningless:
/sys/block/pmem0/queue/max_segments:128
/sys/block/pmem0/queue/max_segment_size:65536

Is there a better way to report them as irrelevant?

6. There is no completion processing, so the read-writeable
cpu affinity is not used:
/sys/block/pmem0/queue/rq_affinity:0

Consider making it read-only and set to 2, meaning the
completions always run on the requesting CPU.

7. With mmap() allowing less than logical block sized accesses
to the device, this could be considered misleading:
/sys/block/pmem0/queue/physical_block_size:512

Perhaps that needs to be 1 byte or a cacheline size (64 bytes
on x86) to indicate that direct partial logical block accesses
are possible.  The btt driver could report 512 as one indication
it is different.

I wouldn't be surprised if smaller values than the logical block
size confused some software, though.

---
Robert Elliott, HP Server Storage


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Andy Lutomirski
On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams  wrote:
> On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski  wrote:
>> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams  
>> wrote:
>>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>>
>>> 1/ Ingo said [2]:
>>>
>>>"So why on earth is this whole concept and the naming itself
>>>('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>>>revolving around a specific 'firmware' mindset and revolving
>>>around specific, weirdly named, overly complicated looking
>>>firmware interfaces that come with their own new weird
>>>glossary??"
>>>
>>>Indeed, we of course consulted the NFIT specification to determine
>>>the shape of the sub-system, but then let its terms and data
>>>structures permeate too deep into the implementation.  That is fixed
>>>now with all NFIT specifics factored out into acpi.c.  The NFIT is no
>>>longer required reading to review libnd.  Only three concepts are
>>>needed:
>>>
>>>   i/ PMEM - contiguous memory range where cpu stores are
>>>  persistent once they are flushed through the memory
>>>  controller.
>>>
>>>  ii/ BLK - mmio apertures (sliding windows) that can be
>>>  programmed to access an aperture's-worth of persistent
>>>  media at a time.
>>>
>>> iii/ DPA - "dimm-physical-address", address space local to a
>>>  dimm.  A dimm may provide both PMEM-mode and BLK-mode
>>>  access to a range of DPA.  libnd manages allocation of DPA
>>>  to either PMEM or BLK-namespaces to resolve this aliasing.
>>
>> Mostly for my understanding: is there a name for "address relative to
>> the address lines on the DIMM"?  That is, a DIMM that exposes 8 GB of
>> apparent physical memory, possibly interleaved, broken up, or weirdly
>> remapped by the memory controller, would still have addresses between
>> 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
>> some might be BLK apertures, etc.
>>
>> IIUC "DPA" refers to actual addressable storage, not this type of address?
>
> No, DPA is exactly as you describe above.  You can't directly access
> it except through a PMEM mapping (possibly interleaved with DPA from
> other DIMMs) or a BLK aperture (mmio window into DPA).

So the thing I'm describing has no name, then?  Oh, well.

--Andy


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski  wrote:
> On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams  
> wrote:
>> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>>
>> 1/ Ingo said [2]:
>>
>>"So why on earth is this whole concept and the naming itself
>>('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>>revolving around a specific 'firmware' mindset and revolving
>>around specific, weirdly named, overly complicated looking
>>firmware interfaces that come with their own new weird
>>glossary??"
>>
>>Indeed, we of course consulted the NFIT specification to determine
>>the shape of the sub-system, but then let its terms and data
>>structures permeate too deep into the implementation.  That is fixed
>>now with all NFIT specifics factored out into acpi.c.  The NFIT is no
>>longer required reading to review libnd.  Only three concepts are
>>needed:
>>
>>   i/ PMEM - contiguous memory range where cpu stores are
>>  persistent once they are flushed through the memory
>>  controller.
>>
>>  ii/ BLK - mmio apertures (sliding windows) that can be
>>  programmed to access an aperture's-worth of persistent
>>  media at a time.
>>
>> iii/ DPA - "dimm-physical-address", address space local to a
>>  dimm.  A dimm may provide both PMEM-mode and BLK-mode
>>  access to a range of DPA.  libnd manages allocation of DPA
>>  to either PMEM or BLK-namespaces to resolve this aliasing.
>
> Mostly for my understanding: is there a name for "address relative to
> the address lines on the DIMM"?  That is, a DIMM that exposes 8 GB of
> apparent physical memory, possibly interleaved, broken up, or weirdly
> remapped by the memory controller, would still have addresses between
> 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
> some might be BLK apertures, etc.
>
> IIUC "DPA" refers to actual addressable storage, not this type of address?

No, DPA is exactly as you describe above.  You can't directly access
it except through a PMEM mapping (possibly interleaved with DPA from
other DIMMs) or a BLK aperture (mmio window into DPA).


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Andy Lutomirski
On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams  wrote:
> Changes since v1 [1]: Incorporates feedback received prior to April 24.
>
> 1/ Ingo said [2]:
>
>"So why on earth is this whole concept and the naming itself
>('drivers/block/nd/' stands for 'NFIT Defined', apparently)
>revolving around a specific 'firmware' mindset and revolving
>around specific, weirdly named, overly complicated looking
>firmware interfaces that come with their own new weird
>glossary??"
>
>Indeed, we of course consulted the NFIT specification to determine
>the shape of the sub-system, but then let its terms and data
>structures permeate too deep into the implementation.  That is fixed
>now with all NFIT specifics factored out into acpi.c.  The NFIT is no
>longer required reading to review libnd.  Only three concepts are
>needed:
>
>   i/ PMEM - contiguous memory range where cpu stores are
>  persistent once they are flushed through the memory
>  controller.
>
>  ii/ BLK - mmio apertures (sliding windows) that can be
>  programmed to access an aperture's-worth of persistent
>  media at a time.
>
> iii/ DPA - "dimm-physical-address", address space local to a
>  dimm.  A dimm may provide both PMEM-mode and BLK-mode
>  access to a range of DPA.  libnd manages allocation of DPA
>  to either PMEM or BLK-namespaces to resolve this aliasing.

Mostly for my understanding: is there a name for "address relative to
the address lines on the DIMM"?  That is, a DIMM that exposes 8 GB of
apparent physical memory, possibly interleaved, broken up, or weirdly
remapped by the memory controller, would still have addresses between
0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
some might be BLK apertures, etc.

IIUC "DPA" refers to actual addressable storage, not this type of address?

--Andy


[PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
Changes since v1 [1]: Incorporates feedback received prior to April 24.

1/ Ingo said [2]:

   "So why on earth is this whole concept and the naming itself
   ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
   revolving around a specific 'firmware' mindset and revolving
   around specific, weirdly named, overly complicated looking
   firmware interfaces that come with their own new weird
   glossary??"

   Indeed, we of course consulted the NFIT specification to determine
   the shape of the sub-system, but then let its terms and data
   structures permeate too deep into the implementation.  That is fixed
   now with all NFIT specifics factored out into acpi.c.  The NFIT is no
   longer required reading to review libnd.  Only three concepts are
   needed:

  i/ PMEM - contiguous memory range where cpu stores are
 persistent once they are flushed through the memory
 controller.

 ii/ BLK - mmio apertures (sliding windows) that can be
 programmed to access an aperture's-worth of persistent
 media at a time.

iii/ DPA - "dimm-physical-address", address space local to a
 dimm.  A dimm may provide both PMEM-mode and BLK-mode
 access to a range of DPA.  libnd manages allocation of DPA
 to either PMEM or BLK-namespaces to resolve this aliasing. 

   The v1..v2 diffstat below shows the migration of nfit-specifics to
   acpi.c and the new state of libnd being nfit-free.  "nd" now only
   refers to "non-volatile devices".  Note, reworked documentation will
   return once the review has settled.

   Documentation/blockdev/nd.txt |  867 -
   MAINTAINERS   |   34 +-
   arch/ia64/kernel/efi.c|5 +-
   arch/x86/kernel/e820.c|   11 +-
   arch/x86/kernel/pmem.c|2 +-
   drivers/block/Makefile|2 +-
   drivers/block/nd/Kconfig  |  135 ++--
   drivers/block/nd/Makefile |   32 +-
   drivers/block/nd/acpi.c   | 1506 
+++--
   drivers/block/nd/acpi_nfit.h  |  321 
   drivers/block/nd/blk.c|   27 +-
   drivers/block/nd/btt.c|6 +-
   drivers/block/nd/btt_devs.c   |8 +-
   drivers/block/nd/bus.c|  337 +
   drivers/block/nd/core.c   |  574 +-
   drivers/block/nd/dimm.c   |   11 -
   drivers/block/nd/dimm_devs.c  |  292 ++-
   drivers/block/nd/e820.c   |  100 +++
   drivers/block/nd/libnd.h  |  122 +++
   drivers/block/nd/namespace_devs.c |   10 +-
   drivers/block/nd/nd-private.h |  107 +--
   drivers/block/nd/nd.h |   91 +--
   drivers/block/nd/nfit.h   |  238 --
   drivers/block/nd/pmem.c   |   56 +-
   drivers/block/nd/region.c |   78 +-
   drivers/block/nd/region_devs.c|  783 +++
   drivers/block/nd/test/iomap.c |   86 +--
   drivers/block/nd/test/nfit.c  | 1115 +++
   drivers/block/nd/test/nfit_test.h |   15 +-
   include/uapi/linux/ndctl.h|  130 ++--
   30 files changed, 3166 insertions(+), 3935 deletions(-)
   delete mode 100644 Documentation/blockdev/nd.txt
   create mode 100644 drivers/block/nd/acpi_nfit.h
   create mode 100644 drivers/block/nd/e820.c
   create mode 100644 drivers/block/nd/libnd.h
   delete mode 100644 drivers/block/nd/nfit.h

   [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
   [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html

2/ Christoph asked the pmem ida conversion to be moved to its own patch
   (done), and to consider leaving the current pmem.c in drivers/block/.
   Instead, I converted the e820-type-12 enabling to be the first
   non-ACPI-NFIT based consumer of libnd.  The new nd_e820 driver simply
   registers e820-type-12 ranges as libnd PMEM regions.  Among other
   things this conversion enables BTT for these ranges.  The alternative
   is to move drivers/block/nd/nd.h internals out to include/linux/
   which I think is worse.

3/ Toshi reported that the NFIT parsing fails to handle the case of a
   PMEM range with a single-dimm (non-aliasing) interleave description.
   Support for this case was added and is tested by default by the
   nfit_test.1 configuration.

4/ Toshi reported that we should not be treating a missing _STA property
   as a "dimm disabled by firmware" case.  (fixed).

5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
   arch code.  It is gone for now and we'll revisit when adding cached
   mappings back to the PMEM driver.

6/ Toshi mentioned that the presence of two different nd_bus_probe()
   functions was confusing.  (cleaned up).

7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).

8/ Linda asked for nfit_test to honor dynamic cma reservations via the
   cma= command line (done).  The cma requirements have also been
   

Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Andy Lutomirski
On Tue, Apr 28, 2015 at 3:28 PM, Dan Williams dan.j.willi...@intel.com wrote:
 On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski l...@amacapital.net 
 wrote:
 On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
  persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
  programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - dimm-physical-address, address space local to a
  dimm.  A dimm may provide both PMEM-mode and BLK-mode
  access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.

 Mostly for my understanding: is there a name for address relative to
 the address lines on the DIMM?  That is, a DIMM that exposes 8 GB of
 apparent physical memory, possibly interleaved, broken up, or weirdly
 remapped by the memory controller, would still have addresses between
 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
 some might be BLK apertures, etc.

 IIUC DPA refers to actual addressable storage, not this type of address?

 No, DPA is exactly as you describe above.  You can't directly access
 it except through a PMEM mapping (possibly interleaved with DPA from
 other DIMMs) or a BLK aperture (mmio window into DPA).

 So the thing I'm describing has no name, then?  Oh, well.

 What?  The thing you are describing *is* DPA.

I'm confused.  Here are the two things I have in mind:

1. An address into on-DIMM storage.  If I have a DIMM that is mapped
to 8 GB of SPA but has 64 GB of usable storage (accessed through BLK
apertures, say), then this address runs from 0 to 64 GB.

2. An address into the DIMM's view of physical address space.  If I
have a DIMM that is mapped to 8 GB of SPA but has 64 GB of usable
storage (accessed through BLK apertures, say), then this address runs
from 0 to 8 GB.  There's a one-to-one mapping between SPA and this
type of address.

Since you said "a dimm may provide both PMEM-mode and BLK-mode access
to a range of DPA", I thought that DPA was #1.

--Andy


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 5:25 PM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - dimm-physical-address, address space local to a
dimm.  A dimm may provide both PMEM-mode and BLK-mode
access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.

The v1..v2 diffstat below shows the migration of nfit-specifics to
acpi.c and the new state of libnd being nfit-free.  nd now only
refers to non-volatile devices.  Note, reworked documentation will
return once the review has settled.

Documentation/blockdev/nd.txt |  867 -
MAINTAINERS   |   34 +-
arch/ia64/kernel/efi.c|5 +-
arch/x86/kernel/e820.c|   11 +-
arch/x86/kernel/pmem.c|2 +-
drivers/block/Makefile|2 +-
drivers/block/nd/Kconfig  |  135 ++--
drivers/block/nd/Makefile |   32 +-
drivers/block/nd/acpi.c   | 1506 +++--
drivers/block/nd/acpi_nfit.h  |  321 
drivers/block/nd/blk.c|   27 +-
drivers/block/nd/btt.c|6 +-
drivers/block/nd/btt_devs.c   |8 +-
drivers/block/nd/bus.c|  337 +
drivers/block/nd/core.c   |  574 +-
drivers/block/nd/dimm.c   |   11 -
drivers/block/nd/dimm_devs.c  |  292 ++-
drivers/block/nd/e820.c   |  100 +++
drivers/block/nd/libnd.h  |  122 +++
drivers/block/nd/namespace_devs.c |   10 +-
drivers/block/nd/nd-private.h |  107 +--
drivers/block/nd/nd.h |   91 +--
drivers/block/nd/nfit.h   |  238 --
drivers/block/nd/pmem.c   |   56 +-
drivers/block/nd/region.c |   78 +-
drivers/block/nd/region_devs.c|  783 +++
drivers/block/nd/test/iomap.c |   86 +--
drivers/block/nd/test/nfit.c  | 1115 +++
drivers/block/nd/test/nfit_test.h |   15 +-
include/uapi/linux/ndctl.h|  130 ++--
30 files changed, 3166 insertions(+), 3935 deletions(-)
delete mode 100644 Documentation/blockdev/nd.txt
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/e820.c
create mode 100644 drivers/block/nd/libnd.h
delete mode 100644 drivers/block/nd/nfit.h

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html

 2/ Christoph asked for the pmem ida conversion to be moved to its own patch
(done), and to consider leaving the current pmem.c in drivers/block/.
Instead, I converted the e820-type-12 enabling to be the first
non-ACPI-NFIT based consumer of libnd.  The new nd_e820 driver simply
registers e820-type-12 ranges as libnd PMEM regions.  Among other
things this conversion enables BTT for these ranges.  The alternative
is to move drivers/block/nd/nd.h internals out to include/linux/
which I think is worse.

 3/ Toshi reported that the NFIT parsing fails to handle the case of a
PMEM range with a single-dimm (non-aliasing) interleave description.
Support for this case was added and is tested by default by the
nfit_test.1 configuration.

 4/ Toshi reported that we should not be treating a missing _STA property
    as a "dimm disabled by firmware" case.  (fixed).

 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
arch code.  It is gone for now and we'll revisit when adding cached
mappings back to the PMEM driver.

 6/ Toshi mentioned that the presence of two different nd_bus_probe()
    functions was confusing.  (cleaned up).

Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Rafael J. Wysocki
On Tuesday, April 28, 2015 02:24:12 PM Dan Williams wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.
 
 1/ Ingo said [2]:
 
So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??
 
Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:
 
   i/ PMEM - contiguous memory range where cpu stores are
persistent once they are flushed through the memory
  controller.
 
  ii/ BLK - mmio apertures (sliding windows) that can be
programmed to access an aperture's-worth of persistent
  media at a time.
 
 iii/ DPA - dimm-physical-address, address space local to a
dimm.  A dimm may provide both PMEM-mode and BLK-mode
access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing. 
 
The v1..v2 diffstat below shows the migration of nfit-specifics to
acpi.c and the new state of libnd being nfit-free.  nd now only
refers to non-volatile devices.  Note, reworked documentation will
return once the review has settled.
 
Documentation/blockdev/nd.txt |  867 -
MAINTAINERS   |   34 +-
arch/ia64/kernel/efi.c|5 +-
arch/x86/kernel/e820.c|   11 +-
arch/x86/kernel/pmem.c|2 +-
drivers/block/Makefile|2 +-
drivers/block/nd/Kconfig  |  135 ++--
drivers/block/nd/Makefile |   32 +-
drivers/block/nd/acpi.c   | 1506 +++--
drivers/block/nd/acpi_nfit.h  |  321 
drivers/block/nd/blk.c|   27 +-
drivers/block/nd/btt.c|6 +-
drivers/block/nd/btt_devs.c   |8 +-
drivers/block/nd/bus.c|  337 +
drivers/block/nd/core.c   |  574 +-
drivers/block/nd/dimm.c   |   11 -
drivers/block/nd/dimm_devs.c  |  292 ++-
drivers/block/nd/e820.c   |  100 +++
drivers/block/nd/libnd.h  |  122 +++
drivers/block/nd/namespace_devs.c |   10 +-
drivers/block/nd/nd-private.h |  107 +--
drivers/block/nd/nd.h |   91 +--
drivers/block/nd/nfit.h   |  238 --
drivers/block/nd/pmem.c   |   56 +-
drivers/block/nd/region.c |   78 +-
drivers/block/nd/region_devs.c|  783 +++
drivers/block/nd/test/iomap.c |   86 +--
drivers/block/nd/test/nfit.c  | 1115 +++
drivers/block/nd/test/nfit_test.h |   15 +-
include/uapi/linux/ndctl.h|  130 ++--
30 files changed, 3166 insertions(+), 3935 deletions(-)
delete mode 100644 Documentation/blockdev/nd.txt
create mode 100644 drivers/block/nd/acpi_nfit.h
create mode 100644 drivers/block/nd/e820.c
create mode 100644 drivers/block/nd/libnd.h
delete mode 100644 drivers/block/nd/nfit.h
 
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html
 
 2/ Christoph asked for the pmem ida conversion to be moved to its own patch
(done), and to consider leaving the current pmem.c in drivers/block/.
Instead, I converted the e820-type-12 enabling to be the first
non-ACPI-NFIT based consumer of libnd.  The new nd_e820 driver simply
registers e820-type-12 ranges as libnd PMEM regions.  Among other
things this conversion enables BTT for these ranges.  The alternative
is to move drivers/block/nd/nd.h internals out to include/linux/
which I think is worse.
 
 3/ Toshi reported that the NFIT parsing fails to handle the case of a
PMEM range with a single-dimm (non-aliasing) interleave description.
Support for this case was added and is tested by default by the
nfit_test.1 configuration.
 
 4/ Toshi reported that we should not be treating a missing _STA property
    as a "dimm disabled by firmware" case.  (fixed).
 
 5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
arch code.  It is gone for now and we'll revisit when adding cached
mappings back to the PMEM driver.
 
 6/ Toshi mentioned that the presence of two different nd_bus_probe()
functions was confusing.  (cleaned up).
 
 7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).

Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Andy Lutomirski
On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams dan.j.willi...@intel.com wrote:
 On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
  persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
  programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - dimm-physical-address, address space local to a
  dimm.  A dimm may provide both PMEM-mode and BLK-mode
  access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.

 Mostly for my understanding: is there a name for address relative to
 the address lines on the DIMM?  That is, a DIMM that exposes 8 GB of
 apparent physical memory, possibly interleaved, broken up, or weirdly
 remapped by the memory controller, would still have addresses between
 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
 some might be BLK apertures, etc.

 IIUC DPA refers to actual addressable storage, not this type of address?

 No, DPA is exactly as you describe above.  You can't directly access
 it except through a PMEM mapping (possibly interleaved with DPA from
 other DIMMs) or a BLK aperture (mmio window into DPA).

So the thing I'm describing has no name, then?  Oh, well.

--Andy


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
  persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
  programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - dimm-physical-address, address space local to a
  dimm.  A dimm may provide both PMEM-mode and BLK-mode
  access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.

 Mostly for my understanding: is there a name for address relative to
 the address lines on the DIMM?  That is, a DIMM that exposes 8 GB of
 apparent physical memory, possibly interleaved, broken up, or weirdly
 remapped by the memory controller, would still have addresses between
 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
 some might be BLK apertures, etc.

 IIUC DPA refers to actual addressable storage, not this type of address?

No, DPA is exactly as you describe above.  You can't directly access
it except through a PMEM mapping (possibly interleaved with DPA from
other DIMMs) or a BLK aperture (mmio window into DPA).


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 2:06 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Apr 28, 2015 at 1:59 PM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 On Tue, Apr 28, 2015 at 1:52 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams dan.j.willi...@intel.com 
 wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
  persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
  programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - dimm-physical-address, address space local to a
  dimm.  A dimm may provide both PMEM-mode and BLK-mode
  access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.

 Mostly for my understanding: is there a name for address relative to
 the address lines on the DIMM?  That is, a DIMM that exposes 8 GB of
 apparent physical memory, possibly interleaved, broken up, or weirdly
 remapped by the memory controller, would still have addresses between
 0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
 some might be BLK apertures, etc.

 IIUC DPA refers to actual addressable storage, not this type of address?

 No, DPA is exactly as you describe above.  You can't directly access
 it except through a PMEM mapping (possibly interleaved with DPA from
 other DIMMs) or a BLK aperture (mmio window into DPA).

 So the thing I'm describing has no name, then?  Oh, well.

What?  The thing you are describing *is* DPA.
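
To make the SPA vs. DPA distinction concrete, here is a toy user-space sketch
of decoding an offset within a 2-way interleaved PMEM range into a DIMM
selector plus a DIMM-physical-address.  The interleave parameters and the
decode arithmetic are invented for this example; they are not how libnd or the
platform interleave tables actually describe the mapping.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy parameters, invented for this illustration only. */
    #define INTERLEAVE_WAYS 2
    #define INTERLEAVE_GRANULARITY 4096ULL

    /*
     * Decode an offset into an interleaved PMEM region (relative to its
     * system-physical-address base) into the DIMM that backs it and the
     * offset into that DIMM's media (its DPA).
     */
    static void spa_offset_to_dpa(uint64_t spa_offset, unsigned *dimm,
                                  uint64_t *dpa)
    {
            uint64_t line = spa_offset / INTERLEAVE_GRANULARITY;
            uint64_t off = spa_offset % INTERLEAVE_GRANULARITY;

            *dimm = line % INTERLEAVE_WAYS;
            *dpa = (line / INTERLEAVE_WAYS) * INTERLEAVE_GRANULARITY + off;
    }

    int main(void)
    {
            unsigned dimm;
            uint64_t dpa;

            /* The third 4KiB line of the region lands on dimm1 at DPA 4096. */
            spa_offset_to_dpa(3 * INTERLEAVE_GRANULARITY, &dimm, &dpa);
            printf("dimm%u dpa=%llu\n", dimm, (unsigned long long)dpa);
            return 0;
    }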


Re: [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Andy Lutomirski
On Tue, Apr 28, 2015 at 11:24 AM, Dan Williams dan.j.willi...@intel.com wrote:
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 1/ Ingo said [2]:

So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently)
revolving around a specific 'firmware' mindset and revolving
around specific, weirdly named, overly complicated looking
firmware interfaces that come with their own new weird
glossary??

Indeed, we of course consulted the NFIT specification to determine
the shape of the sub-system, but then let its terms and data
structures permeate too deep into the implementation.  That is fixed
now with all NFIT specifics factored out into acpi.c.  The NFIT is no
longer required reading to review libnd.  Only three concepts are
needed:

   i/ PMEM - contiguous memory range where cpu stores are
  persistent once they are flushed through the memory
  controller.

  ii/ BLK - mmio apertures (sliding windows) that can be
  programmed to access an aperture's-worth of persistent
  media at a time.

 iii/ DPA - dimm-physical-address, address space local to a
  dimm.  A dimm may provide both PMEM-mode and BLK-mode
  access to a range of DPA.  libnd manages allocation of DPA
  to either PMEM or BLK-namespaces to resolve this aliasing.

Mostly for my understanding: is there a name for address relative to
the address lines on the DIMM?  That is, a DIMM that exposes 8 GB of
apparent physical memory, possibly interleaved, broken up, or weirdly
remapped by the memory controller, would still have addresses between
0 and 8 GB.  Some of those might be PMEM windows, some might be MMIO,
some might be BLK apertures, etc.

IIUC DPA refers to actual addressable storage, not this type of address?

--Andy


RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Elliott, Robert (Server Storage)
 -Original Message-
 From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of
 Dan Williams
 Sent: Tuesday, April 28, 2015 1:24 PM
 To: linux-nvd...@lists.01.org
 Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
 Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe;
 Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
 Andy Lutomirski; Andrew Morton; Linus Torvalds
 Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
 support
 
 Changes since v1 [1]: Incorporates feedback received prior to April 24.

Here are some comments on the sysfs properties reported for a pmem device.
They are based on v1, but I don't think v2 changes anything.

1. This confuses lsblk (part of util-linux):
/sys/block/pmem0/device/type:4

lsblk shows:
NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
pmem0 251:0    0   8G  0 worm
pmem1 251:16   0   8G  0 worm
pmem2 251:32   0   8G  0 worm
pmem3 251:48   0   8G  0 worm
pmem4 251:64   0   8G  0 worm
pmem5 251:80   0   8G  0 worm
pmem6 251:96   0   8G  0 worm
pmem7 251:112  0   8G  0 worm

lsblk's blkdev_scsi_type_to_name() considers 4 to mean 
SCSI_TYPE_WORM (write once read many ... used for certain optical
and tape drives).

I'm not sure what nd and pmem are doing to result in that value.

2. To avoid confusing software trying to detect fast storage vs.
slow storage devices via sysfs, this value should be 0:
/sys/block/pmem0/queue/rotational:1

That can be done by adding this shortly after the blk_alloc_queue call
(see the sketch after this list):
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

3. Is there any reason to have a 512 KiB limit on the transfer
length?
/sys/block/pmem0/queue/max_hw_sectors_kb:512

That is from:
   blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

4. These are read-writeable, but IOs never reach a queue, so 
the queue size is irrelevant and merging never happens:
/sys/block/pmem0/queue/nomerges:0
/sys/block/pmem0/queue/nr_requests:128

Consider making them both read-only with: 
* nomerges set to 2 (no merging happening) 
* nr_requests as small as the block layer allows to avoid 
wasting memory.

5. No scatter-gather lists are created by the driver, so these
read-only fields are meaningless:
/sys/block/pmem0/queue/max_segments:128
/sys/block/pmem0/queue/max_segment_size:65536

Is there a better way to report them as irrelevant?

6. There is no completion processing, so the read-writeable
cpu affinity is not used:
/sys/block/pmem0/queue/rq_affinity:0

Consider making it read-only and set to 2, meaning the
completions always run on the requesting CPU.

7. With mmap() allowing less than logical block sized accesses
to the device, this could be considered misleading:
/sys/block/pmem0/queue/physical_block_size:512

Perhaps that needs to be 1 byte or a cacheline size (64 bytes
on x86) to indicate that direct partial logical block accesses
are possible.  The btt driver could report 512 as one indication
it is different.

I wouldn't be surprised if smaller values than the logical block
size confused some software, though.
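
Taking points 2 through 4 together, here is a rough sketch of what the
driver's queue setup could look like with these suggestions applied.  This is
a fragment, not the actual patch: the pmem_queue and pmem_make_request names
follow the driver as posted, the allocation-failure handling is omitted, and
whether to keep the 1024-sector cap is exactly the open question from point 3.

    	pmem->pmem_queue = blk_alloc_queue(GFP_KERNEL);	/* NULL check omitted */

    	blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
    	blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
    	/* pmem has no seek penalty; report non-rotational media (point 2) */
    	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
    	/* optionally default to no merging, i.e. nomerges == 2 (point 4) */
    	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, pmem->pmem_queue);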

---
Robert Elliott, HP Server Storage


Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage)
elli...@hp.com wrote:
 -Original Message-
 From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of
 Dan Williams
 Sent: Tuesday, April 28, 2015 1:24 PM
 To: linux-nvd...@lists.01.org
 Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
 Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe;
 Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org;
 Andy Lutomirski; Andrew Morton; Linus Torvalds
 Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
 support

 Changes since v1 [1]: Incorporates feedback received prior to April 24.

 Here are some comments on the sysfs properties reported for a pmem device.
 They are based on v1, but I don't think v2 changes anything.

 1. This confuses lsblk (part of util-linux):
 /sys/block/pmem0/device/type:4

 lsblk shows:
 NAME  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
 pmem0 251:0    0   8G  0 worm
 pmem1 251:16   0   8G  0 worm
 pmem2 251:32   0   8G  0 worm
 pmem3 251:48   0   8G  0 worm
 pmem4 251:64   0   8G  0 worm
 pmem5 251:80   0   8G  0 worm
 pmem6 251:96   0   8G  0 worm
 pmem7 251:112  0   8G  0 worm

 lsblk's blkdev_scsi_type_to_name() considers 4 to mean
 SCSI_TYPE_WORM (write once read many ... used for certain optical
 and tape drives).

Why is lsblk assuming these are scsi devices?  I'll need to go check that out.

 I'm not sure what nd and pmem are doing to result in that value.

That is their libnd-specific device type number from
include/uapi/linux/ndctl.h.  4 == ND_DEVICE_NAMESPACE_IO.  lsblk has no
business interpreting this as something SCSI specific.
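
As a small illustration (not part of the series), a userspace consumer that
wants to classify these devices could read the attribute and compare it
against the libnd numbering instead of a SCSI peripheral-type table.  The only
value assumed below is the 4 == ND_DEVICE_NAMESPACE_IO mapping mentioned
above; the hard-coded path and error handling are simplified for the sketch.

    #include <stdio.h>

    int main(void)
    {
    	FILE *f = fopen("/sys/block/pmem0/device/type", "r");
    	int type = -1;

    	if (!f || fscanf(f, "%d", &type) != 1) {
    		perror("pmem0 device type");
    		return 1;
    	}
    	fclose(f);

    	/* 4 is ND_DEVICE_NAMESPACE_IO in the libnd numbering, not a SCSI
    	 * peripheral device type, so a SCSI name table does not apply. */
    	printf("pmem0: libnd device type %d%s\n", type,
    	       type == 4 ? " (ND_DEVICE_NAMESPACE_IO)" : "");
    	return 0;
    }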

 2. To avoid confusing software trying to detect fast storage vs.
 slow storage devices via sysfs, this value should be 0:
 /sys/block/pmem0/queue/rotational:1

 That can be done by adding this shortly after the blk_alloc_queue call:
 queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

Yeah, good catch.

 3. Is there any reason to have a 512 KiB limit on the transfer
 length?
 /sys/block/pmem0/queue/max_hw_sectors_kb:512

 That is from:
    blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

I'd only change this from the default if performance testing showed it
made a non-trivial difference.

 4. These are read-writeable, but IOs never reach a queue, so
 the queue size is irrelevant and merging never happens:
 /sys/block/pmem0/queue/nomerges:0
 /sys/block/pmem0/queue/nr_requests:128

 Consider making them both read-only with:
 * nomerges set to 2 (no merging happening)
 * nr_requests as small as the block layer allows to avoid
 wasting memory.

 5. No scatter-gather lists are created by the driver, so these
 read-only fields are meaningless:
 /sys/block/pmem0/queue/max_segments:128
 /sys/block/pmem0/queue/max_segment_size:65536

 Is there a better way to report them as irrelevant?

Again it comes back to the question of whether these default settings
are actively harmful.


 6. There is no completion processing, so the read-writeable
 cpu affinity is not used:
 /sys/block/pmem0/queue/rq_affinity:0

 Consider making it read-only and set to 2, meaning the
 completions always run on the requesting CPU.

There are no completions with pmem, the entire I/O path is
synchronous.  Ideally, this attribute would disappear for a pmem
queue, not be set to 2.

 7. With mmap() allowing less than logical block sized accesses
 to the device, this could be considered misleading:
 /sys/block/pmem0/queue/physical_block_size:512

I don't see how it is misleading.  If you access it as a block device
the block size is 512.  If the application is mmap() + DAX aware it
knows that the physical_block_size is being bypassed.


 Perhaps that needs to be 1 byte or a cacheline size (64 bytes
 on x86) to indicate that direct partial logical block accesses
 are possible.

No, because that breaks the definition of a block device.  Through the
bdev interface it's always accessed a block at a time.

 The btt driver could report 512 as one indication
 it is different.

 I wouldn't be surprised if smaller values than the logical block
 size confused some software, though.

Precisely why we shouldn't go there with pmem.


[PATCH v2 00/20] libnd: non-volatile memory device support

2015-04-28 Thread Dan Williams
Changes since v1 [1]: Incorporates feedback received prior to April 24.

1/ Ingo said [2]:

   So why on earth is this whole concept and the naming itself
   ('drivers/block/nd/' stands for 'NFIT Defined', apparently)
   revolving around a specific 'firmware' mindset and revolving
   around specific, weirdly named, overly complicated looking
   firmware interfaces that come with their own new weird
   glossary??

   Indeed, we of course consulted the NFIT specification to determine
   the shape of the sub-system, but then let its terms and data
   structures permeate too deep into the implementation.  That is fixed
   now with all NFIT specifics factored out into acpi.c.  The NFIT is no
   longer required reading to review libnd.  Only three concepts are
   needed:

  i/ PMEM - contiguous memory range where cpu stores are
 persistent once they are flushed through the memory
 controller.

 ii/ BLK - mmio apertures (sliding windows) that can be
 programmed to access an aperture's-worth of persistent
 media at a time.

iii/ DPA - dimm-physical-address, address space local to a
 dimm.  A dimm may provide both PMEM-mode and BLK-mode
 access to a range of DPA.  libnd manages allocation of DPA
 to either PMEM or BLK-namespaces to resolve this aliasing. 

   The v1..v2 diffstat below shows the migration of nfit-specifics to
   acpi.c and the new state of libnd being nfit-free.  nd now only
   refers to non-volatile devices.  Note, reworked documentation will
   return once the review has settled.

   Documentation/blockdev/nd.txt |  867 -
   MAINTAINERS   |   34 +-
   arch/ia64/kernel/efi.c|5 +-
   arch/x86/kernel/e820.c|   11 +-
   arch/x86/kernel/pmem.c|2 +-
   drivers/block/Makefile|2 +-
   drivers/block/nd/Kconfig  |  135 ++--
   drivers/block/nd/Makefile |   32 +-
    drivers/block/nd/acpi.c   | 1506 +++--
   drivers/block/nd/acpi_nfit.h  |  321 
   drivers/block/nd/blk.c|   27 +-
   drivers/block/nd/btt.c|6 +-
   drivers/block/nd/btt_devs.c   |8 +-
   drivers/block/nd/bus.c|  337 +
   drivers/block/nd/core.c   |  574 +-
   drivers/block/nd/dimm.c   |   11 -
   drivers/block/nd/dimm_devs.c  |  292 ++-
   drivers/block/nd/e820.c   |  100 +++
   drivers/block/nd/libnd.h  |  122 +++
   drivers/block/nd/namespace_devs.c |   10 +-
   drivers/block/nd/nd-private.h |  107 +--
   drivers/block/nd/nd.h |   91 +--
   drivers/block/nd/nfit.h   |  238 --
   drivers/block/nd/pmem.c   |   56 +-
   drivers/block/nd/region.c |   78 +-
   drivers/block/nd/region_devs.c|  783 +++
   drivers/block/nd/test/iomap.c |   86 +--
   drivers/block/nd/test/nfit.c  | 1115 +++
   drivers/block/nd/test/nfit_test.h |   15 +-
   include/uapi/linux/ndctl.h|  130 ++--
   30 files changed, 3166 insertions(+), 3935 deletions(-)
   delete mode 100644 Documentation/blockdev/nd.txt
   create mode 100644 drivers/block/nd/acpi_nfit.h
   create mode 100644 drivers/block/nd/e820.c
   create mode 100644 drivers/block/nd/libnd.h
   delete mode 100644 drivers/block/nd/nfit.h

   [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000484.html
   [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-April/000520.html

2/ Christoph asked for the pmem ida conversion to be moved to its own patch
   (done), and to consider leaving the current pmem.c in drivers/block/.
   Instead, I converted the e820-type-12 enabling to be the first
   non-ACPI-NFIT based consumer of libnd.  The new nd_e820 driver simply
   registers e820-type-12 ranges as libnd PMEM regions.  Among other
   things this conversion enables BTT for these ranges.  The alternative
   is to move drivers/block/nd/nd.h internals out to include/linux/
   which I think is worse.

3/ Toshi reported that the NFIT parsing fails to handle the case of a
   PMEM range with a single-dimm (non-aliasing) interleave description.
   Support for this case was added and is tested by default by the
   nfit_test.1 configuration.

4/ Toshi reported that we should not be treating a missing _STA property
    as a "dimm disabled by firmware" case.  (fixed).

5/ Christoph noted that ND_ARCH_HAS_IOREMAP_CACHE needs to be moved to
   arch code.  It is gone for now and we'll revisit when adding cached
   mappings back to the PMEM driver.

6/ Toshi mentioned that the presence of two different nd_bus_probe()
   functions was confusing.  (cleaned up).

7/ Robert asked for s/btt_checksum/nd_btt_checksum/ (done).

8/ Linda asked for nfit_test to honor dynamic cma reservations via the
   cma= command line (done).  The cma requirements have also been
   reduced